Residue Property Plot
Understanding the function–dynamics relationship of a protein often requires comparing two or more properties, e.g., the effect of dihedral fluctuation on the RMSF of the protein. These two properties, dihedral fluctuation and RMSF, can easily be plotted with any plotting program. Later, one might want to know the trend between the RMSF and the solvent exposure of the residues. Traditionally, new plots must be prepared each time new properties are compared. This repetitive process is alleviated by making overlaid plots of all the residue properties at once (Figure 1A). MD DaVis can plot the following quantities for each residue: RMSF, torsional flexibility, secondary structure, solvent accessible surface area, and surface electrostatic potential.
Showing all the properties as an overlaid plot can be overwhelming and cluttered. However, the option to interactively turn the visualization of each data series on or off clears the clutter and highlights the similarities and differences. Moreover, the labels and annotations that appear on hovering the cursor over certain regions make these plots easier to interpret and save time. The data obtained from multiple trajectories can be overlaid in a single interactive plot for ease of comparison, which removes the need for multiple plots and immediately brings out the distinguishing features between the trajectories.
The novel feature of MD DaVis is that it can align the data for similar proteins by inserting the required gaps along the x-axis using an alignment file. Aligning the residues on the x-axis aligns the peaks and highlights the similarities between the datasets, which is particularly useful when comparing the dynamical information from similar proteins.
- Create an interactive plot containing the following residue-level properties:
  - Root mean squared fluctuation (RMSF)
  - Torsional flexibility (circular standard deviation of backbone dihedral angles)
  - Secondary structure
  - Solvent accessible surface area
  - Mean and standard deviation of the total surface electrostatic potential per residue
  - Mean and standard deviation of the mean surface electrostatic potential per residue
It can also use an alignment to align the residue level data from different proteins along the x-axis. This ensures that the peaks line up properly for better interpretation.
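The torsional flexibility listed above is a circular standard deviation, which differs from the ordinary one because dihedral angles wrap around at ±180°. The following is a minimal sketch of the textbook definition, sqrt(-2 ln R) with R the mean resultant length. It is not taken from the MD DaVis source, and the function name `circular_std` and the sample angles are assumptions made for illustration.

```python
import math
import statistics


def circular_std(angles_deg):
    """Circular standard deviation, in degrees, of angles given in degrees."""
    angles = [math.radians(a) for a in angles_deg]
    # Mean resultant vector of the angles placed on the unit circle.
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    # Its length R lies in [0, 1]; clamp to guard against rounding and log(0).
    r = min(max(math.hypot(mean_cos, mean_sin), 1e-300), 1.0)
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


# Hypothetical dihedral samples fluctuating around +/-180 degrees.
phi = [179.0, -179.0, 178.0, -178.0]
print("ordinary std:", statistics.pstdev(phi))
print("circular std:", circular_std(phi))
```

For angles that straddle the ±180° boundary, the ordinary standard deviation is inflated by the wrap-around, while the circular value reflects the actual spread of a few degrees.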
Conventional analysis requires creating a new static plot each time new properties or MD trajectories are compared. This repetitive process is alleviated by making overlaid plots of all the informative residue properties.
Note
The paths in the input TOML file are relative to the location from which the md_davis command is called. To avoid confusion, use absolute paths.
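One way to follow this advice is to resolve each path before writing it into the input file. The sketch below uses only the Python standard library. The helper name `to_absolute` is hypothetical, and the sketch makes no assumption about which keys the input file contains.

```python
from pathlib import Path


def to_absolute(path_str, base=None):
    """Return path_str as an absolute path, resolved against base (default: cwd)."""
    base_dir = Path(base) if base is not None else Path.cwd()
    path = Path(path_str).expanduser()
    if not path.is_absolute():
        path = base_dir / path
    return str(path.resolve())


# 'output1.h5' is the file name used in the commands on this page.
print(to_absolute("output1.h5"))
```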
How to interact with the plot
## Step 3: Plotting overlaid residue data
Step 3a: Create a pickle file with the residue dataframe using:
md_davis residue dataframe --prefix name1 output1.h5 data1.p
The optional argument -a annotations.json can be provided to place a mark at certain residue locations. The contents of annotations.json should be of the following form:
{
"chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
"chain 1": {"Nucleotide Binding Regions": [15, 18]}
}
Each type of annotation is rendered with a different mark. The following annotations are available at present:
* Active Site
* Nucleotide Binding Regions
* NADP Binding Site
* Substrate Binding Site
* Metal Binding Site
* Cofactor Binding Site
* Mutation
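The annotations file can also be generated from a script, which helps catch a misspelled annotation type before plotting. The sketch below writes the example shown above with Python's `json` module. The helper `write_annotations` and its validation step are assumptions for illustration; whether MD DaVis itself rejects unknown types is not stated here.

```python
import json

# Annotation types listed on this page.
KNOWN_TYPES = {
    "Active Site",
    "Nucleotide Binding Regions",
    "NADP Binding Site",
    "Substrate Binding Site",
    "Metal Binding Site",
    "Cofactor Binding Site",
    "Mutation",
}


def write_annotations(annotations, path):
    """Validate the annotation types and write them to a JSON file."""
    for chain, marks in annotations.items():
        unknown = sorted(set(marks) - KNOWN_TYPES)
        if unknown:
            raise ValueError(f"{chain}: unsupported annotation type(s) {unknown}")
    with open(path, "w") as handle:
        json.dump(annotations, handle, indent=2)


annotations = {
    "chain 0": {"Active Site": [23, 41], "Substrate Binding Site": [56]},
    "chain 1": {"Nucleotide Binding Regions": [15, 18]},
}
write_annotations(annotations, "annotations.json")
```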
Step 3b: If your proteins are of different lengths and you need the peaks to be aligned, create a JSON file as shown below.
{
"alignment": "path/to/alignment_file.clustal_num",
"locations": {
"name1": "name1_residue_wise_data.p",
"name2": "name2_residue_wise_data.p",
"name3": "name3_residue_wise_data.p"
},
"output": "acylphosphatase_residue_wise_data_aligned.p"
}
The contents of the alignment file, alignment_file.clustal_num, must be in CLUSTAL format; for example:
CLUSTAL O(1.2.4) multiple sequence alignment
name1 --STARPLKSVDYEVFGRVQGVCFRMYAEDEARKIGVVGWVKNTSKGTVTGQVQGPEEKV 58
name2 --------PRLVALVKGRVQGVGYRAFAQKKALELGLSGYAENLPDGRVEVVAEGPKEAL 52
name3 ---VAKQIFALDFEIFGRVQGVFFRKHTSHEAKRLGVRGWCMNTRDGTVKGQLEAPMMNL 57
: *:**** :* . :. . : *: * * * . :
name1 NSMKSWLSKVGSPSSRIDRTNFSNEKTISKLEYSNFSVRY 98
name2 ELFLHHLKQ--GPRLARVEAVEVQWGEE--AGLKGFHVY- 87
name3 MEMKHWLENNRIPNAKVSKAEFSQIQEIEDYTFTSFDIKH 97
: : * : * :
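To see how an alignment like this lets the peaks line up, it helps to map each residue to its alignment column and leave the gap columns empty. The sketch below is a simplified illustration in plain Python, not the MD DaVis implementation. The function names, the small two-block alignment, and the RMSF values are all made up for the example.

```python
def read_clustal(text):
    """Concatenate the sequence chunks of each name in a CLUSTAL alignment."""
    sequences = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines (leading space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        name, chunk = line.split()[:2]
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


def residue_columns(aligned_seq):
    """Alignment column index of every residue (non-gap character)."""
    return [col for col, char in enumerate(aligned_seq) if char != "-"]


def place_on_axis(values, columns, width):
    """Spread per-residue values over the gapped x-axis; gaps become None."""
    axis = [None] * width
    for value, col in zip(values, columns):
        axis[col] = value
    return axis


# Small made-up alignment in two blocks, for illustration only.
alignment = "\n".join([
    "CLUSTAL O(1.2.4) multiple sequence alignment",
    "",
    "name1      --STARPL 6",
    "name2      ----PRLV 4",
    " " * 16 + "* :",
    "",
    "name1      KS-V 9",
    "name2      AL-- 6",
])
sequences = read_clustal(alignment)
columns = {name: residue_columns(seq) for name, seq in sequences.items()}
rmsf_name2 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical per-residue values
print(place_on_axis(rmsf_name2, columns["name2"], len(sequences["name2"])))
```

Plotting the gapped lists of all proteins against the same column index puts equivalent residues at the same x position, which is why the peaks coincide.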
Step 3c: Plot the residue data pickle files created in the previous steps using:
md_davis plot residue data1.p data2.p