小楼一夜听风雨

Computer knowledge, molecular simulation and Linux learning, shared with you!

 
 
 


 
 

Easy Modeller Introduction  

2010-04-21 22:03:04 | Category: Molecular Simulation


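Foreword: Modeller is a protein structure prediction program that 1) can be installed on Windows, Mac and Linux; 2) supports building a model from multiple templates; and 3) ships with its own tools for optimizing and analysing the models it builds. Above all, it is completely free and widely accepted; the latest version at the time of writing is Modeller 9v7. However, the program is entirely command-line driven and relatively complicated to operate, which is inconvenient for those of us who are used to a graphical user interface (GUI). Kuntal Kumar Bhusan of the University of Hyderabad, India, wrote a GUI for it called Easy Modeller, which makes all of this very simple. The introduction below is quoted from version 1.0 of that software. (The latest version of the software is currently 2.0, which supports all versions of Modeller on Windows.)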

Easy Modeller v1.0: a GUI for MODELLER

Developed by: Kuntal Kumar Bhusan

Contact: kuntal.bhusan@gmail.com

Prof. Reddanna Eicosanoids, Inflammation and Cancer Research Group

Department of Animal Sciences, School of Life Sciences, University of Hyderabad

1. Introduction

One of the biggest goals in structural bioinformatics is the prediction of the three-dimensional structure of a protein from its one-dimensional protein sequence. The goal is to be able to determine the shape (known as a fold) that a given amino acid sequence will adopt. The problem is divided further based on whether the sequence will adopt a new fold or resemble an existing fold (template) in a protein structure database. Fold recognition is easy when the sequence in question has a high degree of sequence similarity to a sequence with known structure [7]. If the two sequences share evolutionary ancestry, they are said to be homologous. For such sequence pairs we can build a structure for the query protein by choosing the structure of the known homologous sequence as a template. This is known as comparative modeling. When the query lacks a good template structure, one must attempt to build a protein tertiary structure from scratch. These methods are usually called ab initio methods.

In a third fold-prediction scenario, there may not necessarily be good sequence similarity with a known structure, but a structural template may still exist for the given sequence. To clarify this case, a person aware of the target structure could extract the template using structure-structure alignments of the target against the entire structural database. It is important to note that the target and template need not be homologous. These two cases define the fold prediction (homologous) and fold prediction (analogous) problems during the CASP competition.

Comparative modeling, or homology modeling, is used when there is a clear relationship between the sequence of a query protein (unknown structure) and the sequence of a known structure. The most basic approach to structure prediction for such (query) proteins is to perform a pairwise sequence alignment against each sequence in protein sequence databases. This can be accomplished using sequence alignment algorithms such as Smith-Waterman [55] or sequence search algorithms (e.g., BLAST [3]). With a good sequence alignment in hand, the challenge in comparative modeling becomes how best to build a three-dimensional protein structure for a query protein using the template structure. The heart of the process is the selection of a suitable structural template based on sequence pair similarity. This is followed by the alignment of the query sequence to the selected template structure to build the backbone of the query protein. Finally, the entire modeled structure is refined by loop construction and side-chain modeling. Several comparative modeling methods, more commonly known as modeler programs, focusing on various parts of the problem have been developed over the past several years [6, 13].

MODELLER is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints.

More generally, the inputs to the program are restraints on the spatial structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well as possible. Restraints can in principle be derived from a number of different sources. These include related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of secondary structure packing (combinatorial modeling), cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis, intuition, residue-residue and atom-atom potentials of mean force, etc. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo atoms. Presently, MODELLER automatically derives the restraints only from the known related structures and their alignment with the target sequence.

A 3D model is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for comparative modeling is optimized with the variable target function procedure in Cartesian space that employs methods of conjugate gradients and molecular dynamics with simulated annealing.

MODELLER can also perform multiple comparisons of protein sequences and/or structures, clustering of proteins, and searching of sequence databases. The program is used with a scripting language and does not include any graphics. It is written in standard FORTRAN 90 and will run on UNIX, Windows, or Mac computers.

MODELLER implements an automated approach to comparative protein structure modeling by satisfaction of spatial restraints [6]. Briefly, the core modeling procedure begins with an alignment of the sequence to be modeled (target) with related known 3D structures (templates). This alignment is usually the input to the program. The output is a 3D model for the target sequence containing all mainchain and sidechain non-hydrogen atoms. Given an alignment, the model is obtained without any user intervention.

First, many distance and dihedral angle restraints on the target sequence are calculated from its alignment with template 3D structures. The form of these restraints was obtained from a statistical analysis of the relationships between many pairs of homologous structures. This analysis relied on a database of 105 family alignments that included 416 proteins with known 3D structure [7]. By scanning the database, tables quantifying various correlations were obtained, such as the correlations between two equivalent Cα-Cα distances, or between equivalent mainchain dihedral angles from two related proteins. These relationships were expressed as conditional probability density functions (pdf's) and can be used directly as spatial restraints. For example, probabilities for different values of the mainchain dihedral angles are calculated from the type of a residue considered, from mainchain conformation of an equivalent residue, and from sequence similarity between the two proteins. Another example is the pdf for a certain Cα-Cα distance given equivalent distances in two related protein structures. An important feature of the method is that the spatial restraints are obtained empirically, from a database of protein structure alignments.

Next, the spatial restraints and CHARMM energy terms enforcing proper stereochemistry [8] are combined into an objective function. Finally, the model is obtained by optimizing the objective function in Cartesian space. The optimization is carried out by the use of the variable target function method [9] employing methods of conjugate gradients and molecular dynamics with simulated annealing. Several slightly different models can be calculated by varying the initial structure. The variability among these models can be used to estimate the errors in the corresponding regions of the fold. There are additional specialized modeling protocols, such as that for the modeling of loops.

Homology modelling is presently the only available method that is both accurate and fast for obtaining the 3D structure of a protein from its sequence. Other methods, such as ab initio modelling, are resource intensive, computationally costly and very difficult to implement. Many software packages are available for homology modelling, of which the best known and most widely used is MODELLER. There are also commercial packages such as Insight II, Discovery Studio and HyperChem, but MODELLER stands apart from these because it is freely available. However, MODELLER has no GUI, and most users find it somewhat difficult to use because it is controlled by Python script files (Fig. 1). A user needs to know basic Python scripting to use MODELLER, so a GUI for this package would help all users to use it easily. In this work a GUI for MODELLER has been developed as a standalone executable that runs on the Windows platform. The GUI has been developed using PerlTk and needs MODELLER and Python to be preinstalled on the user's system. Users do not need to know any scripting; the GUI guides them through the entire process of homology modelling.

Fig 1: A sample Python script to run MODELLER
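The figure itself did not survive in this copy of the post. As a rough stand-in, the sketch below builds the text of a basic single-template MODELLER script of the kind the figure refers to. The helper name `build_automodel_script` and the file names are made up for illustration; the calls written into the generated text (`environ`, `automodel`, `assess.DOPE`, `make`) are from MODELLER's Python interface as used in the 9.x releases. The helper only produces text, so it runs without MODELLER installed, and it is not a description of how Easy Modeller itself generates its scripts.

```python
def build_automodel_script(alnfile, template_code, target_code, n_models=5):
    """Return the text of a basic single-template MODELLER script.

    alnfile       -- PIR alignment file of target and template (assumed name)
    template_code -- align code of the template, e.g. '1bdmA'
    target_code   -- align code of the target sequence, e.g. 'query'
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    lines = [
        "from modeller import *",
        "from modeller.automodel import *",
        "",
        "env = environ()",
        "a = automodel(env,",
        f"              alnfile={alnfile!r},",
        f"              knowns={template_code!r},",
        f"              sequence={target_code!r},",
        "              assess_methods=(assess.DOPE,))",
        "a.starting_model = 1",
        f"a.ending_model = {n_models}",
        "a.make()",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file and code names, chosen to match the example later in the post.
print(build_automodel_script("query-1bdmA.ali", "1bdmA", "query", n_models=4))
```

Saving the printed text to a file and running it with the Python interpreter that has MODELLER available would start a modelling run, assuming the alignment file and the template PDB file are in the working directory.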

 

Easy Modeller was developed solely for GUI-assisted homology modelling with MODELLER. It is therefore just a front-end application, with the main program, MODELLER, running in the backend. The programming language used for building Easy Modeller is PerlTk. The tool accepts user input through the graphical user interface, controls the appropriate MODELLER modules internally, and displays the output or any error in a display text area. The user can also view the standard verbose MODELLER output in a separate window, which opens together with the application.

Easy Modeller has a main window (Fig. 2) which allows the user to choose the appropriate modelling option. The four most commonly used options for homology modelling are available in Easy Modeller, namely:

  1. Modelling using a single template.
  2. Modelling using a single template including heteroatoms.
  3. Modelling using multiple templates.
  4. Loop modelling.

Fig 2: Main window of Easy Modeller

 

5.1 Modelling using single Template:

Upon selecting this option a new window appears (Fig. 3) which has two input options. A guide text is displayed in the text area, giving a detailed explanation of how to perform single-template homology modelling. To enter the sequence, the user must delete the help text, paste the sequence into the text area and then select the "Load Template" option. It is important that the sequence entered consists only of the amino acid sequence and nothing else (no accession ID, organism name, etc.). The query sequence must be entered first and the template loaded afterwards; otherwise an error message is displayed in the text box asking the user to input the sequence first. Another important option is the chain ID, i.e. which chain of the input template PDB to use for modelling. By default the chain is set to "A"; this can be changed using the dropdown list or entered manually. If the input PDB has no chain ID, the chain ID should be deleted and a single blank space entered instead. The template PDB is then loaded by selecting the "Load Template" feature, which opens a standard file browser window from which the user can select the required template PDB file.
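To illustrate what "only the amino acid sequence and nothing else" means in practice, here is a minimal sketch of cleaning a pasted sequence and writing it as a target entry in the PIR format that MODELLER reads. The function names are hypothetical and the pasted fragment is an arbitrary short example; this is not taken from Easy Modeller's own code, which may handle the input differently.

```python
import re

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(pasted):
    """Drop FASTA header lines, whitespace and digits; return upper-case residues."""
    body = [line for line in pasted.splitlines() if not line.startswith(">")]
    seq = re.sub(r"[\s\d]", "", "".join(body)).upper()
    if not seq:
        raise ValueError("no sequence found")
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return seq


def to_pir(code, seq, width=75):
    """Format a target sequence as a PIR entry for a MODELLER alignment file."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    chunks[-1] += "*"  # PIR sequences end with an asterisk
    header = [f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00"]
    return "\n".join(header + chunks) + "\n"


# Arbitrary fragment with a header line, spaces and a position number mixed in.
pasted = ">hypothetical header\nMSEAAHVLIT GAAGQIGYIL\n  21 SHWIASGELY\n"
print(to_pir("query", clean_sequence(pasted)))
```

Rejecting unexpected characters, rather than silently dropping them, mirrors the advice above: anything that is not a residue letter is more likely a pasted identifier than part of the sequence.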

After this, a simple two-step process follows, indicated by the two buttons STEP1 (GET ALIGNMENT) and STEP2 (GET MODEL). Selecting GET ALIGNMENT calls the appropriate MODELLER module for the required alignment and displays the resulting alignment in the text area. From this the user can confirm the choice of template and proceed to the next step to obtain the structure model for the query sequence. Upon selecting GET MODEL, the tool asks the user for the number of models to generate. Any number of models can be generated (although 5 is taken as the standard). After this value is entered and the OK button is pressed, the application calls the appropriate MODELLER modules and performs the modelling in the backend. The entire backend process is shown in the verbose output, where it can be followed for advanced manipulation. On completion of the modelling process, the MODELLER objective function (molpdf) and the DOPE score of the models are shown in tabulated form in the verbose screen.

In general the best model is the one with the lowest molpdf value and the lowest (most negative) DOPE score, but this is not always the final conclusion. To judge model quality, the models should first be evaluated. Fortunately, MODELLER itself provides a feature for assessing model quality by calculating an energy profile. The tool makes this easy through the GUI with the EVALUATE MODEL feature. When pressed, this button asks the user for the name of the model to be evaluated; after the name is entered, the profile is generated by pressing the GET PROFILE button. When the profile of the selected model has been generated, a message in the text area asks the user to select the PLOT PROFILE feature, which automatically shows a graphical plot of the energy profile of the selected model. The plotting feature internally uses the Microsoft Excel plotting function to generate the plot, so MS Excel must be preinstalled on the system to get the plot.
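As a small illustration of reading the score table, the sketch below ranks models by DOPE score (lower, i.e. more negative, is better) and uses molpdf only as a tie-breaker. The numbers are invented for the example and are not from a real run, and preferring DOPE over molpdf when the two disagree is a choice made for this sketch, not a rule stated by the tool.

```python
from typing import NamedTuple


class ModelScore(NamedTuple):
    filename: str
    molpdf: float
    dope: float


def rank_models(scores):
    """Sort models from best to worst: lowest DOPE first, then lowest molpdf."""
    return sorted(scores, key=lambda s: (s.dope, s.molpdf))


# Invented values for illustration only.
scores = [
    ModelScore("query.B99990001.pdb", 1800.0, -38000.0),
    ModelScore("query.B99990002.pdb", 1750.0, -37500.0),
    ModelScore("query.B99990003.pdb", 1900.0, -37900.0),
]

best = rank_models(scores)[0]
print(best.filename, best.dope)
```

In this made-up table the model with the lowest molpdf is not the one with the lowest DOPE score, which is exactly the situation where the energy profile of each candidate is worth inspecting before choosing.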

Fig 3: Single template modelling window

5.2 Modelling using single Template including Heteroatom:

It is often a tedious task to incorporate a heteroatom like a metal atom or a ligand successfully into a model from a template. If the template contains a ligand (or other HETATM residue) then MODELLER can transfer this into the generated model. This is done first by setting env.io.hetatm to True, which instructs MODELLER to read HETATM records from the template PDB files, and then by using the BLK ('.') residue type in the alignment (both in the template and the model sequence) to copy the ligand(s) as a rigid body into the model.
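The alignment side of this can be sketched as follows: one BLK ('.') character per ligand is appended to both the template and the target sequence, just before the terminating '*'. The helper name is hypothetical, and the sketch assumes the HETATM residues follow the protein residues at the end of the template chain; templates with ligands between chains would need more care. In the MODELLER script itself, `env.io.hetatm = True` must also be set, as described above.

```python
def add_blk_residues(aligned_seq, n_ligands=1):
    """Append n BLK ('.') residues to one aligned PIR sequence, before the '*'."""
    if n_ligands < 0:
        raise ValueError("n_ligands cannot be negative")
    # Remove trailing whitespace and any existing terminator, then re-add it.
    core = aligned_seq.rstrip().rstrip("*")
    return core + "." * n_ligands + "*"


# Short invented alignment rows; '-' marks an alignment gap.
template_row = "ACDE-FGHIK*"
target_row = "ACDEWFG-IK*"
print(add_blk_residues(template_row))
print(add_blk_residues(target_row))
```

Both rows receive the same number of '.' characters so that each ligand in the template stays aligned with its copy in the model.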

Easy Modeller provides a very simple interface to this feature of MODELLER through its GUI. The interface and the working methodology are exactly the same as for single template modelling.

5.3 Modelling using multiple Templates:

 

An important aim of modeling is to contribute to understanding the function of the modeled protein. Sometimes, after single template modelling, inspection of the template structure reveals that some loops are disordered and do not appear in the PDB structure. This becomes an important issue when these loops are among the functionally most important parts of the enzyme. The unreliability of the template coordinates and the inability of MODELLER to model long insertions are the reasons why these loops are poorly modeled, as indicated by the energy profile obtained when evaluating the models built by single template modelling. When we are interested in understanding differences in specificity between two similar proteins, we need to build precise and accurate models. Therefore, we need to find new strategies to increase the accuracy of the models. The use of multiple templates is one such approach.

Multiple template modelling is thus another feature of Easy Modeller. The basic working pattern of the multiple template modelling window is the same, except that after entering the sequence the user must first specify the number of templates to be used for modelling (limited to a maximum of 5).

Upon selecting the LOAD TEMPLATES feature a new window appears (Fig. 4) which allows the user to input the selected number of templates one by one. The templates should be loaded in order, i.e. one, two, three and so on (and not one, three, four, etc.). The rules for selecting the CHAIN ID are the same as before (i.e. when the template PDB has no chain ID, the default "A" must be deleted and a blank space entered in the entry box). After this, clicking the ALIGN TEMPLATES button performs an alignment of all the input templates and displays it in the text area. The rest of the modelling process is the same as for single template modelling, i.e. the two-step process of GET ALIGNMENT and GET MODEL. The EVALUATE MODEL feature is also available, as in the previous case.
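On the scripting side, several templates are passed to MODELLER's `automodel` as a tuple of align codes in its `knowns` argument. The sketch below turns a list of (PDB code, chain ID) pairs into such a tuple and applies the limits described above. The helper name and the code-plus-chain naming (e.g. '1bdmA') are assumptions for illustration; the naming follows the convention of the MODELLER tutorial and is not necessarily what Easy Modeller uses internally.

```python
MAX_TEMPLATES = 5  # limit stated for Easy Modeller's multi-template window


def build_knowns(templates):
    """Turn (pdb_code, chain_id) pairs into align codes such as '1bdmA'.

    A blank chain ID (' ' or '') means the PDB file has no chain identifier,
    in which case the align code is just the PDB code.
    """
    if not 1 <= len(templates) <= MAX_TEMPLATES:
        raise ValueError(f"expected 1 to {MAX_TEMPLATES} templates")
    codes = [pdb_code.lower() + chain.strip() for pdb_code, chain in templates]
    if len(set(codes)) != len(codes):
        raise ValueError("the same template chain was given twice")
    return tuple(codes)


# The three templates used in the worked example later in the post.
print(build_knowns([("1bdm", "A"), ("2mdh", "A"), ("1b8p", "A")]))
```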

 

Fig.4 Multiple template modelling window

5.4 Loop refinement and Model building:

MODELLER has several loop optimization methods, which all rely on scoring functions and optimization protocols adapted for loop modeling [Fiser et al., 2000]. They are used to refine loop regions, either automatically after standard model building, or manually on an existing PDB file. Easy Modeller can be used for both. In many cases, better quality loops can be obtained (at the expense of more computer time) by using the newer DOPE-based loop modeling protocol. This can be done with automatic loop modelling. The manual loop modelling feature, on the other hand, can be used to refine the conformation of the loop between a specified starting and ending residue.

Automated loop modelling: This feature is always used after standard models are generated either by single or multi-template based modelling. Immediately after the GET MODEL feature is used and standard modelling is done, the user is asked whether he needs to perform an automated loop modelling. If YES is selected then the automated loop modelling is carried out. Selecting NO disables loop modelling and only the standard models are generated.

Fig.5 Automated loop modelling

Manual loop modelling: This feature is invoked upon selecting the Perform Loop Modelling feature in the main window. A new window is displayed that can be used to load a previously generated model and then manually enter the starting and ending residue number of the loop which has to be modeled. The number of loop models to be generated can also be specified.
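For readers curious about the script behind manual loop refinement, the sketch below builds the text of a MODELLER script that refines one loop in an existing model. It follows the pattern in the MODELLER manual of subclassing `loopmodel` and overriding `select_loop_atoms`; the helper name, the file name and the residue numbers are placeholders. Residue identifiers are written as 'number:' with an empty chain ID, which assumes the model file has no chain identifier, and the code is not taken from Easy Modeller itself.

```python
def build_loop_script(model_file, target_code, first_res, last_res, n_loops=5):
    """Return the text of a MODELLER script refining one loop of an existing model."""
    if first_res > last_res:
        raise ValueError("first_res must not exceed last_res")
    if n_loops < 1:
        raise ValueError("n_loops must be at least 1")
    lines = [
        "from modeller import *",
        "from modeller.automodel import *",
        "",
        "env = environ()",
        "",
        "class MyLoop(loopmodel):",
        "    # Pick the residues to be refined by loop modelling",
        "    def select_loop_atoms(self):",
        f"        return selection(self.residue_range('{first_res}:', '{last_res}:'))",
        "",
        f"m = MyLoop(env, inimodel={model_file!r}, sequence={target_code!r})",
        "m.loop.starting_model = 1",
        f"m.loop.ending_model = {n_loops}",
        "m.loop.md_level = refine.fast",
        "m.make()",
    ]
    return "\n".join(lines) + "\n"


# Placeholder model name and loop boundaries.
print(build_loop_script("query.B99990001.pdb", "query", 93, 100, n_loops=3))
```

MODELLER also provides a `dope_loopmodel` class with the same interface; using it as the base class instead of `loopmodel` selects the DOPE-based protocol mentioned above.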

Fig.6 Manual loop modelling

6. An example to demonstrate the application

To demonstrate how Easy Modeller works, parts of the example used on the MODELLER tutorial page are reused here for easier comparison and understanding. The gene for lactate dehydrogenase from the genomic sequence of Trichomonas vaginalis (TvLDH) was used as the query sequence. The corresponding protein had a higher similarity to the malate dehydrogenase of the same species (TvMDH) than to any other LDH. Comparative models were constructed for TvLDH to study the sequences in the structural context. The individual modelling steps using Easy Modeller are explained below:

6.1. Template Search:

The template search was performed using a standard procedure of performing a PDB BLAST and then identifying the most homologous sequences based on sequence identity and crystallographic resolution of the template. It was found that 1bdm:A (i.e., chain A) was the best template for performing a single template based modelling. Other closely related hits were 2mdh:A and 1b8p:A.
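The selection rule described here, higher sequence identity first and better (numerically lower) crystallographic resolution second, can be sketched as a simple sort. The hit names and all numbers below are invented placeholders rather than the actual BLAST results for this example, and the 30% identity cutoff is only a commonly quoted rule of thumb assumed for the sketch.

```python
def rank_templates(hits, min_identity=30.0):
    """Rank candidate templates.

    hits: list of (code, percent_identity, resolution_in_angstrom) tuples.
    Hits below min_identity are dropped; the rest are sorted by identity
    (highest first) and then by resolution (lowest first).
    """
    usable = [h for h in hits if h[1] >= min_identity]
    return sorted(usable, key=lambda h: (-h[1], h[2]))


# Invented hits for illustration only.
hits = [
    ("tmplA", 44.0, 2.5),
    ("tmplB", 44.0, 1.8),
    ("tmplC", 38.0, 1.5),
    ("tmplD", 22.0, 1.2),
]

for code, identity, resolution in rank_templates(hits):
    print(code, identity, resolution)
```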

6.2. Single template Modelling:

Single template modelling was performed using the A chain of PDB 1bdm. The steps are described below:

  1. The query sequence was pasted in the text area and the template PDB was loaded using the LOAD TEMPLATE feature. The chain ID was selected as A. (Fig. 7)
  2. The GET ALIGNMENT feature was used to get the alignment of the query sequence with the template PDB. (Fig. 8)
  3. The GET MODEL feature was selected to generate the output models of the query. The number of models to generate was set to 4. The score of each generated model was displayed in tabulated form in the verbose output screen. (Fig. 9)
  4. Finally, the EVALUATE MODEL feature was used to generate the energy profile of the first model (query.B99990001.pdb), as it was found to be the most reasonable model based on the overall scores. The energy profile plot for this model was displayed using the PLOT PROFILE feature. (Fig. 10 and 11)

Fig. 7: Query sequence pasted and template PDB was loaded

 

Fig. 8: Query sequence aligned with template PDB

Fig. 9: Score table of the generated models

 

 

Fig. 10: Profile generation input window

Fig. 11: Energy Profile of query.B99990001.pdb

The plotted DOPE score profile (Fig. 12) shows regions of relatively high energy for the long active site loop between residues 90 and 100 and the long helices at the C-terminal end of the target sequence. (The model profile was superposed on the template profile - gaps in the plot can be seen corresponding to the gaps in the alignment (Fig. 13). It should be remembered that the scores are not absolute, so we cannot make a direct numerical comparison between the two. However, we can get an idea of the quality of our input alignment this way by comparing the rough shapes of the two profiles - if one is obviously shifted relative to the other, it is likely that the alignment is also shifted from the correct one.)
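A per-residue profile like this is usually easier to read after smoothing, and the high-energy stretches can then be listed automatically. The sketch below does both on a plain list of per-residue scores. The function names, the window size and the cutoff are arbitrary choices for illustration, and the input is assumed to be already loaded from wherever the profile data were saved; the exact layout of Easy Modeller's profile files is not described in this post.

```python
def smooth(profile, window=15):
    """Centered moving average; the window shrinks at both ends of the chain."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def high_energy_regions(profile, cutoff):
    """Return 1-based inclusive (start, end) ranges where the score exceeds cutoff."""
    regions, start = [], None
    for i, value in enumerate(profile, start=1):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


# Tiny invented profile: less negative values mean less favourable residues.
profile = [-0.05, -0.01, -0.01, -0.05, -0.02]
print(high_energy_regions(profile, cutoff=-0.03))
print(smooth([0, 0, 3, 0, 0], window=3))
```

Because the scores are not absolute, a cutoff only makes sense relative to the rest of the same profile, or to a template profile treated in the same way.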

Fig. 12: Model profile superimposed on template profile

 

Fig. 13: query template alignment

6.3. Implementing Multi Template Modelling:

Inspection of the template structure used for single template modelling revealed that loop 93-100, one of the functionally most important parts of the enzyme, is disordered and does not appear in the PDB structure. The unreliability of the template coordinates and the inability of MODELLER to model long insertions are the reasons why this loop was poorly modeled in the query, as indicated by the DOPE profile. Therefore, we need to find new strategies to increase the accuracy of the models. Methods such as multiple template modelling or loop modelling can be used to solve this problem. If appropriate template information is not available and the loop is small, loop modelling can be used; here, however, other templates were available, so multi template modelling was chosen as the strategy to increase the accuracy of the model.

The PDB files 1bdm:A, 2mdh:A and 1b8p:A were used as templates for the multi template modelling. The steps are described below:

  1. The query sequence was pasted in the text area, the number of templates was set to 3 and the template PDBs were loaded one by one using the LOAD TEMPLATES feature. The chain IDs were selected as A for all. (Fig. 14)
  2. The ALIGN TEMPLATES feature was used to get the alignment of the template PDBs. (Fig. 15)
  3. The GET ALIGNMENT feature was used to get the alignment of the query sequence with the template PDBs. (Fig. 16)
  4. The GET MODEL feature was selected to generate the output models of the query. The number of models to generate was set to 4. The score of each generated model was displayed in tabulated form in the verbose output screen. (Fig. 17)
  5. Finally, the EVALUATE MODEL feature was used to generate the energy profile of the first model (query.B99990001.pdb), as it was found to be the most reasonable model based on the overall scores. The energy profile plot for this model was displayed using the PLOT PROFILE feature. (Fig. 18)

Fig. 14: Query sequence pasted and template PDBs loaded

 

Fig. 15: Template sequences aligned

Fig. 16: Template sequences aligned with query

Fig. 17: Score table of the generated models

Fig. 18: Energy Profile of query.B99990001.pdb

 

The evaluation of the model indicates that the problematic loop (residues 90 to 100) has improved by using multiple structural templates. The global DOPE score for the models also improved from -37513.8 to -38133.5. MODELLER was able to use the variability in the loop region from the three templates to generate a more accurate conformation of the loop. (Fig. 19)

Fig. 19: Multi template model profile superimposed on single template model profile

Fig. 20: 3D structure of the final generated model

 

The application generates some standard output files, which are the same as those of MODELLER. The output models are saved in the directory where the application is kept, with names such as query.B99990001.pdb, and automated loop-modelled structures are saved as query.BL00010001.pdb (where the last 1 is variable, running up to the number of models selected to be generated). In addition, the profile plot data are saved as ".csv" files in the same directory. The generated script files are saved as well, for reference and advanced manipulation. Before starting with a new sequence or a new process, it is recommended to back up and delete the previously generated files, as they will be overwritten and, if kept in the same location, might hamper the successful operation of the application (Fig. 21).
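The backup step recommended here is easy to script. The sketch below lists the standard model file names expected for a run and moves any earlier model files for the same target code into a backup folder. The function names and folder layout are assumptions for illustration; only the `.pdb` model files are handled, and the ".csv" profile data and generated scripts would need to be moved in the same way.

```python
import shutil
from pathlib import Path


def expected_model_files(target_code, n_models):
    """Names of the standard models, e.g. query.B99990001.pdb for model 1."""
    return [f"{target_code}.B9999{i:04d}.pdb" for i in range(1, n_models + 1)]


def back_up_outputs(workdir, backup_dir, target_code="query"):
    """Move earlier model files for target_code from workdir into backup_dir.

    Matches both standard models (query.B9999....pdb) and loop models
    (query.BL........pdb). Returns the names of the files that were moved.
    """
    workdir, backup_dir = Path(workdir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(workdir.glob(f"{target_code}.B*.pdb")):
        shutil.move(str(path), str(backup_dir / path.name))
        moved.append(path.name)
    return moved


print(expected_model_files("query", 4))
```

Calling `back_up_outputs(".", "previous_run")` from the application directory before starting a new sequence would clear the way for the next run while keeping the old models.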

Fig. 21: Files generated by Easy Modeller

 

 

The application developed in this work can be used for easy homology modelling without detailed knowledge of the backend processes and without any knowledge of scripting. The user does not have to worry about the input sequence format or the alignment format that has to be supplied, which is otherwise a major difficulty when running MODELLER. Pasting the sequence into the text window is the only prerequisite; the rest of the process is taken care of by the application. Every step is automated and interactively guided, and complete information on the backend process is available as well. Basic error handling is provided by the display text area; advanced error handling can be done from the verbose output. The models can be easily evaluated and their energy profiles viewed with the automated plotting feature. The application thus provides a single place for the homology modelling tasks described above.

 

1. P. Bourne and H. Weissig. Structural Bioinformatics. Wiley, Hoboken, NJ, 2003.

 

2. T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J. Mol. Biol., 147:195-197, 1981.

 

3. S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403-410, 1990.

 

4. P. A. Bates and M. J. E. Sternberg. Model building by comparison at casp3: Using expert knowledge and computer automation. Proteins: Struct. Funct. Genet., 3:47-54, 1999.

 

5. A. Fiser, R. K. Do, and A. Sali. Modeling of loops in protein structures. Protein Sci., 9:1753-1773, 2000.

 

6. Šali, A. & Blundell, T. L. (1993). J. Mol. Biol. 234, 779-815.

 

7. Šali, A. & Overington, J. (1994). Protein Sci. 3, 1582-1596.

 

8. MacKerell, Jr., A. D., Bashford, D., Bellott, M., Dunbrack Jr., R. L., Evanseck, J. D., Field, M. J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F. T. K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D. T., Prodhom, B., Reiher, III, W. E., Roux, B., Schlenkrich, M., Smith, J. C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D., & Karplus, M. (1998). J. Phys. Chem. B, 102, 3586-3616.

 

9. Braun, W. & Gō, N. (1985). J. Mol. Biol. 186, 611-626.

 

System requirements:

The hardware requirements are the same as those for running MODELLER.

MODELLER and Python must be preinstalled on your system to run the application.

Microsoft Excel must be installed to display the energy profile plots.

*       To Download Easy Modeller v1.0

Right click and Save File

--------Easy Modeller v2.0

Right click and Save File

*       To visit the Easy Modeller blog and discussion forum

click here

MODELLER (copyright 1989-2008 Andrej Sali) is maintained by Ben Webb at the Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, Mission Bay Byers Hall, University of California San Francisco, San Francisco, CA 94158-2330, USA.

*       To visit MODELLER download and installation page

click here

*       To download PYTHON

click here

 

Thank you !!

 

 
