In Silico dszC Gene Analysis, Modeling and Validation of Dibenzothiophene Monooxygenase (DszC Enzyme) of Dibenzothiophene Desulfurizing Streptomyces sp. VUR PPR 102


In the present scenario, most daily human activities and needs depend on fossil fuels such as petroleum products and coal. Petroleum products are continuously burned by automobiles and industries, releasing hazardous gases. Sulfur dioxide is one such gas and is deleterious to humans and the environment [1]. When the atmospheric concentration of sulfur dioxide rises above 20 ppm, it causes cough, eye irritation, altered heart rhythm, mucus secretion, chronic bronchitis, provocation of asthma and damage to the nerves of the respiratory system [2]. Sulfur dioxide reacts with atmospheric moisture to form sulfurous acid, and it is also oxidized to sulfur trioxide (a secondary air pollutant), which reacts with moisture to form sulfuric acid. The resulting acid combines with rain water and reaches the soil as acid rain. Acid rain increases the acidity of water bodies and affects aquatic life. In addition, acid rain drastically lowers the pH of the soil and harms vegetation, which reduces the yield of crop plants [3].
Fossil fuels contain organosulfur compounds in the form of thiophenes, thiols and sulfides. When fossil fuels are burnt, sulfur dioxide is released by oxidation of these organosulfur compounds. The major organosulfur group responsible for sulfur dioxide emission is the thiophenes. Among them, dibenzothiophene (DBT) is the most recalcitrant and abundant organosulfur compound in fossil fuels [4] and is therefore used as a model substrate in desulfurization studies [5]. Hydrodesulfurization (HDS), the method currently used to remove sulfur from petroleum products during refining, is not fully effective: DBT and its alkyl derivatives in particular are largely not removed by HDS treatment. Biodesulfurization (BDS), which exploits microbes to remove sulfur from petroleum products, has been suggested as a better alternative to HDS. Moreover, BDS is an economical and environmentally friendly process. Microbes that desulfurize DBT via the 4S pathway are commercially more important than microbes that metabolize DBT by other pathways, which destroy the DBT ring structure and reduce the calorific (combustion) value of the fuel. In the 4S pathway, microorganisms remove the sulfur atom from DBT by breaking the C-S (carbon-sulfur) bond without disrupting the ring structure, so the calorific value of the fuel is unchanged [6]. The four reaction steps of the 4S pathway are catalyzed by the DszA, DszB and DszC enzymes, which are encoded by the dszA, dszB and dszC genes (dsz operon), respectively. DszC (DBT monooxygenase) catalyzes the first two consecutive steps, namely the conversion of DBT to DBTO (DBT sulfoxide) and then of DBTO to DBTO2 (DBT sulfone). In the third step, DBT sulfone is converted to HPBS (hydroxyphenyl benzene sulfinate) by DszA (DBTO2 monooxygenase). DszB (HPBS desulfinase) catalyzes the final step of the 4S pathway, the conversion of HPBS to 2-HBP and sulfite [7].
Bioinformatics is a branch of science that deals with the use of software tools and programs for managing and interpreting biological data. Bioinformatics tools can be employed for the in silico analysis of genes (for example, in the NCBI open reading frame finder), for the design and modeling of biomolecules, and for the evaluation of predicted biomolecular models. Protein modeling is currently an emerging area of bioinformatics in which various tools and programs are available for the prediction (SWISS-MODEL server, Phyre2, NCBI BLAST, etc.) and validation (Rampage server, PROCHECK server, ERRAT, ProSA, etc.) of three-dimensional structures of proteins [8,9,10].
The present work was designed to derive the sequence of the DszC enzymatic protein from the dszC gene of Streptomyces sp. VUR PPR 102, isolated from oil-contaminated soils [11], using the open reading frame (ORF) finder of NCBI, and to predict its three-dimensional model using the SWISS-MODEL server. The three-dimensional model of the DszC enzyme was then checked for validity with Rampage, SPDBV, Verify3D and ERRAT.

In silico derivation of amino acid sequence of DszC enzyme from dszC gene
The dszC gene sequence of the DBT-desulfurizing Streptomyces sp. VUR PPR 102 was entered in the open reading frame finder of NCBI in FASTA format. ORF finder identifies the open reading frames (ORFs) present in the submitted gene sequence. An ORF corresponds to the protein-coding region of a mature messenger RNA and carries the genetic code information for the amino acid sequence of the protein to be synthesized. The ORF of maximum length was selected to generate the corresponding protein sequence [12,13].
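The selection of the longest reading frame can be illustrated with a minimal Python sketch. It is not the NCBI ORF finder implementation: it assumes the standard genetic code, counts only ORFs that start at ATG and end at a stop codon, and the function names (`reverse_complement`, `orfs_in_frame`, `longest_orf`) are hypothetical. ORF finder itself offers further options, such as alternative initiation codons, that are not reproduced here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_frame(seq, offset):
    """Yield translations of ATG...stop ORFs in one reading frame."""
    protein = None
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unrecognized codon
        if protein is None:
            if aa == "M":          # assumption: ATG is the only start codon
                protein = "M"
        elif aa == "*":            # stop codon closes the current ORF
            yield protein
            protein = None
        else:
            protein += aa


def longest_orf(dna):
    """Scan all six reading frames and return the longest translation."""
    dna = dna.upper()
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            candidates.extend(orfs_in_frame(strand, offset))
    return max(candidates, key=len, default="")


print(longest_orf("CCATGGCTTGGTAACC"))  # prints "MAW"
```

The three forward and three reverse-complement frames correspond to the six reading frames reported by ORF finder; ORFs that lack a stop codon within the sequence are ignored in this sketch.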

Prediction of DszC enzyme model
The DszC enzyme sequence obtained was entered in the SWISS-MODEL server in FASTA format to build a 3-D model of the DszC enzyme protein. The DszC model was developed in automated mode. In the SWISS-MODEL server, the submitted protein sequence (query sequence) is searched against the protein sequences of the SWISS-MODEL template library (SMTL). The template-library protein showing the highest degree of similarity to the query sequence is taken as the template, on the basis of which the three-dimensional structure of the query protein is predicted. Homology modeling in SWISS-MODEL is carried out by the ProMod3 modeling engine, and templates matching the given protein sequence are identified with BLAST and HHblits searches [14,15].
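The idea of choosing the template most similar to the query can be sketched as a percent-identity calculation over pairwise alignments. This is only an illustration under stated assumptions: the alignments are taken as already computed, with "-" marking gaps, and the names `percent_identity` and `best_template` are hypothetical. The ranking actually used by SWISS-MODEL is more elaborate than plain sequence identity.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def best_template(alignments):
    """Return the template name with the highest identity to the query.

    `alignments` maps a template name to (aligned_query, aligned_template).
    """
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Toy alignments (not real DszC data).
toy = {
    "template_1": ("ACDF", "ACEF"),
    "template_2": ("ACDF", "ACDF"),
}
print(best_template(toy))  # prints "template_2"
```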

Evaluation of DszC enzyme model quality
The DszC enzyme protein model was checked in Rampage, SPDBV, Verify3D and ERRAT. The PDB file of the enzyme model was used for all quality checks. In the Rampage server, a Ramachandran plot was generated for the submitted protein model. The Ramachandran plot is based on the phi and psi (torsion) angles of the amino acid residues in the model and displays the distribution of residues among favored, allowed and outlier regions. The quality of the protein model was judged from the percentage of amino acids in the favored region [16,17]. The PDB files of the DszC enzymatic protein model and of its template were imported into SPDBV. The DszC enzyme model was then superimposed on its template to calculate the root mean square deviation (RMSD), and the quality of the protein model was verified on the basis of the RMSD value [18]. Verify3D determines the compatibility of the three-dimensional (3-D) structure of a protein model with its own amino acid sequence (1-D, i.e., primary structure); a higher proportion of well-scoring residues indicates better model quality [19]. The ERRAT program generates a plot of the structural error at each amino acid position of the three-dimensional model, and the overall quality factor derived from it is used to assess the quality of the model [20].
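The phi and psi angles underlying a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms. The following Python sketch shows one common way to compute them from atomic coordinates; it is not the Rampage implementation, and the function names and the toy coordinates are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)               # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: the fourth point is rotated a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # prints 90.0
```

Computing the pair (phi, psi) for every residue and checking which region of the plot each pair falls in gives the favored, allowed and outlier percentages; the region boundaries themselves are empirical and are not reproduced here.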

Results and Discussion
DszC enzyme protein sequence
In the ORF finder of NCBI, six reading frames (ORFs) were obtained as graphical bars (Figure 2) for the submitted dszC gene sequence (Figure 1). The actual reading frame length in each bar was indicated as a shaded region. A reading frame starts with an initiation codon and terminates with a stop (nonsense) codon. The longest reading frame was selected and translated to obtain the sequence of the DszC enzyme [21]. The amino acid sequence of the DszC enzyme was as follows:

LLVPREYGGWGADWLTAIEVVREIAAADGSLGHLFGYHLTNAPMIELIGSQEQEE
HLYTQIAQNNWWTGTSSENNSHVLDWKVSATPTEDGGYVLNGTKHFCSGAKGSDLLFV
LGVVQDDSPQQGAIIAAAIPHRGLALRPTTTGPPSACGRPTAVPRTSTTSRSSLTKCWA
RPTPSFSPSYNPSAAASSRPRNSSPTSIWGSRTAHSMPPGSTPDPGEALDTGRYSTQPRIL

Homology modeling of DszC enzyme
To generate a protein model in SWISS-MODEL, the template library must contain at least one experimentally determined protein structure that closely resembles the submitted protein [22]. The template-library sequence most similar to the DszC enzyme sequence was taken as the template; the template identified by SWISS-MODEL was 4doy.1.A. The alignment of the template (4doy.1.A) and DszC enzyme sequences is shown in Figure 3. The three-dimensional structure of the DszC enzyme (homodimer) (Figure 4) was built from the template model 4doy.1.A (Figure 5).
The distribution of amino acids of the DszC enzyme model among the favored, allowed and outlier regions (Table 1) confirms the validity of the model. In the Ramachandran plot (Figure 6) of the DszC enzyme model, 92.0% of amino acids were in the favored region. In the Ramachandran plot of a protein model with good stereochemical quality, above 90% of amino acid residues lie in the favored region [24,25]. In SPDBV, the root mean square deviation (RMSD) was obtained by superimposing the DszC enzyme model on its template (Figure 7). The RMSD value indicates the degree of similarity between two 3-D structures. The RMSD between the DszC enzymatic protein model and its template was very low (0.26 Å), which indicated close similarity between the two structures and hence good quality of the DszC enzyme model [26,27]. The Verify3D program revealed that 100% of the amino acid residues of the DszC enzyme model had a 3D/1D profile score ≥ 0.2 (Figure 8), which indicated very good model quality. For a protein model to be considered valid in Verify3D, at least 80% of its amino acids must have a 3D/1D profile score ≥ 0.2. The overall quality factor generated in ERRAT for the DszC enzyme model was 91.765 (Figure 9), which indicated good model quality [20].
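The acceptance criteria quoted above can be written down as a short Python sketch. It assumes that the two coordinate sets have already been superimposed and paired atom by atom (the fitting step performed in SPDBV is not reproduced), and the function names are hypothetical. The thresholds are the ones stated in the text: more than 90% of residues in the favored region, and at least 80% of residues with a 3D/1D score of 0.2 or higher.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def ramachandran_ok(favored_percent, threshold=90.0):
    """True if more than `threshold` percent of residues are in the favored region."""
    return favored_percent > threshold


def verify3d_ok(scores, min_score=0.2, min_fraction=0.80):
    """True if at least `min_fraction` of residues score >= `min_score`."""
    passing = sum(s >= min_score for s in scores)
    return passing / len(scores) >= min_fraction


# Toy coordinates: every atom is displaced by 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # prints 1.0
print(ramachandran_ok(92.0))                                  # prints True
```

With the values reported here (92.0% of residues in the favored region and 100% of residues with a 3D/1D score ≥ 0.2), both checks pass.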

Fig. 9. Overall quality factor of DszC enzyme model generated in ERRAT program

Table 1. The abundance of amino acids of the DszC enzyme model of Streptomyces sp. VUR PPR 102 in different regions of the Ramachandran plot