Construction of 3D Model of Protein Drug Targets for Renibacterium salmoninarum: A Bacterial Pathogen Causing Bacterial Kidney Disease in Young Salmonid Fish

The aim of this study was to construct 3D models of potential drug targets for Renibacterium salmoninarum, the pathogen that causes Bacterial Kidney Disease (BKD). R. salmoninarum was selected for homology modelling because no protein structures from the organism were reported in the NCBI database. The reported protein sequences were screened against DrugBank to identify drug targets. Online databases and web tools, namely PMDB, UniProt, DrugBank, and SWISS-MODEL, were employed in this analysis. A total of 412 protein sequences were identified as potential drug targets and retrieved from UniProt. Homology models were constructed for all 412 proteins using the SWISS-MODEL server and were then assessed through protein BLAST and Ramachandran plot analysis. Of the 412 constructed models, 143 were of reliable quality, and these were submitted to the PMDB database for future reference. To demonstrate the application of the constructed models, protein-ligand docking analysis was performed using AutoDock Vina. Among the antibiotics tested against their known drug targets, trimethoprim showed the greatest potential for inhibiting the dihydrofolate reductase of R. salmoninarum, with a binding energy of -9.06 kcal/mol and the formation of 3 hydrogen bonds. On the basis of the constructed 3D models and the protein-ligand docking results, trimethoprim is proposed as a candidate treatment for Bacterial Kidney Disease (BKD) in salmonid fishes. Further in vitro evidence is needed to test this hypothesis.

Salmon is the common name for various species of the family Salmonidae. There are nine commercially important species of salmon, and these are widespread across the world. Most are found in the Pacific and Atlantic oceans, where they contribute significantly to the economic and cultural value of the neighbouring nations. For example, Rawas, also known as the Indian salmon, is the only species found in India (primarily in the waters of the states of Gujarat and Maharashtra) and is a major contributor to the income generated there. In 2010, the harvesting, processing, and retailing of Bristol Bay salmon (Oncorhynchus kisutch) created $1.5 billion in sales value across the United States. 1 Norway is thought to be the largest contributor to the Atlantic salmon industry. The country accounted for 63% (2.3 million tonnes) of total salmon production in 2016.
Farmed salmon is now a global commodity that has seen an exponential rise in various markets. 2 Salmon aquaculture has expanded rapidly over the last two decades and is predicted to keep expanding to meet growing seafood demand, since production from wild capture fisheries has stagnated. In Norway, this is a government-backed industry with high levels of innovation. 3 Production of farmed salmon is restricted to a few countries such as Norway, Chile, the UK and Canada. Nevertheless, these countries account for 85% of worldwide production. Salmon aquaculture is responsible for 15% of the United Kingdom's agricultural economy.
Salmon is a favourite with chefs and is prized for its orange-coloured meat. The meat is believed to have several health benefits, including reducing obesity, which makes it extremely valuable as seafood.
Renibacterium salmoninarum is responsible for causing disease in young salmonid fish. The infection is commonly known as Bacterial Kidney Disease (BKD), Dee disease, white boil disease or corynebacterial kidney disease. It is of substantial ecological importance because it affects both farmed and wild salmonids. It was first described in 1930 under the name Dee disease, after being identified in Atlantic salmon (Salmo salar) in Scotland. A Gram-positive diplobacillus that did not grow on any available media was recognized in the kidneys of the diseased fish. This pathogen was found to be exclusive to salmonids. Ordal and Earp were the first to culture the bacterium and identified it as a species of Corynebacterium based on its morphological appearance. Smith concluded that the Dee disease of salmonids in Scotland and Bacterial Kidney Disease were caused by the same bacterium. 4,5 Government agencies for statutory fish health programmes in the UK (FRS Marine Laboratory, Aberdeen and Cefas, Weymouth) participated in a comparative study for the detection of Renibacterium salmoninarum. FRS conducted additional tests: an indirect fluorescent antibody test (IFAT), Gram staining of tissue sections and H&E, and a quantitative real-time PCR (qPCR) assay to detect the elongation factor alpha 1 (ELF) gene of salmonids as well as the Msa2 genes of R. salmoninarum. Isolation on Mueller-Hinton medium with added cysteine (MHCA) showed the fish to be culture positive. 6 Renibacterium salmoninarum is an intracellular pathogen that multiplies within phagocytes. It is slow growing and is described as a coherent genus. The properties of its cell envelope are linked to its survival and multiplication inside phagocytic cells.
When the cell surface hydrophobicity of R. salmoninarum strains was examined using a salt aggregation method, the strains were found to be sticky and auto-agglutinating and to possess a hydrophobic cell surface. Strains with low virulence were found to be non-agglutinating and non-sticky. The adherence of the bacteria to host tissues plays a significant role in their ability to colonize and cause infection. 7 Renibacterium salmoninarum is a non-motile, strongly Gram-positive, non-acid-fast, non-spore-forming, rod-shaped bacterium that usually occurs in pairs (diplobacillus). It measures approximately 0.3-1.0 µm × 1.0-1.5 µm. 8 A slow-growing organism, it is one of the earliest known bacterial pathogens of fish (especially salmonids). It is a facultative intracellular parasite, and because of this intracellular mode of infection, the bacterium is able to evade the immune response of the host. This makes BKD an especially challenging disease to control. 9 The cell wall of R. salmoninarum is composed primarily of peptidoglycan. The major sugar component is galactose, in addition to N-acetylfucosamine, rhamnose and N-acetylglucosamine. The major amino acids present are alanine, glutamic acid, lysine, and glycine, with lysine at the third position of the peptide subunit of the peptidoglycan. An interpeptide bridge composed of glycylalanine links the lysine and the D-alanine of adjacent peptide subunits.
R. salmoninarum is very similar to the coryneform group of bacteria. 8,10 The bacterium lives inside the pronephric kidneys of salmonid fish. It spreads in two ways: vertically inside the ova, and horizontally through shared water between cohabiting fish. The general consensus is that R. salmoninarum has co-evolved with its host, the salmonids. 10,11 Diagnostic tools for BKD are PCR-based, especially real-time quantitative PCR (qPCR). 11 Controlling R. salmoninarum infection is arduous since no sufficiently effective vaccines are available. Antibiotic resistance is also a significant complication with this pathogen. 9,12 Although it is a potent pathogen that causes significant economic damage to the fish industry, specifically to salmon and trout 8 , very little information is available about it. It has been studied very little, mainly because of the fastidious and slow-growing nature of the organism, which makes research on R. salmoninarum difficult. A more thorough study of R. salmoninarum remains necessary to combat the evasive Bacterial Kidney Disease.
Homology modelling is a computational technique that predicts the structure of a protein from its sequence by comparing it with the known structure of a homologous protein, on the basis of their degree of similarity. In this study, the 3D structures of the proteins of Renibacterium salmoninarum were developed using homology modelling. Protein structures are essential for designing and developing drugs that can be used to curb the spread of this disease. The lack of available protein structures has hindered the understanding of the binding specificities of proteins and ligands, which is a prerequisite for drug design and development 13,14 . Well-established and recognized databases and tools were used to compute the structures of potential drug target proteins. These data are important for curbing the spread of the disease and for reducing the financial burden it places on fisheries and the pisciculture industry. Homology modelling was carried out using existing FASTA format sequences of the proteins involved, together with existing templates that closely resembled the query sequence, in order to predict the structure of the proteins. The availability of protein sequences is a prerequisite for this approach, and the resulting structures are needed to understand protein-ligand binding specificities and thereby to develop drugs that curb the spread of the disease. The structures developed in this study can be further exploited to develop drugs, which will be a boon to the fisheries involved in rearing salmonid fishes 13 .

NCBI database
The pre-existing information available on the organism was extracted from the NCBI database. The database compiles data about the organism from multiple databases (https://www.ncbi.nlm.nih.gov/).

Sequence retrieval
The amino acid sequences required for this study were collected from the UniProt Knowledgebase. The database is centralized, reliable and publicly accessible. The sequences with the best match were sorted based on the length of the amino acid chain. The required amino acid chain files were downloaded in FASTA format and saved under their respective accession IDs to make them easier to access in the future. The database can be accessed at: www.uniprot.org.
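
The retrieval and sorting step can be sketched in Python as follows. This is a minimal sketch that assumes the downloaded files use the standard UniProt FASTA header layout (`>db|ACCESSION|ENTRY_NAME description`); the function names and the two sample records are hypothetical and are not sequences from this study.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping accession ID -> sequence."""
    chunks = {}
    accession = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            fields = header.split("|")
            # Assumed UniProt layout: db|ACCESSION|ENTRY_NAME description
            accession = fields[1] if len(fields) >= 3 else header.split()[0]
            chunks[accession] = []
        elif accession is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks[accession].append(line)
    return {acc: "".join(parts) for acc, parts in chunks.items()}


def sort_by_length(records):
    """Return (accession, sequence) pairs, longest chain first."""
    return sorted(records.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical records, for illustration only.
SAMPLE = """>tr|X00001|X00001_EXAMPLE Hypothetical protein A
MKTAYIAKQR
QISFVKSH
>tr|X00002|X00002_EXAMPLE Hypothetical protein B
MSDNE
"""

records = parse_fasta(SAMPLE)
ranked = sort_by_length(records)
```

Keying the records by accession ID mirrors the way the downloaded sequences were saved for later reference.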

Sequence alignment
The UniProt database enabled the compilation of amino acid sequences. The sequences were compared using the BLASTp server, which compares a query amino acid sequence with existing protein sequences to gauge their percentage similarity. A first BLAST search was performed against the non-redundant database to find sequences similar to the query. This was followed by a second BLAST search against the sequences present in the Protein Data Bank (www.rcsb.org). The percentage similarity was documented for further reference.
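
The percentage figure recorded from an alignment can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and counts identical positions over the full alignment length, which is how BLAST reports its identity percentage; the function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical six-column alignment: four identical columns out of six.
identity = percent_identity("MKT-AY", "MKTLAF")
```

BLAST also reports a separate "positives" figure that counts conservative substitutions; this sketch covers identity only.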

Structure prediction
Homology modelling was performed using the SWISS-MODEL tool (https://swissmodel.expasy.org/). This online tool was used to predict the three-dimensional structures of the selected drug-target proteins from their amino acid sequences and the templates available in the Protein Data Bank. The sequences obtained from UniProt were uploaded as FASTA files, and the output (3D structures) was stored in PDB format. The availability and percentage similarity of the templates were the main factors that influenced the resulting models. Ramachandran plots, which are graphical plots that help confirm the accuracy of a predicted structure, were used to analyze the structures. The most accurate models were downloaded in PDB format.

Model analysis
The accuracy of each predicted model was assessed qualitatively using the built-in Ramachandran plot feature in SWISS-MODEL. Ideally, the backbone dihedral angles (phi and psi) of all residues fall within the most favoured regions of the Ramachandran plot, as this is what determines the quality of the predicted structure. Residues observed outside the favoured regions were considered unfavourable. These outliers lowered the confidence score of the predicted model.
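
The two angles plotted for each residue can be computed from backbone atom coordinates. The sketch below is illustrative only and is not the SWISS-MODEL implementation: it uses the standard definitions phi = C(i-1), N(i), CA(i), C(i) and psi = N(i), CA(i), C(i), N(i+1), and the helper names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Return (phi, psi) for one residue from five backbone atom positions."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Hypothetical geometry: a right-angle twist about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Each residue's (phi, psi) pair is one point on the Ramachandran plot; classifying a point as favoured requires reference contour data, which is not reproduced here.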

Model submission
The 3D models of the proteins that exhibited a good-quality Ramachandran plot (confidence score) were submitted to the Protein Model Database (PMDB) (http://srv00.recas.ba.infn.it/PMDB/). The Protein Model Database is a resource that hosts manually built protein models that have been published in research journals. All the models were uploaded in PDB file format, and each entry was given a unique PMDB ID for future reference.

Pathogen & drug target selection
The pathogen Renibacterium salmoninarum was identified as a candidate for homology modelling studies since there were no known protein structures of the organism present in the NCBI database (although 1309 reported protein sequences were available). Thus, R. salmoninarum was ideal for computational protein model creation. Among the 1309 protein sequences, non-enzymatic proteins, subunits and duplicates were eliminated, leaving a final count of 1249 proteins. These 1249 proteins were then checked against the DrugBank website (www.drugbank.ca) to determine whether they were potential drug targets. Of the 1249 proteins, a total of 412 were identified as drug targets, and homology model construction was carried out for these 412 protein sequences. Similar studies have become more common since the coronavirus outbreak, during which homology modelling and molecular docking have been key to identifying drug targets and potential medicines. For example, a homology modelling study of TMPRSS2 (spike entry priming) followed a similar pathway.

Constructing homology models
The 412 selected protein sequences were retrieved from the UniProt database (www.uniprot.org/) and submitted to SWISS-MODEL. The tool created several models for each protein sequence. Among the models created for each sequence, the optimal model was selected using Ramachandran plot analysis.

Ramachandran plot analysis
Ramachandran plot analysis was carried out on all the protein models constructed using SWISS-MODEL, via the built-in options on the website. Protein models with more than 95% of residues in the favoured regions of the Ramachandran plot were considered eligible for structural application. Among the 412 constructed proteins, 143 protein models had more than 95% of residues in the favoured regions and were thus considered for further applications. Fig. 1 shows the most preferred protein model, with 100% of residues in the favoured region, and the least preferred model, with 80% of residues in the favoured region.
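
The selection rule described above (keep the best candidate per protein, then retain it only if more than 95% of its residues lie in the favoured regions) can be sketched as follows. The function name and the candidate scores are hypothetical; only the 95% cutoff and the 143 of 412 count come from this study.

```python
def select_models(candidates, cutoff=95.0):
    """Pick the best-scoring model per protein and keep it if it passes the cutoff.

    candidates maps protein ID -> {model name: % residues in favoured regions}.
    """
    selected = {}
    for protein, models in candidates.items():
        best_model, best_pct = max(models.items(), key=lambda kv: kv[1])
        if best_pct > cutoff:
            selected[protein] = (best_model, best_pct)
    return selected


# Hypothetical scores, for illustration only.
candidates = {
    "protein_A": {"model_1": 97.8, "model_2": 93.1},
    "protein_B": {"model_1": 80.0, "model_2": 88.4},
    "protein_C": {"model_1": 100.0},
}
selected = select_models(candidates)

# Share of the 412 modelled proteins that passed the cutoff in this study.
retained_pct = 100.0 * 143 / 412
```

With the reported counts, roughly one third of the modelled targets (about 34.7%) met the quality threshold.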

PMDB ID submission
The constructed protein models of all 143 proteins were then submitted to a public database (PMDB) (http://srv00.recas.ba.infn.it/PMDB/) for public access. A catalogue of the constructed protein models that were submitted to the PMDB database is given in Table 1. This allows researchers working in this field to retrieve the models and carry out further structural bioinformatics analyses specific to this pathogen.

Protein ligand docking
To demonstrate the application of the constructed homology models, a protein-ligand docking study was performed to screen for an effective antibiotic against R. salmoninarum. Three known antibacterial drugs, namely isoniazid, trimethoprim and sulfadiazine, were docked with their reported drug targets from R. salmoninarum (constructed in this study) as well as from other template organisms (crystal structures from the PDB website belonging to Mycobacterium tuberculosis, Streptococcus pneumoniae and Burkholderia cenocepacia, respectively). 15,16 As shown in Table 2, the docking results demonstrated that, among the three antibiotics, trimethoprim exhibited the highest potential to be an effective inhibitor of the dihydrofolate reductase protein, with a binding energy of -9.06 kcal/mol, while isoniazid exhibited a weak binding energy of -5.36 kcal/mol and sulfadiazine exhibited a binding energy of -6.65 kcal/mol. The protein-ligand interactions between each antibiotic (drug) and its protein (drug target) are shown in Fig. 2. Because it exhibited the most favourable binding energy of the three antibiotics (-9.06 kcal/mol), formed 3 hydrogen bonds (Thr-84, Arg-85, Asn-136) and had 13 hydrophobic interactions, trimethoprim may be proposed as a drug of choice against R. salmoninarum and Bacterial Kidney Disease (BKD). The protein models built in this study could be applied to other docking studies as well as to advanced molecular dynamics simulation studies aimed at identifying effective antibiotic drugs against this specific pathogen and disease.
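
The comparison of the three reported binding energies can be made explicit with a short sketch. A more negative energy indicates stronger predicted binding. The conversion to a dissociation constant uses the textbook relation ΔG = RT ln(Kd) at an assumed temperature of 298.15 K; this is only a rough order-of-magnitude guide, since docking scores are approximations and are not measured free energies. The function names are hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)

# Binding energies reported in this study (kcal/mol).
binding_energies = {
    "isoniazid": -5.36,
    "trimethoprim": -9.06,
    "sulfadiazine": -6.65,
}


def rank_ligands(energies):
    """Order ligands from most to least favourable (most negative first)."""
    return sorted(energies.items(), key=lambda kv: kv[1])


def estimated_kd(delta_g, temperature=298.15):
    """Rough dissociation constant (mol/L) from delta_g = RT ln(Kd).

    The temperature is an assumption, not a value taken from the study.
    """
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


ranking = rank_ligands(binding_energies)
kd_estimates = {name: estimated_kd(dg) for name, dg in binding_energies.items()}
```

Because Kd depends exponentially on the energy, the gap of a few kcal/mol between trimethoprim and the other two drugs corresponds to a difference of several orders of magnitude in predicted affinity.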

Conclusion
This study aimed to build 3D computational protein structures for the underexplored R. salmoninarum, which is responsible for Bacterial Kidney Disease (BKD) in commercially important salmonid fishes. The results also suggest that, among the antibiotics screened, trimethoprim demonstrated significant affinity for its drug target and may therefore be a potent antibiotic against R. salmoninarum. This proposal can be investigated further via computational approaches such as MD simulation and/or in vitro screening of antibiotics against the pathogen.
The structures developed in this study can also help in understanding the pathogenesis of this disease. They can be used to lay out a roadmap for discerning the effect of inhibiting selected proteins. Those proteins might be efficient antibiotic drug targets, and they might lead to the design of other antibiotic compounds that are better than currently available drugs. This docking study offers a novel, foundational platform for eventual in silico as well as wet lab work on Renibacterium salmoninarum-specific drug discovery, development and design.