HPC-ILRI

Projects

Theileria parva genomics

Theileria parva is a tick-transmitted intracellular protozoan parasite that causes East Coast fever, a lymphoproliferative disease of cattle. T. parva is unique among protozoa in that the schizont stage of the life cycle, which is responsible for pathology in cattle, transforms the lymphocytes of the bovine host to a leukemia-like state and also co-ordinates its cell cycle (replication and mitosis) with that of the bovine host cell, so that parasite and host divide in synchrony. The ILRI Bioinformatics group has been closely involved in the T. parva genome project, led by The Institute for Genomic Research (TIGR), and was responsible for the annotation of the genome (Gardner et al., 2005). The genome sequence is being used to develop a new-generation vaccine against East Coast fever (Graham et al., 2005) and is providing insight into the complexities of host-parasite interactions. The group was also involved in the comparative study of T. parva and Theileria annulata (Pain et al., 2005).

Figure 1. Large-scale synteny between T. annulata and T. parva chromosomes.

Gardner MJ. et al. (2005) Genome sequence of Theileria parva, a bovine pathogen causing a lymphoproliferative disease. Science 309:134-137.
Pain A. et al. (2005) Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science 309:131-133.
Graham SP. et al. (2005) Theileria parva candidate vaccine antigens recognized by immune bovine cytotoxic T lymphocytes. Submitted.

Theileria parva functional genomics

Genome-wide transcription data are important for understanding organism biology in a systems context and, when interfaced with complete genome sequences, enable analysis of transcription in relation to genome organization. Several techniques are routinely used to analyze transcriptomes. These include hybridization-based microarrays and serial analysis of gene expression (SAGE), which provides quantitative information on relatively abundant transcripts, based on 3' signatures from cDNA. A powerful new high-throughput method for transcriptome analysis is Massively Parallel Signature Sequencing (MPSS), a technique that improves on the SAGE concept by using novel amplification and sequencing technologies to increase the level of sequence coverage. MPSS allows detection of transcripts expressed at very low levels. We employed MPSS to analyze the transcriptome of T. parva and annotated signatures derived from RNA of the schizont stage using the recently determined genome sequence (Bishop et al., 2005).
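As a rough illustration of how signature-based methods yield quantitative data, the Python sketch below counts how often each signature was observed, normalises the counts to a per-million scale, and sums them per gene. The function names, the toy signatures, and the gene identifiers are all hypothetical; this is a minimal sketch of the general idea, not the pipeline used in Bishop et al. (2005).

```python
from collections import Counter


def signatures_per_million(signatures):
    """Normalise raw signature observations to counts per million."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


def abundance_per_gene(per_million, signature_to_gene):
    """Sum normalised abundance over all signatures assigned to each gene.

    Signatures absent from the lookup table are collected under the
    key "unassigned".
    """
    totals = {}
    for sig, value in per_million.items():
        gene = signature_to_gene.get(sig, "unassigned")
        totals[gene] = totals.get(gene, 0.0) + value
    return totals


# Invented toy data: three signatures observed 3, 1 and 1 times.
observed = ["GATCAAGT"] * 3 + ["GATCTTGC"] + ["GATCGGGA"]
lookup = {"GATCAAGT": "gene_1", "GATCTTGC": "gene_2"}  # hypothetical genes
per_gene = abundance_per_gene(signatures_per_million(observed), lookup)
```

In practice the lookup table would come from matching each signature against the annotated genome sequence, and signatures that match more than one location would need separate handling.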

Figure 1. Distribution of sense transcripts within T. parva chromosomes 1-4.

Bishop R. et al. (2005) Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage. Nucleic Acids Res. 33:5503-5511.

Structural bioinformatics approach to Theileria parva genome annotation

Understanding the three-dimensional structure of a protein is key to determining its function. Knowing the amino acid sequence of a protein is of limited use unless researchers understand how particular motifs fold. Initial analysis of the T. parva genome has shown that 61% of predicted genes have no significant similarity to previously known sequences, even to proteins of related protozoa such as Plasmodium, and therefore have no predicted function. Experimental determination of protein structure is expensive and remains difficult. Structural bioinformatics and modelling of protein tertiary structure are therefore a potentially attractive route to improved understanding of host cell transformation and other aspects of parasite biology.

An initial analysis of all the T. parva predicted ORFs was performed using a threading algorithm, THREADER (Jones et al., 1992). Threading is one of the very few methods available that can predict the fold of a protein in the absence of an evolutionary relationship. Fold assignment and alignment were achieved by threading the protein sequence through each of the structures in a library of all known folds. Each sequence-structure alignment was assessed by the energy of a corresponding coarse model, potentially providing clues to the function of predicted proteins without sequence similarity to ORFs in the database. An analysis focusing on the ORFs at the telomeric ends of chromosome 1 of T. parva identified a predicted hypothetical protein containing a signal peptide that may mimic a transcription factor fold. This result is being confirmed.
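The idea of scoring one sequence against every fold in a library can be sketched in a few lines of Python. This is a deliberately simplified toy, not THREADER: it places the sequence on each template without gaps, reduces residues to two classes (hydrophobic or polar), and uses invented contact energies. The template names, contact lists, and energy values are all assumptions made for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")

# Hypothetical contact energies in arbitrary units; lower is more favourable.
CONTACT_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.2,
}


def residue_class(amino_acid):
    """Reduce a residue to 'H' (hydrophobic) or 'P' (polar)."""
    return "H" if amino_acid in HYDROPHOBIC else "P"


def threading_energy(sequence, contacts):
    """Sum contact energies when the sequence is placed on a template.

    `contacts` lists pairs of template positions that are in contact.
    Pairs that fall outside the sequence are ignored.
    """
    energy = 0.0
    for i, j in contacts:
        if i < len(sequence) and j < len(sequence):
            pair = (residue_class(sequence[i]), residue_class(sequence[j]))
            energy += CONTACT_ENERGY[pair]
    return energy


def rank_folds(sequence, fold_library):
    """Return (energy, fold name) tuples, most favourable first."""
    scored = [
        (threading_energy(sequence, contacts), name)
        for name, contacts in fold_library.items()
    ]
    return sorted(scored)


# Two invented templates, each described only by its contact pairs.
library = {"fold_a": [(0, 3)], "fold_b": [(1, 2)]}
ranking = rank_folds("LKKL", library)
```

A real threading program also optimises the sequence-structure alignment, allowing gaps, before the energy is evaluated, which is where most of the computational cost arises.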

Using the HPC facility at ILRI, in collaboration with Dr. Richard Bonneau at the Institute for Systems Biology in Seattle, we are exploring ab initio protein structure prediction methods using a software package, Rosetta (Simons et al., 1997). Rosetta, developed at the University of Washington, predicts the three-dimensional structure of a folded protein from its linear sequence of amino acids; its ab initio structure prediction method can also be combined with Nuclear Magnetic Resonance (NMR) experimental data for rapid backbone structure determination. Rosetta is currently one of the best protein structure prediction methods available.


Figure 1. Comparison of native protein structure and Rosetta predicted structure.

Jones DT. et al. (1992) A new approach to protein fold recognition. Nature. 358:86-89.
Simons KT. et al. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209-225.

Vector genomics – Identification of tick salivary gland secreted proteins

Traditional pairwise sequence alignment methods can be used to assign folds to sequences with obvious evolutionary relationships to a known structure. Generally, for sequences with identities >30%, fast sequence-searching methods such as FASTA and BLAST are reasonably good at detecting related proteins by scoring pairwise comparisons. However, when sequence identities fall below 30%, conventional pairwise sequence comparison methods fail to detect relationships. Thus, the accurate annotation of genes that encode proteins with low sequence identity to any known protein remains problematic.
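To make the 30% figure concrete, the Python sketch below computes percent identity for two sequences that have already been aligned. It assumes identity is measured over aligned, non-gap columns only; other definitions use a different denominator and give different values. The function names and the default threshold argument are illustrative choices, not part of FASTA or BLAST.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def above_identity_threshold(identity, threshold=30.0):
    """True when identity exceeds the threshold quoted in the text."""
    return identity > threshold
```

For example, `percent_identity("ACDE", "ACDF")` compares four columns and finds three matches, so the pair falls well above the threshold; pairs that fall below it are the cases where fold recognition becomes relevant.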

A threading method (Jones et al., 1992) is used to recognise pairs of proteins that have no obvious similarities in sequence, yet have similar folds. This method for fold recognition can be divided into two stages:

  1. Given a target sequence and a template protein structure, a sequence-structure alignment is made using a sequence profile method.
  2. Calculation of pair potential and solvation terms.

The evaluation function used in this method is principally based on a set of pairwise potentials of mean force, determined by a statistical analysis of highly resolved protein X-ray structures and the application of the inverse Boltzmann equation. In addition to the pairwise potentials, a solvation potential is also used.
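The inverse Boltzmann step can be sketched as follows: for each distance bin, the energy is taken as -kT ln(f_obs / f_ref), where f_obs is the observed frequency for a given residue pair and f_ref is the frequency in a reference distribution. The Python below is a minimal sketch under that general form. The bin counts are invented, kT is set to 1 for convenience, and the sparse-data corrections used in practical potentials are omitted.

```python
import math


def potential_of_mean_force(observed, reference, kT=1.0):
    """Convert binned counts into energies via the inverse Boltzmann relation.

    `observed` and `reference` hold counts per distance bin. Bins where
    either count is zero have no defined energy and are returned as None.
    """
    obs_total = sum(observed)
    ref_total = sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        ratio = (obs / obs_total) / (ref / ref_total)
        energies.append(-kT * math.log(ratio))
    return energies


# Invented counts for three distance bins.
energies = potential_of_mean_force([20, 10, 10], [10, 10, 20])
```

Bins where the residue pair is seen more often than in the reference come out with negative (favourable) energy, and bins where it is seen less often come out positive.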

Jones DT. et al. (1992) A new approach to protein fold recognition. Nature. 358:86-89.
Lambson B. et al. (2005). Identification of candidate sialome components expressed in tick salivary glands using secretion signal complementation in mammalian cells. Insect Molecular Biology 14:403-414.

Vector genomics – Application of structural fold prediction software to improve the annotation of gene sequences in Rhipicephalus appendiculatus ticks

Analysis of expressed genes in uninfected and Theileria parva-infected salivary glands of four-day-fed female adult Rhipicephalus appendiculatus ticks identified R. appendiculatus homologs of Boophilus microplus heme-lipoproteins, which are major components of hemolymph that bind and transport heme and lipids (Nene et al., 2004).

The R. appendiculatus homologs contained N-terminal peptide sequences identical or similar to those derived from biochemically purified B. microplus 103 kDa (ApoHeLP-A) and 92 kDa (ApoHeLP-B) subunits of heme-lipoprotein. Although the B. microplus HeLP proteins contain both heme- and lipid-binding domains, the R. appendiculatus homologs were predicted to contain only lipid-binding domains, as detected by searching CDD, the conserved domain database. A more rigorous, computationally intensive analysis with the THREADER algorithm (Jones et al., 1992) confirmed that the identified lipid-binding domains are similar to those present in lipovitellin.

Nene V. et al. (2004) Insect Biochemistry and Molecular Biology 34:1117–1128.
Jones DT. et al. (1992) A new approach to protein fold recognition. Nature. 358:86-89.

Generation Challenge Programme

Farmers in the developing world face agricultural challenges far different from those faced by their counterparts in industrialized countries. The destruction wreaked on their crops by drought, pest and disease infestations, and low soil fertility is exacerbated by their lack of resources, which puts irrigation, fertilizers, and pesticides beyond their reach.

These production constraints often represent the difference between healthy families and hungry families. The Generation Challenge Programme aims to bridge that gap by using advances in molecular biology and harnessing the rich global stocks of crop genetic resources to create and provide a new generation of plants that meet these farmers' needs.

The ILRI HPC facility is unofficially involved in Subprogram 4, “Genetic Resources, Genomic, and Crop Information Systems”, of the Generation Challenge Programme, by making available its HPC computing resources through the CGIAR HPC grid.

All Rights Reserved 2005. http://hpc.ilri.cgiar.org Design by ILRI