|

Theileria parva genomics
Theileria parva is a tick-transmitted intracellular protozoan parasite that causes East Coast fever, a lymphoproliferative disease of cattle. T. parva is unique among protozoa in that the schizont stage of the life cycle, which is responsible for pathology in cattle, transforms the lymphocytes of the bovine host to a leukemia-like state and also co-ordinates its cell cycle (replication and mitosis) with that of the bovine host cell, so that parasite and host divide in synchrony. The ILRI Bioinformatics group has been closely involved in the T. parva genome project, led by The Institute for Genomic Research (TIGR), and was responsible for the annotation of the genome (Gardner et al., 2005).
The genome sequence is being used to develop a new-generation vaccine against East Coast fever (Graham et al., 2005) and is providing insight into the complexities of host-parasite interactions. The group was also involved in the comparative study of T. parva and Theileria annulata (Pain et al., 2005).

Figure 1. Large-scale synteny between T. annulata and T. parva chromosomes.
Gardner MJ. et al. (2005) Genome sequence of Theileria parva, a bovine
pathogen causing a lymphoproliferative disease. Science 309:134-137.
Pain A. et al. (2005) Genome of the host-cell transforming parasite Theileria
annulata compared with T. parva. Science 309:131-133.
Graham SP. et al. (2005) Theileria parva candidate vaccine antigens recognized by immune bovine cytotoxic T lymphocytes. Submitted.
Theileria parva functional genomics
Genome-wide transcription data are important for understanding organism biology in a systems context and, when interfaced with complete genome sequences, enable analysis of transcription in relation to genome organization. Several techniques are routinely used to analyze transcriptomes. These include hybridization-based microarrays and serial analysis of gene expression (SAGE), which provides quantitative information on relatively abundant transcripts, based on 3' signatures from cDNA.
A powerful new high-throughput method for transcriptome analysis is Massively Parallel Signature Sequencing (MPSS), a technique that improves on the SAGE concept by using novel amplification and sequencing technologies to increase the level of sequence coverage. MPSS allows detection of transcripts expressed at very low levels.
We employed MPSS to analyze the transcriptome of T. parva and annotated signatures derived from RNA of the schizont stage using the recently determined genome sequence (Bishop et al., 2005).
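Annotating a signature against a genome sequence amounts, at its simplest, to locating the signature on either strand. The following is a minimal sketch of that idea only, not the pipeline used in Bishop et al. (2005); the function names, the toy sequence, and the 8-base signatures are hypothetical and chosen purely for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_all(text, pattern):
    """Return every 0-based start position of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def map_signature(genome, signature):
    """Locate a signature on the forward (+) and reverse (-) strands.

    Returns a list of (strand, start) tuples; start is always given in
    forward-strand coordinates.
    """
    hits = [("+", pos) for pos in find_all(genome, signature)]
    rc = reverse_complement(signature)
    if rc != signature:  # a palindromic signature would otherwise be counted twice
        hits += [("-", pos) for pos in find_all(genome, rc)]
    return hits


# Hypothetical toy data, not T. parva sequence.
toy_genome = "TTGATCAAGGCCTTAACC"
print(map_signature(toy_genome, "GATCAAGG"))  # forward-strand hit
print(map_signature(toy_genome, "TTAAGGCC"))  # reverse-strand hit
```

Comparing the strand of each hit with the strand of the overlapping annotated gene is what would separate sense from antisense signatures; that step needs gene coordinates and is left out of the sketch.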

Figure 1. Distribution of sense transcripts within T. parva chromosomes 1-4.
Bishop R. et al. (2005) Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage. Nucleic Acids Res. 33:5503-5511.
Structural bioinformatics approach to Theileria parva genome annotation
Understanding the three-dimensional structure of a protein is key to determining its function. Knowing the amino acid composition of a protein is of limited use unless researchers understand how its motifs fold. Initial analysis of the T. parva genome has shown that 61% of predicted genes have no significant similarity to previously known sequences, even to proteins of related protozoa such as Plasmodium, and therefore have no predicted function. Experimental determination of protein structure is expensive and remains difficult. Structural bioinformatics and modelling of protein tertiary structure are therefore a potentially attractive route to improved understanding of host cell transformation and other aspects of the parasite's biology.
An initial analysis of all the predicted T. parva ORFs was performed using a threading algorithm, THREADER (Jones et al., 1992). Threading is one of the very few methods available that can predict the fold of a protein in the absence of a detectable evolutionary relationship. Fold assignment and alignment were achieved by threading the protein sequence through each of the structures in a library of all known folds. Each sequence-structure alignment was assessed by the energy of a corresponding coarse model, potentially providing clues to the function of predicted proteins without sequence similarity to ORFs in the database. An analysis focusing on the ORFs at the telomeric ends of chromosome 1 of T. parva identified a predicted hypothetical protein containing a signal peptide that may mimic a transcription factor fold. This result is being confirmed.
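The idea of threading a sequence through a library of folds and ranking the folds by energy can be illustrated with a deliberately simplified sketch. This is not THREADER: it assumes an ungapped, residue-by-residue mapping, a two-class (hydrophobic/polar) alphabet, and made-up contact energies, and the fold library, names, and values below are all hypothetical.

```python
# Toy two-class alphabet: residues treated as hydrophobic (H), all others polar (P).
HYDROPHOBIC = set("AVLIMFWC")

# Illustrative contact energies in arbitrary units; NOT derived from real structures.
CONTACT_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.0,
}


def residue_class(amino_acid):
    """Map a one-letter amino acid code to the toy H/P alphabet."""
    return "H" if amino_acid in HYDROPHOBIC else "P"


def threading_energy(sequence, contacts):
    """Sum contact energies when residue i of the sequence occupies position i
    of the template (no gaps). `contacts` lists template position pairs (i, j)
    that are spatially close in the fold."""
    energy = 0.0
    for i, j in contacts:
        pair = (residue_class(sequence[i]), residue_class(sequence[j]))
        energy += CONTACT_ENERGY[pair]
    return energy


def rank_folds(sequence, fold_library):
    """Return (fold name, energy) pairs sorted from lowest to highest energy.
    Folds whose length differs from the sequence are skipped, because this
    sketch does not model gaps."""
    scores = []
    for name, fold in fold_library.items():
        if fold["length"] != len(sequence):
            continue
        scores.append((name, threading_energy(sequence, fold["contacts"])))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical two-fold library and query sequence.
library = {
    "fold_a": {"length": 6, "contacts": [(0, 5), (1, 4)]},
    "fold_b": {"length": 6, "contacts": [(0, 3), (2, 5)]},
}
print(rank_folds("LKSEKV", library))
```

A real threading program also optimises the alignment itself, allowing gaps, and uses statistically derived potentials rather than a hand-written table.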
Using the HPC facility at ILRI, in collaboration with Dr. Richard Bonneau at the Institute for Systems Biology in Seattle, we are exploring ab initio protein structure prediction methods using the software package Rosetta (Simons et al., 1997). Rosetta, developed at the University of Washington, predicts the three-dimensional structure of a folded protein from its linear sequence of amino acids, and can combine its ab initio structure prediction method with Nuclear Magnetic Resonance (NMR) experimental data for rapid backbone structure determination. Rosetta is currently one of the best protein structure prediction methods available.

Figure 1. Comparison of native protein structure and Rosetta predicted structure.
Jones DT. et al. (1992) A new approach to protein fold recognition. Nature 358:86-89.
Simons KT. et al. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209-225.
Vector genomics – Identification of tick salivary gland secreted proteins
Traditional pairwise sequence alignment methods can be used to assign folds to sequences with obvious evolutionary relationships to a known structure. Generally, for sequences with identities above 30%, fast sequence-searching methods such as FASTA and BLAST are fairly capable of detecting related proteins by scoring pairwise comparisons. However, when sequence identities fall below 30%, conventional pairwise sequence comparison methods fail to detect relationships. Thus, the accurate annotation of genes that encode proteins with low sequence identity to any known protein remains problematic.
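The 30% figure refers to percent identity over an alignment. As a minimal sketch, assuming two sequences that have already been aligned (gaps written as "-") and one common definition of identity (matches divided by columns where neither sequence has a gap), the check could look like this. The function names and example strings are hypothetical, and other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def below_identity_threshold(aligned_a, aligned_b, threshold=30.0):
    """True when identity falls below the threshold at which pairwise
    comparison becomes unreliable and fold recognition is worth trying."""
    return percent_identity(aligned_a, aligned_b) < threshold


# Hypothetical aligned peptides.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))
print(below_identity_threshold("ACDEFGHIKL", "AWWWWWWWKW"))
```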
A threading method (Jones et al., 1992) is used to recognise pairs of proteins that have no obvious similarities in sequence, yet have similar folds. This method for fold recognition can be divided into 2 stages:
- Given a target sequence and a template protein structure, a sequence-structure alignment is made using a sequence profile method.
- The pair potential and solvation terms are calculated for that alignment.
The evaluation function used in this method is principally based on a set of pairwise potentials of mean force, determined by a statistical analysis of highly resolved protein X-ray structures and the application of the inverse Boltzmann equation. In addition to the pairwise potentials, a solvation potential is also used.
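The inverse Boltzmann step converts observed frequencies into energies: for each distance bin, E = -kT ln(f_observed / f_reference), so pair separations seen more often than expected receive a negative (favourable) energy. The sketch below shows only this generic relation and is not the parameterisation used in THREADER; the counts, the bin layout, and the choice of kT = 1 are hypothetical.

```python
import math


def potential_of_mean_force(observed_counts, reference_counts, kT=1.0):
    """Apply the inverse Boltzmann relation bin by bin.

    observed_counts: how often a given residue pair is seen in each distance bin.
    reference_counts: how often any residue pair is seen in the same bins.
    Returns one energy per bin, or None where a count is zero and the
    logarithm is undefined.
    """
    observed_total = sum(observed_counts)
    reference_total = sum(reference_counts)
    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        if observed == 0 or reference == 0:
            energies.append(None)
            continue
        f_observed = observed / observed_total
        f_reference = reference / reference_total
        energies.append(-kT * math.log(f_observed / f_reference))
    return energies


# Hypothetical counts for three distance bins.
energies = potential_of_mean_force([20, 10, 10], [10, 10, 20])
print(energies)  # first bin favourable (negative), last bin unfavourable (positive)
```

In practice sparse bins need a correction for small sample sizes; returning None for empty bins is only a placeholder for that.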
Jones DT. et al. (1992)
A new approach to protein fold recognition. Nature. 358:86-89.
Lambson B. et al. (2005). Identification of candidate
sialome components expressed in tick salivary glands using secretion signal
complementation in mammalian cells. Insect Molecular Biology 14:403-414.
Vector genomics – Application of structural fold prediction software to improve the annotation of gene sequences in Rhipicephalus appendiculatus ticks
Analysis of expressed genes in uninfected and Theileria parva-infected salivary glands of four-day-fed adult female Rhipicephalus appendiculatus ticks identified R. appendiculatus homologs of Boophilus microplus heme-lipoproteins, major components of hemolymph that bind and transport heme and lipids (Nene et al., 2004).
The R. appendiculatus homologs contained N-terminal peptide sequences identical or similar to those derived from the biochemically purified B. microplus 103 kDa (ApoHeLP-A) and 92 kDa (ApoHeLP-B) subunits of heme-lipoprotein. Although the B. microplus HeLP proteins contain both heme- and lipid-binding domains, the R. appendiculatus homologs were predicted to contain only lipid-binding domains, as detected by searching CDD, the Conserved Domain Database. A more rigorous, computationally intensive analysis with the THREADER algorithm (Jones et al., 1992) confirmed that the identified lipid-binding domains are similar to those present in lipovitellin.
Nene V. et al. (2004) Insect Biochemistry and Molecular Biology 34:1117–1128.
Jones DT. et al. (1992) A new approach
to protein fold recognition. Nature. 358:86-89.
Generation Challenge Programme
Farmers in the developing world face agricultural
challenges far different from their counterparts in industrialized
countries. The destruction wreaked on their crops by drought, pest and
disease infestations, and low soil fertility is exacerbated by their
lack of resources, which puts irrigation, fertilizers, and pesticides
beyond their reach.
These production constraints often represent the
difference between healthy families and hungry families. The Generation Challenge Programme
aims to bridge that gap by using advances in molecular biology and
harnessing the rich global stocks of crop genetic resources to create
and provide a new generation of plants that meet these farmers' needs.
The ILRI HPC facility is unofficially involved in
Subprogram 4, “Genetic Resources, Genomic, and Crop Information
Systems”, of the Generation Challenge Programme, by making available
its HPC computing resources through the CGIAR HPC grid.
|