
In silico discovery of genes
Important computational methods for the automatic identification of protein-coding regions in genomic DNA sequences have been developed recently. This field of research is described as in silico discovery of genes, and its use is growing as more cDNA and genomic sequences accumulate in the databases. Improvements in this area of functional genomics involve the development of new software and a better understanding of the mechanisms of transcription and translation. Several algorithms now available for annotating genomic sequences have been used successfully to assign functions to genomic sequences of Arabidopsis and rice.

Sequence patterns in genes
The annotation process, which relies on the study of sequence patterns, includes identification of (i) the start and stop codons of each gene; (ii) the regulatory sequences around each gene; (iii) the position and size of the introns in each gene (in Arabidopsis, each gene has 4-5 introns on average); and (iv) the sequences that are critical for transcription, RNA splicing and translation (promoter, branch point, polyadenylation site, ribosome binding site, etc.). If all of these sequences are identified perfectly, then the protein-coding region can be obtained simply by removing the introns, concatenating the exons, and reading off the protein sequence from start codon to stop codon. A number of gene-finding programs (GlimmerM, GENSCAN, GeneMark.hmm) are available.
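
The last step described above (remove the introns, concatenate the exons, read the protein from start to stop) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the exon coordinates are taken as already known, the gene lies on the forward strand, and the standard genetic code applies. The function names and the toy sequence are hypothetical and do not come from any of the gene finders named above.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice_exons(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) in genomic order."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def translate(cds):
    """Translate codon by codon from the first base up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy gene (assumed for illustration): exon, 12-base intron, exon.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

coding_sequence = splice_exons(genomic, exons)
print(coding_sequence)
print(translate(coding_sequence))
```

In practice the difficult part is predicting the exon coordinates, which is the job of the gene-finding programs; the sketch covers only what follows once those coordinates are known.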

However, new gene finders may generally be needed for each species (for details, consult Plant Molecular Biology, Vol. 48, Nos. 1, 2; 2002).

Sequence similarity
A reliable method for identifying genes is to search the genomic sequence of a particular species for a sequence that is homologous to a sequence of known function in another species. For this purpose, the genomic sequences, protein sequences and ESTs (expressed sequence tags) available in the databases are used extensively. Gene finding by homology is often much more accurate than de novo gene finding by computational methods based on sequence patterns (discussed above). However, not all genes can be found by homology search, since for about 30% of the genes in a species, homologous sequences of known function may not yet be available. In these cases, de novo gene finding has to be relied upon.
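
The idea of a homology search can be illustrated with a minimal sketch: score a query sequence against each sequence of known function using a local alignment, and report the best-scoring match. The code below uses a plain Smith-Waterman score with a linear gap penalty. The scoring values, the function names and the toy sequences are assumptions chosen for illustration; real database searches use faster heuristic tools such as BLAST and are often run at the protein level.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score (linear gaps)."""
    previous_row = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        current_row = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous_row[j - 1] + (match if q == s else mismatch)
            score = max(0,                       # start a new local alignment
                        diagonal,                # align q with s
                        previous_row[j] + gap,   # gap in the subject
                        current_row[j - 1] + gap)  # gap in the query
            current_row.append(score)
            best = max(best, score)
        previous_row = current_row
    return best


def best_hit(query, known_sequences):
    """Return (name, score) of the known sequence most similar to the query."""
    scores = {name: local_alignment_score(query, seq)
              for name, seq in known_sequences.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical "database" of sequences with known function.
known = {"geneA": "TTACGTTT", "geneB": "CCCC"}
print(best_hit("ACGT", known))
```

A low best score for every known sequence corresponds to the situation described above, in which no homologue of known function is available.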

 
