
Gene finding in Arabidopsis and rice
The methods for annotating genomic sequence outlined above are already being used for Arabidopsis, whose whole genome has been sequenced, and for rice, for which at least four draft genome sequences had become available by April 2002, with more expected in future. However, the gene finders that were built for Arabidopsis and used by the 'Arabidopsis Functional Genomics Consortium' (AFGC) do not work well in rice, so new gene finders had to be designed for annotating rice genome sequences.

Three gene finders have been used for gene prediction in Arabidopsis: (i) GlimmerM, (ii) GENESCAN +/- (a version of GENSCAN trained for Arabidopsis), and (iii) GeneMark.hmm. By early 2002, these programs had already allowed the identification of more than 1100 sequences, each containing only one gene.

During 2000-2002, gene finders specifically suited to rice were created. These included new rice versions of GlimmerM, GeneMark.hmm and FGENES (developed in 1995), which improved the annotation process for the rice genome. These gene finders may need to be retrained as more data on annotated genes accumulate for the rice genome.

Efforts are also being made to combine the results of the different gene finders listed above. Such an effort is particularly useful when different gene finders give different predictions. For instance, a particular exon is often correctly predicted by only one of the three gene finders.

A combining algorithm was designed for Arabidopsis and applied to the output of the three gene finders, leading to a higher level of accuracy. Despite these improvements in gene finding algorithms, in 2002 even the best systems predicted a complete gene correctly only about half of the time. Further improvement is expected during 2002-2010, as the number of genes with known functions grows and new algorithms are developed.
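The idea of combining gene finder outputs can be illustrated with a minimal sketch. The text does not describe how the Arabidopsis combining algorithm works, so the code below substitutes a plain voting rule: an exon is kept when at least a chosen number of gene finders predict it with identical coordinates. The function name combine_exons, the finder labels and the exon coordinates are all hypothetical and serve only as an illustration.

```python
from collections import Counter

# An exon is modelled as a (start, end) coordinate pair. Each gene finder's
# output for one sequence is a list of such pairs. All names and coordinates
# here are hypothetical; this is a simple voting rule, not the published
# combining algorithm.
def combine_exons(predictions, min_votes=2):
    """Keep exons predicted identically by at least `min_votes` gene finders."""
    votes = Counter()
    for exons in predictions.values():
        votes.update(set(exons))  # each finder votes at most once per exon
    return sorted(exon for exon, n in votes.items() if n >= min_votes)

predictions = {
    "finder_a": [(100, 250), (400, 520), (700, 910)],
    "finder_b": [(100, 250), (400, 530), (700, 910)],
    "finder_c": [(100, 260), (400, 520), (700, 910)],
}

consensus = combine_exons(predictions)
print(consensus)
```

A plain vote of this kind discards any exon that only one gene finder predicts, which is exactly the case noted above where a single program may be the correct one. A practical combiner would therefore need some way to weight each program's reliability rather than count votes equally.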

Selected examples of public databases and resources used for functional genomics in plants

(i) Single organisms
TAIR (Arabidopsis); MaizeDB, ZmDB (maize); SoyBase (soybean); AlfaGenes (alfalfa); BeanGenes (Phaseolus and Vigna); SorghumDB (Sorghum bicolor).

(ii) Multiple organisms
GenBank (all); SwissProt/TrEMBL (all); CropSeqDB (crop species); COGs (completely sequenced organisms); Protein Data Bank (all); DDBJ (all); EMBL (all); SolGenes (solanaceous species); ARS Genome Resource (Arabidopsis, barley, Brassica, forage grasses, millet and rice); TIGR Gene Indices (many organisms; not all); GrameneDB (rice and other grasses); GrainGenes (wheat, rye, oat, barley and sugarcane).

 
