EST Contigs and Unigene Sets
Although extensive EST data for a variety of plants are available in the databases, many gene transcripts are represented redundantly. Available ESTs have therefore been assembled into clusters of contiguous, overlapping sequences, which are described as contigs. ESTs that appear singly and cannot be assembled into contigs are described as singletons. The combined set of contigs and singletons is described as a unigene set, which represents the minimum number of genes, although in rare cases the same gene may be part of two contigs when an EST overlapping both of them is missing from the database. Unigene sets have been constructed for many species, including Arabidopsis, tomato, rice, sorghum and bread wheat.
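The relationship between ESTs, contigs, singletons and the unigene count can be illustrated with a minimal sketch. It assumes that pairwise overlaps between ESTs have already been detected by some alignment step (not shown), and it simply groups overlapping ESTs with a union-find structure. The EST names, the overlap list and the function `build_unigene_set` are hypothetical; a real assembler also builds consensus sequences and checks overlap quality.

```python
def build_unigene_set(est_ids, overlaps):
    """Group ESTs into contigs and singletons from precomputed overlap pairs."""
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for est in est_ids:
        clusters.setdefault(find(est), []).append(est)

    contigs = [sorted(members) for members in clusters.values() if len(members) > 1]
    singletons = [members[0] for members in clusters.values() if len(members) == 1]
    return contigs, singletons


# Hypothetical data: est1..est5 form one overlapping chain; est6 overlaps nothing.
ests = ["est1", "est2", "est3", "est4", "est5", "est6"]
overlaps = [("est1", "est2"), ("est2", "est3"), ("est3", "est4"), ("est4", "est5")]

contigs, singletons = build_unigene_set(ests, overlaps)
unigene_count = len(contigs) + len(singletons)

# If the bridging EST (est3) were missing from the database, the same gene
# would be split across two contigs and the unigene count would rise by one.
ests_no_bridge = [e for e in ests if e != "est3"]
overlaps_no_bridge = [(a, b) for a, b in overlaps if "est3" not in (a, b)]
split_contigs, split_singletons = build_unigene_set(ests_no_bridge, overlaps_no_bridge)
split_count = len(split_contigs) + len(split_singletons)

print(unigene_count, split_count)
```

With the full chain the sketch yields one contig plus one singleton; removing the bridging EST yields two contigs plus one singleton, which is the rare overcounting case described above.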

A unigene set can be annotated by matching the unigene sequences either against the annotated proteome of Arabidopsis (http://www.Arabidopsis.org) using BLASTx, or against the GenBank protein database (http://www.ncbi.nlm.nih.gov/Database/index.html). This annotation allows unigenes to be classified into role classes (r.c.), of which the following four contain the largest numbers of unigenes: metabolism (r.c.1), transcription (r.c.4), cellular organization (r.c.30) and cellular communication/signal transduction (r.c.10).
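The annotation step can be sketched as follows, assuming the BLASTx results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The hit lines, the protein names (`protA` and so on), the protein-to-role-class table and the E-value cutoff are all invented for illustration; they are not taken from any real database.

```python
from collections import Counter

# Hypothetical BLASTx hits in 12-column tabular layout (whitespace separated).
BLAST_TABULAR = """
u1 protA 78.5 200 43 0 1 600 1 200 1e-90 320
u1 protB 41.0 150 88 2 10 460 5 154 1e-20 95
u2 protB 65.2 180 62 1 1 540 1 180 1e-50 210
u3 protC 55.0 160 72 0 4 483 2 161 1e-30 130
u4 protD 30.1 60 42 1 7 186 9 68 0.5 28
u5 protA 60.3 170 67 0 1 510 1 170 1e-40 160
"""

# Hypothetical mapping from matched proteins to the role classes named in the text.
ROLE_CLASS = {
    "protA": "r.c.1 metabolism",
    "protB": "r.c.4 transcription",
    "protC": "r.c.10 cellular communication/signal transduction",
}


def best_hits(tabular_text, max_evalue=1e-10):
    """Return {unigene: best subject}, keeping the highest bit score per query.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best = {}
    for line in tabular_text.strip().splitlines():
        fields = line.split()
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


hits = best_hits(BLAST_TABULAR)
role_counts = Counter(ROLE_CLASS.get(subject, "unclassified") for subject in hits.values())

for role, count in role_counts.most_common():
    print(role, count)
```

Unigenes with no hit passing the cutoff (here `u4`) simply remain unannotated; the choice of cutoff strongly affects how many unigenes can be assigned to a role class.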

A comparison of annotated EST contigs in tomato with those of Arabidopsis (the two species being fairly divergent) also allowed identification of a set of 1000 unigenes that were described as a conserved ortholog set (COS). It is anticipated that this set will be conserved across very divergent species and will prove useful in comparative mapping and genomics studies over a wide spectrum of plant species.

 
