Gene finders

Once a large amount of DNA has been sequenced, the major task is to detect and understand in these unannotated sequences (i) the protein-coding genes, (ii) the structure of these genes (demarcation of control regions, exons, introns and untranslated regions, the UTRs) and (iii) the location and nature of repetitive sequences. Gene finders are programmes that identify open reading frames (ORFs) in unannotated DNA, and they use a variety of approaches to locate genes. The major methods for identification of genes can be divided into two types: (i) Signal sensors are methods that identify the presence of genes by detecting local sites such as start and stop codons, branchpoints, splice sites, promoters and terminators of transcription, polyadenylation sites and ribosomal binding sites. (ii) Content sensors are methods that identify genes through nucleotide frequencies (for example, the relative proportions of GC and AT base pairs) and statistical dependencies between nucleotide positions, which help in differentiating coding from non-coding sequences. For instance, coding regions of DNA sequences have a strong three-base periodicity, which non-coding sequences lack.
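The two kinds of sensor can be illustrated with a minimal Python sketch. It is not the method of any particular gene finder: the function names `find_orfs` and `gc_fraction`, the `min_codons` parameter and the restriction to the forward strand are all assumptions made for illustration. The first function acts as a very simple signal sensor (it looks only for ATG start codons and in-frame stop codons), and the second computes one basic content statistic (GC fraction).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Scan the three forward reading frames for ATG...stop ORFs.

    Returns a list of (start, end, frame) tuples, where start is the
    0-based index of the ATG and end is the index just past the stop
    codon. The reverse strand is not examined in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting both the start and the stop
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(dna):
        orf = dna[start:end]
        print(start, end, frame, orf, round(gc_fraction(orf), 2))
```

Real gene finders go much further: they score splice sites and other signals statistically and model codon usage, rather than relying on start and stop codons alone.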

Most gene-finding tools at present combine both signal and content sensors to identify the complete gene structure. Some commonly used programmes for gene finding include GRAIL, GENSCAN, PROCRUSTES, GENEWISE, GENESCAN, the Sanger Centre's Genefinder and GlimmerM/R (GlimmerM/R is specialized for organisms such as Arabidopsis and rice).

Known genes in databases

The names of genes, together with their sequences, are also available in the databases. Unfortunately, some genes have multiple names, and unrelated genes often share a common name. Gene names are therefore a hindrance rather than a help in finding the closest relative of a gene. In view of this, attempts are being made to impose standard names across the board. These efforts are meeting stiff resistance, and approaches that would give unique ID numbers to genes seem unlikely to take off unless journals enforce the system. A coalition of leading geneticists may, however, have the answer. The Gene Ontology (GO) Consortium is sidestepping the naming issue by developing software that scans the genomic databases and links related genes to one another using terms that consistently describe their function, regardless of what the genes are called.

Program cpgplot

This program, released in September 2001, is available on the EMBOSS home page and plots CpG-rich areas. CpG refers to a cytosine (C) residue immediately followed by a guanine (G) in the genomic sequence, and a region rich in CpG is called a CpG island. These CpG islands are resistant to methylation and tend to be associated with genes that are frequently switched on. Frequencies of CpG islands have also been used for predicting the number of genes in a genome and for several other purposes, including taxonomic clustering.
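The idea of a CpG-rich region can be made concrete with a short Python sketch. It does not reproduce the cpgplot program itself. It uses a widely used criterion, comparing the observed number of CpG dinucleotides in a window with the number expected from the C and G counts (expected = C count × G count / window length), and reports windows whose observed/expected ratio and GC percentage exceed chosen thresholds. The function names `cpg_stats` and `scan_cpg` and the default values (window 100, ratio 0.6, GC 50%) are assumptions chosen for illustration.

```python
def cpg_stats(window):
    """Return (gc_percent, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    if n == 0:
        return 0.0, 0.0
    c = window.count("C")
    g = window.count("G")
    observed = window.count("CG")  # CpG dinucleotides cannot overlap
    gc_percent = 100.0 * (c + g) / n
    expected = c * g / n
    ratio = observed / expected if expected else 0.0
    return gc_percent, ratio


def scan_cpg(seq, window=100, step=1, min_oe=0.6, min_gc=50.0):
    """Return start positions of windows that look CpG-rich.

    A window qualifies when its GC percentage is at least min_gc and
    its observed/expected CpG ratio is at least min_oe.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc_percent, ratio = cpg_stats(seq[start:start + window])
        if gc_percent >= min_gc and ratio >= min_oe:
            hits.append(start)
    return hits


if __name__ == "__main__":
    dna = "CGCG" * 5 + "AT" * 10
    print(scan_cpg(dna, window=8, step=4))
```

In practice, adjacent qualifying windows would be merged and a minimum island length imposed before a region is reported as a CpG island.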

 
