Gene finders
Once a large amount of DNA has been sequenced, the major task is to detect and understand, in these unannotated sequences, (i) the protein-coding genes, (ii) the structure of these genes (demarcation of control regions, exons, introns and untranslated regions, the UTRs) and (iii) the location and nature of repetitive sequences. Gene finders are programmes that identify the open reading frames (ORFs) in unannotated DNA. They use a variety of approaches to locate genes, and the major methods can be divided into two types: (i) Signal sensors are methods that identify the presence of genes by detecting local sites such as start and stop codons, branch points, splice sites, promoters and terminators of transcription, polyadenylation sites and ribosome-binding sites. (ii) Content sensors are methods that identify genes from nucleotide frequencies (such as the relative proportions of GC and AT base pairs) and dependencies between nucleotides, both of which help to differentiate coding from non-coding sequences. For instance, coding regions of DNA show a strong three-base periodicity, which non-coding sequences generally lack.
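
The two types of sensor can be illustrated with a minimal Python sketch. It is not how any particular gene finder works: the function names find_orfs and periodicity_score, the min_codons threshold and the scoring formula are all assumptions made for illustration. The signal sensor looks only for ATG start codons and the three standard stop codons on the forward strand, and the content sensor measures three-base periodicity as the unevenness of base usage across the three codon positions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Toy signal sensor: scan the three forward reading frames for
    ATG ... stop stretches. Returns (start, end, frame) tuples, where
    end is exclusive and includes the stop codon. min_codons is an
    illustrative threshold on the codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def periodicity_score(seq):
    """Toy content sensor: for each base, compare its frequency at the
    three positions (index mod 3) and sum the squared deviations from
    the mean. 0.0 means no position preference; larger values mean a
    stronger three-base periodicity."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # ignore a trailing partial triplet
    if n == 0:
        return 0.0
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i in range(n):
        if seq[i] in counts:  # skip ambiguous symbols such as N
            counts[seq[i]][i % 3] += 1
    per_position = n // 3
    score = 0.0
    for base in "ACGT":
        freqs = [c / per_position for c in counts[base]]
        mean = sum(freqs) / 3
        score += sum((f - mean) ** 2 for f in freqs)
    return score


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCGGGTAACC"))   # [(2, 17, 2)]
    print(periodicity_score("GCA" * 30))      # strongly periodic
    print(periodicity_score("ACGT" * 30))     # no period of three
```

The example sequence contains one ORF (ATG AAA CCC GGG TAA) in the third frame. A perfectly repeated triplet such as GCA gives a high periodicity score, whereas a repeat of period four gives zero. Real gene finders combine many more signals, scan both strands and use statistical models rather than a single score.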


