Computational genomics

Computational genomics is the study of deciphering biology from genome sequences, including both DNA and RNA, using computational analysis.

Modern genomics has been defined in many ways:
 * The study of genomes.
 * The molecular characterization of all the genes in a species.
 * The study of genes and their biochemical function in an organism.
 * The comprehensive study of the interactions and functional dynamics of whole sets of genes and their products.
 * The study of the genome and its significance to pathology and disease.

Whichever definition we choose, genomics cannot achieve its fundamental goals without advanced computational tools. The computational aspects of modern genomics go under the name "computational genomics."

Among other topics, computational genomics includes biological sequence analysis, gene expression data analysis, and phylogenetic analysis, and more specifically pattern recognition and analysis problems such as gene finding, motif finding, gene function prediction, fusion of sequence and expression information, and evolutionary models.

History
Computational genomics began during the 1960s with the research of Margaret Dayhoff and others at the National Biomedical Research Foundation, who first assembled a database of protein sequences. Their research developed a phylogenetic tree that determined the evolutionary changes that were required for a particular protein to change into another protein based on the underlying amino acid sequences. This led them to create a scoring matrix that assessed the likelihood of one protein being related to another.

Beginning in the 1980s, databases of genome sequences began to be recorded, but this presented new challenges in searching and comparing the databases of gene information. Unlike text-searching algorithms used on websites such as Google or Wikipedia, searching for sections of genetic similarity requires one to find strings that are not simply identical, but similar. Such searches rely on sequence alignment methods such as the Needleman-Wunsch algorithm, a dynamic programming algorithm for comparing amino acid sequences with each other, which can be used with scoring matrices derived from the earlier research by Dayhoff. Later, the BLAST algorithm was developed for performing fast, optimized searches of gene sequence databases. BLAST and its derivatives are probably the most widely used algorithms for this purpose.
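The dynamic programming idea behind Needleman-Wunsch alignment can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: it uses fixed match, mismatch and gap scores (hypothetical defaults of 1, -1 and -1) rather than a Dayhoff-style scoring matrix, and the function name and parameters are chosen here for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Assumption: a simple match/mismatch/gap scheme stands in for a
    substitution matrix such as those derived from Dayhoff's work.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    # Fill the table: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # align a[i-1] with a gap
                score[i][j - 1] + gap,      # align b[j-1] with a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(s)
    print(x)
    print(y)
```

The table has one cell per pair of prefixes, so time and memory both grow with the product of the two sequence lengths. That cost is one reason faster heuristic tools such as BLAST are preferred for searching large databases.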

The development of computer-assisted mathematics (using products such as Mathematica or MATLAB) has helped engineers, mathematicians and computer scientists to start working in this domain, and a public collection of case studies and demonstrations is growing, ranging from whole-genome comparisons to gene expression analysis. This has brought in ideas from other fields, including concepts from systems and control, information theory, string analysis and data mining. It is anticipated that computational approaches will become and remain a standard topic for research and teaching, as the many courses created in the past few years train students who are fluent in both fields.