The genes of all cells are composed of DNA. Proteins serve as structural components and as enzymes within cells, but the genes contain the blueprint for each protein and the program for controlling protein production. Genes are transcribed to produce complementary molecules of mRNA (messenger RNA), and the mRNA is translated into proteins. There is a nearly one-to-one correspondence between genes and proteins. Proteins perform the work of cells, such as energy production, reaction catalysis, inter-cellular signaling, transcription and translation, and cell reproduction. All cells of an organism contain the same DNA. The level of production of each type of protein specifies the state of a cell. This state is determined by spatial and temporal variables such as tissue location and extra-cellular stimuli. The level of production of a protein is determined primarily by the level of transcription of the corresponding gene into mRNA. This picture seems to be at the core of a universal story of life!
With the recent availability of DNA sequence data and proteomics data, and the development of tools for whole-genome assays (e.g., gene expression arrays), it has become possible to study the basic biology of cells, the identity and function of genes, and how a common, universal theme varies across all life.
The greatest hurdles to the effective development and use of new tools for "post-genomic informatics" are mathematical and statistical. They include difficult problems of combinatorial mathematics, statistics, modeling and algorithm design.
One challenging problem is how to elucidate genetic networks from time-sequenced gene expression data. Others are how to classify cells based on their expression patterns and how to develop diagnostic disease classification systems. Because the data are high-dimensional, with thousands of genes measured in each sample, many challenging problems of multiplicity and multivariate analysis must also be addressed.
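To make the multiplicity problem concrete, the following is a minimal sketch in Python, not a method prescribed by this text. It assumes each gene has already been tested separately and has a p-value, and it compares two standard corrections: Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, that is True
    where the hypothesis is rejected at false discovery rate alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per gene (illustrative numbers only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.06, 0.074, 0.205, 0.212, 0.216]

print("Bonferroni rejections:        ", sum(bonferroni(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_values)))
```

With these ten illustrative p-values, five fall below 0.05 when each gene is judged on its own, yet Bonferroni retains only one and Benjamini-Hochberg two. With thousands of genes on an array the gap between uncorrected and corrected results grows, which is why the choice of error criterion matters.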