Sequences and clones for over a million
expressed sequence tags (ESTs) are currently publicly
available. Only a minority of the EST clusters identified so far
contain genes of known function. One way of gaining
insight into a gene's role in cellular activity is to study
its expression pattern in a variety of circumstances and contexts,
as it responds to its environment and to the action of other
genes. Recent methods facilitate large-scale surveys of gene
expression in which transcript levels can be determined for
thousands of genes simultaneously. In particular, gene expression
microarrays have become an essential tool in such studies.
The two major technologies for expression microarrays are
spotted arrays and oligonucleotide arrays. Spotted arrays
make use of a complex biochemical-optical system to perform
robotic spotting of cDNA probes that represent the genes.
Oligonucleotide arrays, on the other hand, are based on a
photolithographic process, not unlike the one used for making
silicon chips for computers, which allows a very dense packing
of oligonucleotide probes. In both cases, fluorescently labeled transcripts
are hybridized to the arrays, and an imaging system is used
to scan the hybridized arrays.
Since transcriptional control integrates
a variety of inputs, we require
analytical tools for expression profile data that can detect
the types of multivariate influences on decision-making produced
by complex genetic networks. Put more generally, signals generated
by the genome must be processed to characterize their regulatory
effects and their relationship to changes at both the genotypic
and phenotypic levels. Two salient goals of functional genomics
are to screen for the key genes and gene combinations that
explain specific cellular phenotypes (e.g. disease) on a mechanistic
level, and to use genomic signals to classify disease on a
molecular level.
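To make the point about multivariate influences and gene combinations concrete, the following minimal Python sketch uses hypothetical, synthetic binarized data (not from any experiment): a target gene is ON exactly when one of two regulators, g1 or g2, is ON (an XOR-type interaction). Each regulator on its own has zero Pearson correlation with the target and predicts it no better than chance, yet the pair determines it perfectly. The helper names `pearson` and `best_predictor_accuracy` are illustrative assumptions, not functions from any library.

```python
from collections import Counter, defaultdict
from itertools import product
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_predictor_accuracy(inputs, target):
    """Accuracy of the best lookup-table predictor of target from inputs.

    Each distinct input pattern is mapped to its most frequent target value.
    """
    counts = defaultdict(Counter)
    for pattern, t in zip(inputs, target):
        counts[pattern][t] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(target)


# Hypothetical binarized data: every on/off combination of two regulators,
# observed 5 times each; the target is ON when exactly one regulator is ON.
samples = [pair for pair in product([0, 1], repeat=2) for _ in range(5)]
g1 = [a for a, _ in samples]
g2 = [b for _, b in samples]
target = [a ^ b for a, b in samples]

r1 = pearson(g1, target)
r2 = pearson(g2, target)
acc_g1 = best_predictor_accuracy([(a,) for a in g1], target)
acc_joint = best_predictor_accuracy(samples, target)

print(f"corr(g1, target) = {r1:.2f}, corr(g2, target) = {r2:.2f}")
print(f"best accuracy from g1 alone: {acc_g1:.2f}")
print(f"best accuracy from (g1, g2): {acc_joint:.2f}")
```

Running it prints correlations of 0.00 for both regulators, an accuracy of 0.50 for g1 alone, and 1.00 for the pair. A screen that ranks genes one at a time by correlation would miss this gene combination entirely.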
Genomic Signal Processing (GSP) is the engineering
discipline that studies the processing of genomic signals.
Owing to the major role played in genomics by transcriptional
signaling and the related pathway modeling, it is only natural
that the theory of signal processing should be applied to
both structural and functional understanding. The aim of GSP
is to integrate the theory and methods of signal processing
with the global understanding of functional genomics, with
special emphasis on genomic regulation. Hence, GSP encompasses
various methodologies concerning expression profiles: detection,
prediction, classification, control, and statistical and dynamical
modeling of gene networks. GSP is a fundamental discipline
that brings to genomics the structural model-based analysis
and synthesis that form the basis of mathematically rigorous
engineering.
Applications are generally directed toward
tissue classification and the discovery of signaling pathways,
both based on the expressed macromolecule phenotype of the
cell. Accomplishment of these aims requires a host of signal-processing
approaches. These include signal representation relevant to
transcription, such as wavelet decomposition and more general
decompositions of stochastic time series, and system modeling
using nonlinear dynamical systems. The kind of correlation-based
analysis commonly used for understanding pairwise relations
between genes or cellular effects cannot capture the complex
network of nonlinear information processing based upon multivariate
inputs from inside and outside the genome. Regulatory models
require the kind of nonlinear dynamics studied in signal processing
and control, and in particular the use of stochastic dataflow
networks common to distributed computer systems with stochastic
inputs. This is not to say that existing model systems suffice.
Genomics requires its own model systems, not simply straightforward
adaptations of currently formulated models. New systems must
capture the specific biological mechanisms of operation and
distributed regulation at work within the genome. It is necessary
to develop appropriate mathematical theory, including optimization
theory for the kinds of external controls required for therapeutic
intervention, and approximation theory for arriving at nonlinear
dynamical models that are complex enough to represent genomic
regulation adequately for diagnosis and therapy, yet not too
complex for the amounts of data that are experimentally
feasible or for the computational limits of existing computer
hardware.
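As one illustration of a nonlinear dynamical model driven by stochastic inputs, the Python sketch below simulates a toy three-gene Boolean network with random perturbation. The update rules in `RULES`, the perturbation probability `p`, and the helper names are hypothetical choices for illustration, not a model proposed in the text: at each step every gene is independently flipped with probability `p`, and if no flip occurs all genes are updated synchronously by their Boolean rules. The long-run fraction of time spent in each of the 2^3 states is then estimated by simulation.

```python
import random
from collections import Counter

# Hypothetical 3-gene regulatory rules (illustrative only): each gene's next
# value is a nonlinear Boolean function of the current state (g0, g1, g2).
RULES = [
    lambda s: s[1] and not s[2],  # gene 0: activated by gene 1, repressed by gene 2
    lambda s: s[0] or s[2],       # gene 1: activated by gene 0 or gene 2
    lambda s: s[0] and s[1],      # gene 2: requires both gene 0 and gene 1
]


def step(state, p, rng):
    """One synchronous update with random perturbation.

    Each gene is flipped independently with probability p (a stochastic
    input from outside the network); if no gene is flipped, all genes
    follow RULES instead.
    """
    flips = [rng.random() < p for _ in state]
    if any(flips):
        return tuple(x ^ int(f) for x, f in zip(state, flips))
    return tuple(int(rule(state)) for rule in RULES)


def long_run_frequencies(p=0.01, steps=100_000, seed=0):
    """Estimate the fraction of time the network spends in each state."""
    rng = random.Random(seed)
    state = (0, 0, 0)
    counts = Counter()
    for _ in range(steps):
        state = step(state, p, rng)
        counts[state] += 1
    return {s: c / steps for s, c in sorted(counts.items())}


freqs = long_run_frequencies()
for s, f in freqs.items():
    print(s, f"{f:.3f}")
```

With `p = 0` the trajectory settles into an attractor of these rules (here the fixed point (0, 0, 0) or the cycle between (0, 1, 0) and (1, 0, 0)); with `p > 0` perturbations let it move between attractors. Because the state space has 2^n states for n genes, even this toy model hints at the trade-off between model complexity and the amount of data available to fit it.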