The size distributions of all known coding and noncoding DNA sequences are studied in all human chromosomes. In a unified approach, both introns and intergenic regions are treated as noncoding regions. The distributions of noncoding segments $P_{nc}(S)$ of size $S$ present long tails $P_{nc}(S) \sim S^{-1-\mu_{nc}}$, with exponents $\mu_{nc}$ ranging between 0.71 (for chromosome 13) and 1.2 (for chromosome 19). On the contrary, the exponential, short-range decay terms dominate in the distributions of coding (exon) segments $P_c(S)$ in all chromosomes. Aiming to address the emergence of these statistical features, minimal, stochastic, mean-field models are proposed, based on randomly aggregating DNA strings with duplication, influx and outflux of genomic segments. These minimal models produce both the short-range statistics in the coding and the observed power law and fractal statistics in the noncoding DNA. The minimal models also demonstrate that although the two systems (coding and noncoding) coexist, alternating on the same linear chain, they act independently: the coding as a closed, equilibrium system and the noncoding as an open, out-of-equilibrium one.
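As an illustration of what a long tail with exponent $\mu_{nc}$ means in practice, the following minimal sketch draws synthetic segment sizes from a continuous power law with density proportional to $S^{-1-\mu}$ above a lower cutoff and then recovers the exponent with the standard maximum-likelihood (Hill) estimator. It is not the method or the model of the paper: the data are synthetic, the continuous approximation and the cutoff `s_min` are assumptions, and the function names `sample_power_law` and `estimate_mu` are hypothetical.

```python
import math
import random


def sample_power_law(n, mu, s_min=1.0, rng=None):
    """Draw n synthetic sizes with density ~ S^(-1-mu) for S >= s_min.

    Assumption: a continuous Pareto law stands in for the discrete
    segment sizes; this is an illustration, not genomic data.
    """
    rng = rng or random.Random()
    # Inverse-transform sampling: the survival function is (S/s_min)^(-mu),
    # and 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [s_min * (1.0 - rng.random()) ** (-1.0 / mu) for _ in range(n)]


def estimate_mu(sizes, s_min):
    """Maximum-likelihood (Hill) estimate of mu using sizes >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no sizes at or above s_min")
    log_sum = sum(math.log(s / s_min) for s in tail)
    if log_sum == 0.0:
        raise ValueError("all sizes equal s_min; exponent is undefined")
    return len(tail) / log_sum


# Synthetic check using the smallest exponent quoted in the abstract.
rng = random.Random(0)
sizes = sample_power_law(50_000, mu=0.71, rng=rng)
mu_hat = estimate_mu(sizes, s_min=1.0)
print(f"estimated mu = {mu_hat:.3f}")
```

With real segment lengths, the choice of `s_min` matters, because a power law typically describes only the tail of the distribution; the estimate should be checked for stability as the cutoff is varied.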
|Journal|Physical Review E - Statistical, Nonlinear, and Soft Matter Physics|
|Publication status|Published - May 3 2007|
ASJC Scopus subject areas
- Statistical and Nonlinear Physics
- Statistics and Probability
- Condensed Matter Physics