The Gene

Genes: An Introduction

A gene is the fundamental unit of heredity in living organisms. It is a segment of DNA (or, in some viruses, RNA) that encodes the instructions for synthesizing a protein or a functional RNA molecule. Genes are essential for life: they provide the information needed to build and maintain cells, and they pass genetic traits from one generation to the next. In eukaryotes most genes reside in the DNA of the nucleus, but some organelles, such as mitochondria, contain their own DNA, which replicates independently of the cell's nuclear DNA.


Genes dictate a wide range of biological traits. Some traits are immediately observable, such as eye color or limb count, while others are less visible, like blood type, susceptibility to certain diseases, or various biochemical processes essential for life.


In modern genetics, a gene is defined as a region of genomic sequence associated with a unit of inheritance. This includes regulatory regions, transcribed regions, and other functional sequence elements. Colloquially, the term 'gene' is often used interchangeably with 'allele', which represents different versions of a gene. For instance, saying someone has a gene for a trait is less precise than stating they have a specific allele of that gene. Most individuals possess the same gene for a given trait, but different alleles of that gene can lead to variations in the trait. Ultimately, it is the gene itself, not the trait, that is inherited.

Definitions of a Gene and Their Functions


RNA

Genes are fundamental units of heredity, encoded in DNA or RNA, responsible for producing proteins or functional RNA molecules. In many organisms, genes are first transcribed into RNA, which then serves as an intermediary for protein synthesis. In some cases, RNA molecules themselves have functional roles. For instance, ribozymes are RNA molecules with enzymatic capabilities, and microRNA is involved in regulating gene expression. Genes that code for such functional RNA are termed RNA genes.


Certain viruses store their entire genetic information in RNA, bypassing the need for DNA. Some of these RNA viruses can use their RNA genomes directly for protein synthesis once they infect a host cell. Retroviruses such as HIV, by contrast, must first convert their RNA genomes into DNA through reverse transcription before producing proteins. A notable example of RNA-mediated inheritance was reported in mice in 2006: a mutation in the Kit gene produced white tails, and the trait also appeared in offspring that carried only normal Kit genes. This effect was traced to mutated Kit RNA, highlighting the rare but significant role of RNA in inheritance among mammals.


Functional structure of a gene

Genes in most organisms are encoded in long strands of DNA (deoxyribonucleic acid). A DNA strand is a chain of nucleotides, each consisting of a five-carbon sugar (deoxyribose), a phosphate group, and one of four nitrogenous bases: adenine (A), cytosine (C), guanine (G), or thymine (T). The alternating sugars and phosphates form the backbone of the strand. DNA typically forms a double helix, in which two strands spiral around each other and obey base-pairing rules: guanine pairs with cytosine through three hydrogen bonds, and adenine pairs with thymine through two hydrogen bonds. The strands are complementary, meaning that each strand's sequence determines the sequence of its partner.
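
The pairing rules above can be illustrated with a short Python sketch. It is a minimal illustration rather than a standard library routine: the function names are hypothetical, sequences are assumed to contain only the letters A, C, G and T, and the reversal step reflects the fact that the two strands of a double helix run in opposite directions.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of each nucleotide, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in the duplex: 3 per G-C pair, 2 per A-T pair."""
    return sum(3 if base in "GC" else 2 for base in strand.upper())


if __name__ == "__main__":
    print(complement("ATGC"))          # partner bases, position by position
    print(reverse_complement("ATGC"))  # partner strand read 5' to 3'
    print(hydrogen_bonds("ATGC"))      # bonds holding the duplex together
```

Applying reverse_complement twice returns the original sequence, which is one way of seeing that each strand fully determines its partner.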


DNA has directionality: one end of each strand, the 3' end, carries a free hydroxyl group, and the other, the 5' end, carries a phosphate group. This directionality is crucial for cellular processes such as DNA replication, in which new strands are synthesised in the 5'-3' direction because nucleotides can only be added to the 3' hydroxyl of a growing strand. The nucleotide sequence of the template dictates the sequence of the new strand, and the fixed direction of synthesis underlies accurate replication and transcription.


Gene expression involves transcribing DNA into RNA, specifically messenger RNA (mRNA), which carries genetic instructions for protein synthesis. RNA is similar to DNA but contains ribose instead of deoxyribose and uracil (U) instead of thymine (T). RNA molecules are usually single-stranded and less stable than DNA. Genes encoding proteins are read in sets of three nucleotides called codons, each specifying an amino acid. The genetic code, nearly universal across all organisms, translates these codons into amino acids during protein synthesis.
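
As a rough sketch of the two ideas in this paragraph, the Python below rewrites a DNA coding sequence in the RNA alphabet (U in place of T) and then splits it into codons. The function names are hypothetical, and the sketch assumes the reading frame starts at the first nucleotide; trailing bases that do not fill a complete codon are ignored.

```python
from typing import List


def to_mrna(coding_strand: str) -> str:
    """Rewrite a DNA coding-strand sequence in the RNA alphabet (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def codons(mrna: str) -> List[str]:
    """Split an mRNA into consecutive triplets, dropping any incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    mrna = to_mrna("ATGGCCTAA")
    print(mrna)
    print(codons(mrna))
```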


Regulatory regions and gene organisation

Genes include regulatory regions that control their expression. The promoter region is crucial because it binds the transcription machinery that initiates transcription of the gene. Some genes have multiple promoters, which give rise to different RNA transcripts. Enhancers can increase gene expression, compensating for weak promoters. Regulatory regions lie predominantly upstream of the transcription initiation site. Eukaryotic promoter regions are more complex and harder to identify than prokaryotic promoters.


Prokaryotic genes are often organised into operons, groups of functionally related genes transcribed together. Eukaryotic genes are typically transcribed individually and may include introns, non-coding sequences that are removed before translation. Although splicing occurs in prokaryotes, it is less common than in eukaryotes.
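
The removal of introns can be sketched in Python as cutting known intervals out of a transcript and joining what remains. This is only an illustration: the function name is hypothetical, and the intron coordinates are assumed to be supplied by the caller as 0-based, half-open, non-overlapping intervals, whereas a real cell recognises splice sites from sequence signals.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove each (start, end) intron interval and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    exon_1, intron, exon_2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
    transcript = exon_1 + intron + exon_2
    intron_span = (len(exon_1), len(exon_1) + len(intron))
    print(splice(transcript, [intron_span]))
```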


Chromosomes

The genome of an organism, which encompasses all its genes, is stored on chromosomes. A chromosome consists of a single, long DNA molecule with numerous genes. Prokaryotes, such as bacteria and archaea, usually have a single circular chromosome, supplemented by plasmids—small DNA circles carrying a few genes, often involved in processes like antibiotic resistance. Eukaryotic genomes are housed in multiple linear chromosomes within the nucleus and associated with histone proteins.


DNA is wrapped around histones, and chemical modifications to these histones regulate gene accessibility. Eukaryotic chromosomes have telomeres, repetitive sequences that protect coding regions from degradation during replication. Telomere shortening is linked to cellular ageing and loss of division capacity. While prokaryotic chromosomes are dense with genes, eukaryotic chromosomes often contain non-coding DNA, previously termed 'junk DNA'. This DNA, though not coding for proteins, may have regulatory functions. Recent research suggests that a significant portion of this non-coding DNA is actively expressed.


Gene Expression

Gene expression involves two key steps: transcription and translation. Transcription copies a gene's DNA sequence into mRNA, which is then translated into protein. RNA-coding genes are transcribed in the same way, but their transcripts are not translated. In either case, gene expression produces a functional molecule, whether RNA or protein.


Genetic code

The genetic code is a set of rules that translate nucleotide sequences into amino acids. Each gene's nucleotide sequence is read in triplets known as codons. There are 64 possible codons and 20 standard amino acids, leading to some redundancy in the code. This redundancy means multiple codons can specify the same amino acid. The genetic code is nearly universal across organisms.
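
The redundancy described above can be checked by enumerating the code. The Python sketch below builds the standard genetic code from a compact 64-letter string of one-letter amino-acid symbols listed in U, C, A, G order, where "*" marks the three stop codons (which specify no amino acid). The variable names are illustrative choices, not part of any library.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 * 4 * 4 = 64 triplets to its amino acid (or "*" for stop).
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid, ignoring stop codons.
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons")
    print(len(DEGENERACY), "amino acids")
    for amino_acid, count in sorted(DEGENERACY.items()):
        print(amino_acid, count)
```

Since 61 codons are shared among 20 amino acids, most amino acids are specified by more than one codon; leucine (L), serine (S) and arginine (R) have six each, while methionine (M) and tryptophan (W) have one.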


Transcription

Transcription creates mRNA from DNA. RNA polymerase binds to the gene's promoter, reads the template strand in the 3'-5' direction, and synthesises RNA in the 5'-3' direction. In prokaryotes, transcription occurs in the cytoplasm and can be coupled with translation. In eukaryotes, transcription happens in the nucleus, and the primary transcript undergoes modifications, such as splicing, before leaving the nucleus for translation.
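
The direction rules in this paragraph can be made concrete with a small Python sketch. Reading the template strand from its 3' end to its 5' end and pairing each base (with U in place of T) yields the RNA in 5'-to-3' order. The function names are hypothetical, and the sketch ignores promoters, termination and RNA processing.

```python
# Pairing used during transcription: template base -> RNA base.
RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA 5'->3' from a template strand written 3'->5'."""
    return "".join(RNA_PAIRS[base] for base in template_3_to_5.upper())


def transcribe_5_to_3(template_5_to_3: str) -> str:
    """Same result for a template written in the usual 5'->3' order."""
    return transcribe(template_5_to_3[::-1])


if __name__ == "__main__":
    print(transcribe("TACCGGATT"))         # template given 3'->5'
    print(transcribe_5_to_3("TTAGGCCAT"))  # same template given 5'->3'
```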


Translation

Translation uses mRNA as a template to build proteins. Ribosomes, consisting of RNA and proteins, facilitate the addition of amino acids to the polypeptide chain. tRNA molecules with anticodons match mRNA codons and deliver the corresponding amino acids. The protein chain is synthesised from the amino terminus to the carboxyl terminus and must fold into its functional three-dimensional structure.
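
A simplified model of translation in Python is sketched below. It assumes the standard genetic code, begins at the first AUG it finds, and stops at the first stop codon; these start and stop rules are simplifying assumptions for illustration, and the function names are hypothetical. The anticodon helper returns the pairing bases aligned with the codon, that is, written 3' to 5'.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA bases that pair with a codon, aligned base by base."""
    return "".join(RNA_PAIRS[base] for base in codon.upper())


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon (or the end)."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made in this model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon ends the chain
        protein.append(amino_acid)  # chain grows from amino to carboxyl end
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUUUUGGUGACC"))
    print(anticodon("AUG"))
```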


DNA replication and inheritance

DNA replication is essential for cell division, creating identical copies of the genome. DNA polymerases synthesise new strands by reading the template strand, with the process being semi-conservative—each new DNA molecule has one original and one newly synthesised strand. In prokaryotes, binary fission quickly divides the cell, whereas eukaryotic cell division involves a complex cell cycle, including DNA replication in the S phase and chromosome segregation in the M phase.
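
Semi-conservative replication can be sketched in Python by labelling each strand as original or new. This is a toy model under stated assumptions: both strands of the parent duplex are written 5' to 3', the names are hypothetical, and the copying is error-free, unlike replication in a real cell.

```python
from typing import List, Tuple

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# A strand is (sequence written 5'->3', "original" or "new").
Strand = Tuple[str, str]
Duplex = Tuple[Strand, Strand]


def copy_strand(template: str) -> str:
    """Synthesise the partner of a template strand, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(template.upper()))


def replicate(top: str, bottom: str) -> List[Duplex]:
    """Return two daughter duplexes, each keeping one parental strand."""
    daughter_1 = ((top, "original"), (copy_strand(top), "new"))
    daughter_2 = ((copy_strand(bottom), "new"), (bottom, "original"))
    return [daughter_1, daughter_2]


if __name__ == "__main__":
    for duplex in replicate("ATGC", "GCAT"):
        print(duplex)
```

Each daughter duplex carries the same pair of sequences as the parent, and in each one exactly one strand is labelled original.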


Inheritance

Molecular inheritance involves passing genetic material from parent to offspring. In asexual reproduction, offspring are genetic clones of the parent. In sexual reproduction, meiosis produces haploid gametes (sperm and eggs), which combine to form a diploid zygote with genetic material from both parents. Genetic recombination during meiosis can shuffle alleles, contributing to genetic diversity. Mendelian principles of inheritance apply, though genetic linkage can affect the assortment of alleles located on the same chromosome.
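
As an illustration of Mendelian segregation, the Python sketch below enumerates the equally likely offspring genotypes of a cross for a single gene with two alleles. It is a minimal model: the function name is hypothetical, each parent genotype is written as two letters such as "Aa", each gamete is assumed to receive one of the two alleles with equal probability, and linkage and recombination are not modelled.

```python
from collections import Counter
from itertools import product


def cross(parent_1: str, parent_2: str) -> Counter:
    """Count offspring genotypes over all gamete combinations (a Punnett square)."""
    offspring = Counter()
    for allele_1, allele_2 in product(parent_1, parent_2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        genotype = "".join(sorted(allele_1 + allele_2))
        offspring[genotype] += 1
    return offspring


if __name__ == "__main__":
    for genotype, count in sorted(cross("Aa", "Aa").items()):
        print(genotype, count, "of 4")
```

Crossing two Aa parents gives the genotypes AA, Aa and aa in a 1:2:1 ratio, while crossing AA with aa gives only Aa offspring.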


Mutations

Mutations are changes in DNA sequences that can arise from replication errors or damage. Although DNA repair mechanisms exist, some mutations persist and can be neutral, deleterious, or occasionally beneficial. Mutations passed to the next generation contribute to genetic variation. Variants of genes, or alleles, can result in different traits. The most common allele is the wild type, while rare variants are called mutants.

What is the Significance of the Genome?


Chromosomal Organisation

The genome of an organism or cell encompasses its entire set of genes. In prokaryotes, most genes are located on a single, circular DNA chromosome, while eukaryotes typically have multiple linear DNA strands organised into dense structures known as chromosomes. Genes that are located together on a chromosome in one species might be found on separate chromosomes in another species. Many organisms contain multiple copies of their genome within each cell. Cells or organisms with one copy of each chromosome are termed haploid; those with two copies are diploid; and those with more than two copies are polyploid. The gene copies on chromosomes may not always be identical. In sexually reproducing organisms, one copy is generally inherited from each parent.


Number of genes

Early estimates, based on expressed sequence tag data, suggested that the human genome contained between 50,000 and 100,000 genes. However, sequencing of the human genome and those of other organisms has shown that relatively few genes encode all the proteins in an organism (~20,000 in humans, mice, and flies; ~13,000 in roundworms; and over 46,000 in rice). These protein-coding sequences constitute only 1-2% of the human genome. A significant portion of the genome is nevertheless transcribed, producing introns, retrotransposons, and various non-coding RNAs. The total number of distinct proteins on Earth, the Earth's proteome, is estimated to be around 5 million sequences.


Genetic and genomic nomenclature

The HUGO Gene Nomenclature Committee (HGNC) has established a system for naming human genes, providing each gene with a unique approved name and symbol. All approved symbols are recorded in the HGNC Database. Each symbol is unique, and each gene has only one approved symbol. This standardisation aids in the retrieval of electronic data from publications. Symbols are designed to maintain consistency within gene families and can be used across different species, especially in model organisms like mice.


Evolutionary concept of a gene

In his 1966 book, Adaptation and Natural Selection, George C. Williams advocated for a gene-centric view of evolution. He defined a gene as 'that which segregates and recombines with appreciable frequency', implying that even an asexual genome could be considered a gene if it persists through many generations. Richard Dawkins furthered this concept in his books The Selfish Gene (1976) and The Extended Phenotype (1982), proposing that genes are the primary replicators in living systems. According to Dawkins, genes transmit their structure largely intact and can be considered immortal in the form of copies. He suggested that genes should be viewed as the unit of selection and described life as a "river of genes" flowing through time, with organisms acting as temporary survival machines. This perspective accounts for genetic divergence as species separate and evolve independently.


Gene targeting and its implications

Gene targeting refers to techniques used to alter or disrupt genes in mice to study their roles in development, human disorders, ageing, and diseases. Mice with one or more deactivated or nonfunctional genes are termed knockout mice. Since the advent of homologous recombination in embryonic stem cells, gene targeting has become a powerful method for manipulating the mammalian genome. Researchers have produced thousands of mutant mouse strains, with the capability to introduce mutations that can be activated at specific times or in particular cells or organs.


Gene targeting has expanded to include various modifications such as point mutations, isoform deletions, correction of mutant alleles, and large-scale chromosomal alterations. This approach is anticipated to significantly impact research across developmental biology, immunology, neurobiology, oncology, physiology, metabolism, and human diseases. It also holds potential for improving domestic animals and plants, given that gene targeting might be applicable to species with totipotent embryonic stem cells.


Future perspectives

The concept of a gene has evolved considerably. Initially defined as a 'unit of inheritance', it now refers to a DNA-based unit capable of influencing an organism through RNA or protein products. The earlier notion that one gene codes for one protein has been challenged by the discovery of alternative splicing and trans-splicing.


The definition of a gene continues to evolve. Recent discoveries in RNA-based inheritance in mammals and evidence suggesting that regulatory regions can be distant from the coding sequence challenge traditional views. For example, research by Spilianakis and colleagues showed that the promoter region of the interferon-gamma gene on chromosome 10 and the regulatory regions of the T(H)2 cytokine locus on chromosome 11 can come into proximity in the nucleus, possibly for coordinated regulation.


The clear delineation of genes is also being re-evaluated. Evidence suggests that proteins can be formed from exons across distant regions or different chromosomes. This has led to a revised, albeit provisional, definition of a gene as a union of genomic sequences encoding a coherent set of potentially overlapping functional products. This new definition categorises genes by their functional products, whether proteins or RNAs, rather than by specific DNA loci, thus including all regulatory elements of DNA as gene-associated regions.

 
