The gene neighbourhood method is yet another computation-based method. It depends upon the observation that if two genes are consistently found side by side in the genome of several different organisms, they are likely to be functionally linked.
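The counting idea behind the gene neighbourhood method can be illustrated with a minimal sketch. It assumes each genome has already been reduced to an ordered list of comparable gene identifiers; the gene names, organism names and the `min_genomes` cut-off below are hypothetical, and real implementations must also deal with orthologue assignment, strand and evolutionary distance.

```python
from collections import Counter

def conserved_neighbours(genomes, min_genomes):
    """Return gene pairs found side by side in at least `min_genomes` genomes.

    `genomes` maps an organism name to its gene order (a list of gene names).
    Adjacency is counted once per genome and gene orientation is ignored.
    """
    counts = Counter()
    for gene_order in genomes.values():
        pairs = {
            frozenset((left, right))
            for left, right in zip(gene_order, gene_order[1:])
            if left != right
        }
        counts.update(pairs)
    return {tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes}

# Hypothetical toy data: three organisms sharing four genes.
toy_genomes = {
    "organism_1": ["geneA", "geneB", "geneC", "geneD"],
    "organism_2": ["geneD", "geneB", "geneA", "geneC"],
    "organism_3": ["geneB", "geneA", "geneD", "geneC"],
}

# geneA and geneB are adjacent in all three toy genomes.
print(conserved_neighbours(toy_genomes, min_genomes=3))
```

In this toy data only geneA and geneB remain neighbours in every organism, so they are the pair the method would flag as possibly functionally linked.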
Knock-out animal studies, in contrast to the above methods, are dependent upon phenotype observation. The approach entails the generation and study of mice in which a specific gene has been deleted. Phenotypic studies can sometimes yield clues as to the function of the gene knocked out.
Although sequence data provide a profile of all the genes present in a genome, they give no information as to which genes are switched on (transcribed) and, hence, which are functionally active at any given time or under any given circumstances. Gene transcription results in the production of RNA, either messenger RNA (mRNA; usually subsequently translated into a polypeptide) or ribosomal or transfer RNA (rRNA or tRNA, which have catalytic or structural functions). Studying the circumstances under which an RNA species is, or is not, expressed in a cell or organism can provide clues as to the biological function of that RNA (or, in the case of mRNA, the function of the final polypeptide product). Furthermore, in the context of drug lead/target discovery, the conditions under which a specific mRNA is produced can also point to putative biopharmaceuticals or drug targets. For example, if a particular mRNA is produced only by a cancer cell, that mRNA (or, more commonly, its polypeptide product) may represent a good target for a novel anti-cancer drug.
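The cancer-cell example can be expressed as a simple filter over expression measurements. The sketch below is illustrative only: the transcript names, signal values and detection threshold are all hypothetical, and a real analysis would require replicates, normalization and statistical testing.

```python
def tumour_specific(tumour, normal, threshold):
    """Return transcripts detected in tumour cells but not in normal cells.

    `tumour` and `normal` map a transcript name to a measured signal.
    A transcript counts as 'detected' when its signal reaches `threshold`
    (an arbitrary cut-off chosen for this illustration).
    """
    return sorted(
        name
        for name, signal in tumour.items()
        if signal >= threshold and normal.get(name, 0.0) < threshold
    )

# Hypothetical signal intensities for three transcripts.
tumour_signal = {"mRNA_1": 950.0, "mRNA_2": 40.0, "mRNA_3": 800.0}
normal_signal = {"mRNA_1": 900.0, "mRNA_2": 35.0, "mRNA_3": 20.0}

print(tumour_specific(tumour_signal, normal_signal, threshold=100.0))
```

Here only mRNA_3 is detected in the tumour sample alone, so it (or its polypeptide product) would be the candidate worth investigating further.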
Levels of RNA (usually specific mRNAs) in a cell can be measured by well-established techniques such as Northern blot analysis or polymerase chain reaction (PCR) analysis. However, the recent advent of DNA microarray technology has converted the identification and measurement of specific mRNAs (or other RNAs if required) into a ‘high-throughput’ process. DNA arrays are also termed ‘oligonucleotide arrays’, ‘gene chip arrays’ or simply ‘chips’.
The technique is based upon the ability to anchor nucleic acid sequences (usually DNA-based) on plastic or glass surfaces at very high density. Standard gridding robots can deposit up to 250 000 different short oligonucleotide probes or 10 000 full-length complementary DNA (cDNA) sequences per cm² of surface. Probe sequences are generally designed from genome sequence data, and hence chip production is often referred to as ‘downloading the genome on a chip’. RNA can be extracted from a cell and probed with the chip. Any complementary RNA sequences present will hybridize with the appropriate immobilized chip sequence (Figure 2.2). Hybridization is detectable because the RNA species are labelled beforehand. The resulting hybridization patterns yield critical information regarding gene expression.
Figure 2.2. Generalized outline of a gene chip. In this example, short oligonucleotide sequences are attached to the anchoring surface (only the outer rows are shown). Each probe displays a different nucleotide sequence, and the sequences used are usually based upon genome sequence information. The sequence of one such probe is shown as AGGCA. When the chip is incubated, e.g. with total cellular mRNA, under appropriate conditions, any mRNA with a complementary sequence (UCCGU in the case of the probe sequence shown) will hybridize with the probes. In reality, probes will have longer sequences than the one shown above.
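The base-pairing rule behind the AGGCA/UCCGU example in Figure 2.2 can be sketched in a few lines. This is a simplified illustration that assumes perfect Watson-Crick pairing only; the function names and the test RNA sequence are hypothetical, and real hybridization also depends on probe length, temperature and tolerance of mismatches.

```python
# Watson-Crick partner (in RNA) for each base of a DNA probe.
DNA_TO_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}

def pairing_partner(dna_probe):
    """Return the RNA bases that pair position by position with a DNA probe.

    The result is written in the same left-to-right order as the probe,
    so it runs antiparallel to it (3'->5' if the probe is read 5'->3').
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in dna_probe.upper())

def hybridizes(dna_probe, rna_5_to_3):
    """True if an RNA (written 5'->3') contains a perfect match to the probe."""
    target = pairing_partner(dna_probe)[::-1]  # reverse to read 5'->3'
    return target in rna_5_to_3.upper()

print(pairing_partner("AGGCA"))          # UCCGU, as in Figure 2.2
print(hybridizes("AGGCA", "GGUGCCUAA"))  # True: contains UGCCU
print(hybridizes("AGGCA", "AAAAAAAAA"))  # False: no complementary stretch
```

Note that UCCGU is the partner written base for base against the probe; read in the conventional 5'→3' direction the same stretch of mRNA is UGCCU, which is why the second function reverses the string before searching.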
Virtually all drug targets are protein-based. However, the inference that protein expression levels can be accurately (if indirectly) measured via DNA array technology is a false one, because:
• mRNA concentrations do not always correlate directly with the concentration of the mRNA-encoded polypeptide;
• a significant proportion of eukaryotic mRNAs undergo differential splicing and, therefore, can yield more than one polypeptide product (Figure 2.3).
Additionally, neither the cellular location at which the resultant polypeptide will function nor the way in which its functional activity will be regulated (e.g. via post-translational mechanisms such as phosphorylation, partial proteolysis, etc.) can usually be predicted from RNA detection or sequence data. Therefore, protein-based drug leads/targets are often more successfully identified by direct examination of the expressed protein complement of the cell, i.e. its proteome. Like the transcriptome (total cellular RNA content) and in contrast to the genome, the proteome is not static with changes in cellular
Figure 2.3. Differential splicing of mRNA can yield different polypeptide products. Transcription of a gene sequence yields a ‘primary transcript’ RNA. This contains coding regions (exons) and non-coding regions (introns). A major feature of the subsequent processing of the primary transcript is ‘splicing’, the process by which introns are removed, leaving the exons in a contiguous sequence. Although most eukaryotic primary transcripts produce only one mature mRNA (and hence code for a single polypeptide), some can be differentially spliced, yielding two or more mature mRNAs. The latter can therefore code for two or more polypeptides. E = exon; I = intron.
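The splicing process described in Figure 2.3 can be mimicked with a small sketch. It models only one form of differential splicing (omitting a whole exon) and assumes the exon and intron boundaries are already known; the segment names and sequences are invented for illustration.

```python
def splice(primary_transcript, skipped_exons=()):
    """Join exons in order, dropping every intron and any skipped exon.

    `primary_transcript` is a list of (name, sequence) pairs in which
    names starting with 'E' are exons and names starting with 'I' are introns.
    """
    return "".join(
        sequence
        for name, sequence in primary_transcript
        if name.startswith("E") and name not in skipped_exons
    )

# Hypothetical primary transcript: three exons separated by two introns.
toy_transcript = [
    ("E1", "AUGGCC"),
    ("I1", "GUAAGU"),
    ("E2", "UUUGGA"),
    ("I2", "GUGAGU"),
    ("E3", "CCAUAA"),
]

print(splice(toy_transcript))                        # AUGGCCUUUGGACCAUAA
print(splice(toy_transcript, skipped_exons={"E2"}))  # AUGGCCCCAUAA
```

The same primary transcript gives two different mature mRNAs here, which is why detecting a transcript on an array does not by itself identify which polypeptide product is made.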