Bioinformatics. A Practical Guide to the Analysis of Genes and Proteins - Baxevanis A.D.

Baxevanis A.D. Bioinformatics. A Practical Guide to the Analysis of Genes and Proteins - New York, 2001. - 493 p.
ISBN 0-471-22392-1

Multiple Protein Alignment From DNA Sequences
Although most DNA sequences will have translations represented in the EMBL-TrEMBL or NCBI-GenPept databases, this is not true of single-pass EST (expressed sequence tag) sequences. Because EST data are accumulating at an exponential pace, an automatic method has been developed for extracting useful protein information from ESTs. In brief, the ProtEST server (Cuff et al., 1999) searches EST collections and protein sequence databases with a protein query sequence. EST hits are assembled into species-specific contigs, and an error-tolerant alignment method is used to correct probable sequencing errors. Finally, any protein sequences found in the search are multiply aligned with the translations of the EST assemblies to produce a multiple protein sequence alignment. When presented with a single protein sequence, the JPred server will also generate a multiple protein sequence alignment by searching the SWALL protein sequence database and building a multiple alignment from the hits. The JPred alignments are a good starting point for further analysis with more sensitive methods.
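ProtEST's own assembly and error-correction steps are not described here, but the step of turning an assembled EST contig into candidate protein sequence can be sketched. The Python below is a minimal sketch, assuming the standard genetic code and a naive six-frame translation. The function names are hypothetical and this is not ProtEST's algorithm. Because single-pass ESTs contain frameshift errors, a real pipeline would choose and repair reading frames by comparison with the protein query rather than translating blindly.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
# Assumption: nuclear genes (organelle codes differ). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate frame 0 of dna; codons containing N (or other symbols) give 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(contig: str) -> dict[str, str]:
    """Return translations of the three forward and three reverse-strand frames."""
    forward = contig.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy contig, not real EST data.
    for frame, protein in six_frame_translation("ATGGCCAAATTTGGGTAA").items():
        print(frame, protein)
```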
TOOLS TO ASSIST THE ANALYSIS OF MULTIPLE ALIGNMENTS
A multiple sequence alignment can potentially consist of several hundred sequences that are 500 or more amino acids long. With such a volume of data, it can be difficult to find key features and to present the alignment in a form that can be analyzed by eye. In the past, the only option was to print out the alignment on many sheets of paper, stick these together, and then pore over the massive poster with colored highlighter pens. This sort of approach can still be useful, but it is rather inconvenient! Visualization of the alignment is an important scientific tool, either for analysis or for publication. Appropriate use of color can highlight positions that are either identical in all the aligned sequences or share common physicochemical properties. ALSCRIPT (Barton, 1993) is a program to assist in this process. ALSCRIPT takes a multiple sequence alignment and a file of commands and produces a file in PostScript format suitable for printing out or viewing with a utility such as ghostview.
[Figure 9.3. Example output from the program ALSCRIPT (Barton, 1993). Details can be found within the main text.]
Figure 9.3 illustrates a fragment of ALSCRIPT output (the full figure can be seen in color in Roach et al., 1995). In this example, identities across all sequences are shown in white on red and boxed, whereas positions with similar physicochemical properties are shown in black on yellow and boxed. Residue numbering according to the bottom sequence is shown underneath the alignment. Green arrows show the locations of known β-strands, whereas α-helices are shown as black cylinders. Further symbols highlight specific positions in the alignment for easy cross-referencing to the text. ALSCRIPT is extremely flexible and has commands that permit control of font size and type, background coloring, and boxing down to the individual residue. The program will automatically split a large alignment over multiple pages, thus permitting alignments of any size to be visualized. However, this flexibility comes at a price. There is no point-and-click interface, and the program requires the user to be familiar with editing files and running programs from the command line. The ALSCRIPT distribution includes a comprehensive manual and example files that make it a little easier to produce a useful figure for your own data.
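The coloring rules used in Figure 9.3 can be illustrated with a plain-text sketch. These rules mark identities across all sequences and, separately, positions that share physicochemical properties. The sketch also numbers residues from the bottom sequence and splits a long alignment into page-sized blocks. It is not ALSCRIPT's command language or output. It is a hypothetical Python approximation that marks identical columns with '*' and property-conserved columns with ':' in place of colored boxes. The residue groups in PROPERTY_GROUPS are an assumed, simplified classification, and '-' is assumed to be the only gap character.

```python
# Assumed, simplified physicochemical groups (not ALSCRIPT's or AMAS's scheme).
PROPERTY_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(PROPERTY_GROUPS) for aa in group}
GAP = "-"


def classify_column(column: str) -> str:
    """'*' = identical in all sequences, ':' = all in one property group,
    ' ' = neither. Columns containing a gap are treated as unconserved."""
    if GAP in column:
        return " "
    if len(set(column)) == 1:
        return "*"
    groups = {GROUP_OF.get(aa) for aa in column}
    if len(groups) == 1 and None not in groups:
        return ":"
    return " "


def render(names: list[str], seqs: list[str], width: int = 60) -> str:
    """Lay the alignment out in blocks of `width` columns (one block per 'page'),
    each followed by a conservation line and the residue range of the bottom
    sequence covered by that block."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    marks = "".join(classify_column("".join(s[i] for s in seqs))
                    for i in range(length))
    pad = max(len(n) for n in names)
    ref_name, ref = names[-1], seqs[-1]  # number residues by the bottom sequence
    seen = 0  # residues (non-gap positions) of the bottom sequence so far
    blocks = []
    for start in range(0, length, width):
        stop = start + width
        lines = [f"{n:<{pad}} {s[start:stop]}" for n, s in zip(names, seqs)]
        lines.append(f"{'':<{pad}} {marks[start:stop]}")
        count = sum(aa != GAP for aa in ref[start:stop])
        if count:
            lines.append(f"{'':<{pad}} residues {seen + 1}-{seen + count} of {ref_name}")
        seen += count
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy alignment for illustration only.
    print(render(["seq1", "seq2", "seq3"],
                 ["GKLIVDE-NG", "GRLVVDEKNG", "GKMIIDQKNG"], width=5))
```

Gaps are counted out of the numbering so that the residue ranges refer to real positions in the bottom sequence, matching the convention described for Figure 9.3.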
Subalignments—AMAS
ALSCRIPT provides a few commands for calculating residue conservation across a family of sequences and coloring the alignment accordingly. However, it is really intended as a display tool for multiple alignments rather than an analysis tool. In contrast, AMAS (Analysis of Multiply Aligned Sequences; Livingstone and Barton, 1993) is a program for studying the relationships between sequences in a multiple alignment to identify possible functional residues. AMAS automatically runs ALSCRIPT, so one of its outputs is a boxed, colored, and annotated multiple alignment.
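The text here does not describe how AMAS performs its analysis, so the following only illustrates the general idea of comparing subalignments, under stated assumptions. The user supplies two subgroups of sequences (for example, two branches of a tree). The sketch reports columns where each subgroup is internally identical but the two subgroups carry different residues. That simple pattern can point to residues linked to a subgroup-specific function. The function name and the identity-only criterion are hypothetical, and AMAS's own conservation measures are not reproduced here.

```python
GAP = "-"


def subgroup_specific_columns(seqs, group_a, group_b):
    """Return (column, residue_in_a, residue_in_b) for every alignment column in
    which all sequences of group_a share one residue, all sequences of group_b
    share one residue, and those two residues differ. Columns are 0-based;
    columns containing a gap in either subgroup are skipped."""
    length = len(seqs[0])
    hits = []
    for col in range(length):
        in_a = {seqs[i][col] for i in group_a}
        in_b = {seqs[i][col] for i in group_b}
        if GAP in in_a | in_b:
            continue
        if len(in_a) == 1 and len(in_b) == 1 and in_a != in_b:
            hits.append((col, next(iter(in_a)), next(iter(in_b))))
    return hits


if __name__ == "__main__":
    alignment = ["ACDKL", "ACEKL", "GCDRL", "GCERL"]  # toy data
    for col, a, b in subgroup_specific_columns(alignment, [0, 1], [2, 3]):
        print(f"column {col + 1}: {a} in group A, {b} in group B")
```

Columns conserved across the whole toy alignment (columns 2 and 5) are not reported, because they do not distinguish the two subgroups.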