Fall, 2019


Exercise 1

There are numerous biological databases that collate, curate, and store information. For this exercise, I would like you to use any information available to you, including Internet sources, to define the following resources. For each one, include its physical location and an indication of what type of database it is. Due: 09/12/2019

Exercise 2

Complete questions 1-4 on page 64 of your textbook. In addition, given a word size of 3 and an overlap of l-1 (where l is the word size), draw a potential Eulerian graph that represents the circular DNA sequence AGCCTGTCCT. Due: 09/17/2019
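The graph construction in Exercise 2 can be sketched in code. The example below is a minimal sketch, assuming the intended graph is a de Bruijn-style graph in which every word of length l is an edge and its (l-1)-letter prefix and suffix are the nodes it connects. That reading is an assumption; your textbook may draw the graph differently. The function names and the toy sequence ATGCAT are made up for illustration.

```python
from collections import defaultdict


def circular_kmers(seq, k):
    """Return the k-mers of a circular sequence, one starting at each position."""
    doubled = seq + seq[:k - 1]  # append the first k-1 letters to wrap around
    return [doubled[i:i + k] for i in range(len(seq))]


def de_bruijn_edges(seq, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes it points to."""
    graph = defaultdict(list)
    for kmer in circular_kmers(seq, k):
        graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix, one edge per word
    return dict(graph)


def is_balanced(graph):
    """Check that every node has equal in-degree and out-degree.

    This is the degree condition an Eulerian cycle requires; connectivity
    is not checked here.
    """
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    nodes = set(graph) | set(in_deg)
    return all(len(graph.get(n, [])) == in_deg.get(n, 0) for n in nodes)


toy = "ATGCAT"  # made-up circular sequence, word size 3
for node, targets in de_bruijn_edges(toy, 3).items():
    print(node, "->", ", ".join(targets))
```

Because the sequence is circular, every position starts a word, so a sequence of length n contributes exactly n edges and each node ends up balanced.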

Exercise 3

Given the graph below, calculate the highest-scoring path. Please indicate in your answer the path chosen. Due: 09/17/2019

[Figure: graph.png]
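Finding a highest-scoring path is usually done with dynamic programming over the nodes in topological order. The example below is a minimal sketch, assuming the graph is directed, acyclic, and has weighted edges. The graph in the code is hypothetical and is not the graph from the exercise; the function name and edge format are also made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def highest_scoring_path(edges, source, sink):
    """Return (score, path) for the best source-to-sink path in a weighted DAG.

    edges maps each node to a list of (neighbor, weight) pairs.
    """
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    preds = {}
    for u, outs in edges.items():
        preds.setdefault(u, set())
        for v, _ in outs:
            preds.setdefault(v, set()).add(u)

    best = {source: 0}  # best score found so far for each reachable node
    back = {}           # predecessor on the best path, used for the traceback
    for u in TopologicalSorter(preds).static_order():
        if u not in best:
            continue  # not reachable from the source
        for v, w in edges.get(u, []):
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return best[sink], path[::-1]


# Hypothetical example graph, not the one in graph.png.
example = {"A": [("B", 2), ("C", 5)], "B": [("D", 4)], "C": [("D", 0)], "D": []}
print(highest_scoring_path(example, "A", "D"))  # (6, ['A', 'B', 'D'])
```

Note that the greedy choice (taking the heaviest first edge, A to C) loses here, which is why the score of every node is computed before the path is read back.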

Exercise 4

Given the following sequences: s1 = ACGATC and s2 = ACCAT, calculate the (1) global, (2) semi-global, and (3) local alignment scores using a match/mismatch score of 1/-1 and a gap penalty of -2. Include a (4) traceback.

Given the same sequences, calculate the (5) global, (6) semi-global, and (7) local alignment scores using a match/mismatch score of 1/-1 and a gap penalty of 0. Include a (8) traceback.
Due: 09/24/2019
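The three alignment modes in Exercise 4 differ only in how the scoring matrix is initialized and where the final score is read. The example below is a minimal sketch, assuming a linear gap penalty and assuming that "semi-global" means leading and trailing gaps in either sequence are free; definitions of semi-global alignment vary, so check the one used in class. The function name and the toy sequences are made up for illustration, and the traceback is left out.

```python
def alignment_score(s1, s2, match=1, mismatch=-1, gap=-2, mode="global"):
    """Return the best alignment score for mode 'global', 'semi-global', or 'local'."""
    if mode not in ("global", "semi-global", "local"):
        raise ValueError("unknown mode: " + mode)
    n, m = len(s1), len(s2)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Global alignment charges for leading gaps; the other two modes start at 0.
    if mode == "global":
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap

    best = 0  # highest cell seen anywhere, used by local alignment
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            cell = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if mode == "local":
                cell = max(cell, 0)  # a local alignment may restart anywhere
            F[i][j] = cell
            best = max(best, cell)

    if mode == "global":
        return F[n][m]  # bottom-right corner
    if mode == "semi-global":
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(F[n]), max(row[m] for row in F))
    return best


# Toy sequences, not the ones from the exercise.
for mode in ("global", "semi-global", "local"):
    print(mode, alignment_score("AC", "AG", mode=mode))
```

For the toy pair AC and AG with the default scores, the global and semi-global scores are 0 and the local score is 1, since the local alignment can stop after the single matching A.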

Additional Material


Sequencing Technologies

Single-molecule real-time sequencing (Pacific Biosciences)
Read length: 30,000 bp (N50); maximum read length >100,000 bases
Accuracy (single read, not consensus): 87% raw-read accuracy
Reads per run: 500,000 per Sequel SMRT cell, 10–20 gigabases
Time per run: 30 minutes to 20 hours
Cost per 1 million bases (in US$): $0.05–$0.08
Advantages: Fast. Detects 4mC, 5mC, 6mA.
Disadvantages: Moderate throughput. Equipment can be very expensive.

Ion semiconductor (Ion Torrent sequencing)
Read length: up to 600 bp
Accuracy (single read, not consensus): 99.6%
Reads per run: up to 80 million
Time per run: 2 hours
Cost per 1 million bases (in US$): $1
Advantages: Less expensive equipment. Fast.
Disadvantages: Homopolymer errors.

Pyrosequencing (454)
Read length: 700 bp
Accuracy (single read, not consensus): 99.90%
Reads per run: 1 million
Time per run: 24 hours
Cost per 1 million bases (in US$): $10
Advantages: Long read size. Fast.
Disadvantages: Runs are expensive. Homopolymer errors.

Sequencing by synthesis (Illumina)
Read length: MiniSeq, NextSeq: 75-300 bp; MiSeq: 50-600 bp; HiSeq 2500: 50-500 bp; HiSeq 3/4000: 50-300 bp; HiSeq X: 300 bp
Accuracy (single read, not consensus): 99.9% (Phred30)
Reads per run: MiniSeq/MiSeq: 1-25 Million; NextSeq: 130-00 Million; HiSeq 2500: 300 million - 2 billion; HiSeq 3/4000: 2.5 billion; HiSeq X: 3 billion
Time per run: 1 to 11 days, depending upon sequencer and specified read length
Cost per 1 million bases (in US$): $0.05 to $0.15
Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
Disadvantages: Equipment can be very expensive. Requires high concentrations of DNA.

Combinatorial probe anchor synthesis (cPAS, BGI/MGI)
Read length: BGISEQ-50: 35-50 bp; MGISEQ 200: 50-200 bp; BGISEQ-500, MGISEQ-2000: 50-300 bp
Accuracy (single read, not consensus): 99.9% (Phred30)
Reads per run: BGISEQ-50: 160M; MGISEQ 200: 300M; BGISEQ-500: 1300M per flow cell; MGISEQ-2000: 375M per FCS flow cell, 1500M per FCL flow cell
Time per run: 1 to 9 days, depending on instrument, read length, and number of flow cells run at a time
Cost per 1 million bases (in US$): $0.035–$0.12

Sequencing by ligation (SOLiD sequencing)
Read length: 50+35 or 50+50 bp
Accuracy (single read, not consensus): 99.90%
Reads per run: 1.2 to 1.4 billion
Time per run: 1 to 2 weeks
Cost per 1 million bases (in US$): $0.13
Advantages: Low cost per base.
Disadvantages: Slower than other methods. Has issues sequencing palindromic sequences.

Nanopore sequencing
Read length: Dependent on library prep, not the device, so user chooses read length (up to 500 kb reported)
Accuracy (single read, not consensus): ~92–97% single read
Reads per run: Dependent on read length selected by user
Time per run: Data streamed in real time. Choose 1 min to 48 hrs
Cost per 1 million bases (in US$): $500–999 per flow cell, base cost dependent on experiment
Advantages: Longest individual reads. Accessible user community. Portable (palm sized).
Disadvantages: Lower throughput than other machines. Single-read accuracy in the 90s.

Chain termination (Sanger sequencing)
Read length: 400 to 900 bp
Accuracy (single read, not consensus): 99.90%
Reads per run: N/A
Time per run: 20 minutes to 3 hours
Cost per 1 million bases (in US$): $2,400
Advantages: Useful for many applications.
Disadvantages: More expensive and impractical for larger sequencing projects. This method also requires the time-consuming step of plasmid cloning or PCR.
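
Some accuracy entries above pair a percentage with a Phred score, as in "99.9% (Phred30)". The example below is a minimal sketch of that relationship, using the standard definition Q = -10 * log10(P), where P is the per-base error probability. The function names are made up for illustration.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy (1 - error probability)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (a fraction below 1) to a Phred quality score."""
    return -10 * math.log10(1 - accuracy)


print(round(phred_to_accuracy(30), 4))     # 0.999, i.e. 99.9%
print(round(accuracy_to_phred(0.87), 1))   # Phred score for 87% raw-read accuracy
```

Each 10-point step in Phred score corresponds to a tenfold drop in error rate, so Phred20 is 99% and Phred30 is 99.9%.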