Next Generation Sequencing - A Step-By-Step Guide to DNA Sequencing.
Summary
This article explores the evolution of DNA sequencing methods, focusing on the shift from the 32-year effort to sequence the human genome to rapid Next Generation Sequencing (NGS). It highlights how NGS has revolutionized genome mapping by sequencing billions of DNA strands simultaneously. The process involves purification, library preparation, and sequencing on instruments such as Illumina's, which use a method called sequencing by synthesis. NGS is used in cancer diagnostics, rare disease detection, and many research fields, and it can sequence DNA, RNA, cell-free DNA, and more.
Highlights
From taking 32 years for the Human Genome Project to just a day with NGS, DNA sequencing has come a long way! 🌟
Using NGS, billions of DNA strands are sequenced at once, a huge leap from the single-strand sequencing of the past. 🎉
NGS relies on reference genomes, made possible by earlier projects, to map DNA and RNA efficiently. 📜
With applications ranging from cancer research to ecology, NGS proves its worth in various scientific fields. 🧬
Powerful sequencing tools and techniques make NGS a game-changer in genetic research and diagnostics. 🛠️
Key Takeaways
NGS revolutionized DNA sequencing, reducing time from 32 years to just one day! 🚀
The secret behind NGS's speed is its ability to sequence billions of DNA strands simultaneously. 🔬
The Human Genome Project laid the foundation for NGS by creating a reference genome. 📚
NGS is versatile, sequencing DNA, RNA, and even cell-free DNA across diverse fields. 🌍
Illumina's instruments, which use sequencing by synthesis, make NGS highly efficient. 🤖
Overview
DNA sequencing has undergone a significant transformation with the advent of Next Generation Sequencing (NGS). Compared to the Human Genome Project, which took over three decades to complete, NGS has drastically reduced the time required to sequence an entire human genome to just one day. This remarkable advancement is primarily due to the capability of sequencing billions of DNA strands simultaneously, a feat not possible with older technologies like Sanger sequencing.
The introduction of NGS has opened up a plethora of possibilities in the world of genomics. It works by cutting DNA into smaller, manageable pieces which are sequenced and then assembled using a reference genome, a crucial component developed by projects like the Human Genome Project. This method not only speeds up the process but also expands the scope of what can be sequenced, including RNA and cell-free DNA, thereby revolutionizing diagnostics and research in fields such as cancer treatment, rare diseases, and environmental studies.
Advanced instruments, particularly from Illumina, facilitate NGS through a method known as sequencing by synthesis. This involves intricate processes like library preparation, clonal amplification, and real-time sequencing which allow for extensive and precise genetic mapping. NGS has proven invaluable across various scientific areas, demonstrating its versatility and efficiency in providing a deeper understanding of genetic materials and their applications in modern science.
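The read cycle of sequencing by synthesis can be sketched as a toy simulation. This is a minimal illustration, not how an Illumina instrument is implemented: the dye colors in `DYE_COLOR` are hypothetical, the function name `sequence_by_synthesis` is made up for this example, and strand orientation is ignored for simplicity. Each cycle adds exactly one terminated, fluorescently tagged nucleotide, records its color, and then moves on to the next position.

```python
# Toy model of sequencing by synthesis: one base is read per cycle.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical color assignment, chosen only for illustration.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Return the bases called over `cycles` read cycles for one cluster."""
    called_bases = []
    for position in range(min(cycles, len(template))):
        # The complementary nucleotide binds; its terminator blocks further extension.
        incorporated = COMPLEMENT[template[position]]
        # The camera records the color of the cluster.
        color = DYE_COLOR[incorporated]
        # The recorded color is translated back into a base call.
        called_bases.append(COLOR_TO_BASE[color])
        # Removing the terminator lets the next cycle add another nucleotide.
    return "".join(called_bases)


read = sequence_by_synthesis("ATGCCGTA", cycles=5)
print(read)
```

The `cycles` argument plays the role of the number of reads set on the sequencer: the simulation stops after that many bases even if the template is longer.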
Chapters
00:00 - 00:30: Introduction and Background The chapter titled "Introduction and Background" discusses the Human Genome Project. Initially, only 85 percent of the human genome was sequenced between 1990 and 2003. It took 32 years to completely sequence the human genome, which finally concluded with the completion of the remaining gaps in 2022. Now, next generation sequencing technologies have drastically reduced the time required for sequencing the human genome.
00:30 - 01:00: Sanger Sequencing and Comparison with NGS The chapter discusses the advances in DNA sequencing technology, comparing the traditional Sanger sequencing method with the modern Next-Generation Sequencing (NGS). It highlights the dramatic improvement in speed, where NGS allows billions of DNA strands to be sequenced simultaneously, drastically reducing the time required to sequence a person's entire genome from 32 years to just one day. Furthermore, it explains the importance of the Human Genome Project in enabling NGS by providing a human reference DNA sequence, noting that only Sanger sequencing was available during the project's execution.
01:00 - 01:30: Basic Principle and Sample Preparation for NGS The basic principle behind Next-Generation Sequencing (NGS) is that DNA can be cut into small pieces and sequenced, which are then assembled into a complete sequence based on a reference genome. NGS can be used to sequence both DNA and RNA. Initially, samples are collected, followed by the purification of DNA or RNA.
01:30 - 02:00: Library Preparation and Sequencing Instruments The chapter discusses the preparation of libraries and sequencing instruments, focusing on the steps involved in preparing RNA for sequencing. Initially, RNA is reverse-transcribed into DNA to facilitate sequencing. Following this, a library is prepared from the DNA by cutting it into short fragments using either high-frequency sound waves or enzymes. Adapters are then attached to each end of these DNA fragments to complete the library preparation process.
02:00 - 02:30: Sequencing by Synthesis Process The chapter explains the process of sequencing by synthesis, highlighting the role of adapters in the sequencing process. These adapters contain essential information for sequencing and include an index for sample identification. Non-bound adapters are removed to complete the library. Depending on the application, a PCR step may be included to increase the library amount. A successful library is characterized by the correct size and a high enough concentration for sequencing. The chapter also mentions that the main sequencing instruments used in Next-Generation Sequencing (NGS) are manufactured by Illumina, which employs this method.
02:30 - 03:00: Cluster Amplification and Sequencing Primer Binding This chapter explains the process of sequencing by synthesis, where DNA sequencing occurs on a glass surface of a flow cell. It describes how short DNA pieces, known as oligonucleotides, are bound to the flow cell surface and match the adapter sequences of the library. The library is first denatured to form single DNA strands, then added to the flow cell to attach to one of the two oligos.
03:00 - 03:30: Fluorescent Nucleotides and Read Cycles The chapter titled 'Fluorescent Nucleotides and Read Cycles' covers the process of preparing DNA strands for sequencing. Initially, the forward strand attaches to an oligo, and then the reverse strand is synthesized. The forward strand is washed away, leaving the DNA library bound to the flow cell. Because the fluorescent signal from a single fragment would be too low to detect, each unique library fragment is amplified into a cluster. This clonal amplification is a PCR that runs at a single temperature, with annealing, extension, and melting driven by changing the flow cell solution.
03:30 - 04:00: Index Sequencing and Filtering Bad Reads The chapter titled 'Index Sequencing and Filtering Bad Reads' explains the initial steps in sequencing where strands bind to a second oligo on the flow cell creating a bridge. This is followed by copying and denaturing of these strands to form double-stranded fragments. The process is repeated, forming localized clusters. Eventually, the reverse strands are cut and washed away, leaving the forward strand ready for sequencing. A sequencing primer then binds to these forward strands, setting the stage for sequencing using fluorescent nucleotides G, C, T, and A.
04:00 - 04:30: Demultiplexing and Mapping Reads to Reference Genome This chapter focuses on the process of sequencing using a flow cell and it involves DNA polymerase and nucleotides with fluorescent tags. In this method, each nucleotide is tagged with a distinct fluorescent color and a terminator, ensuring that only one nucleotide is sequenced at a time. Initially, a complementary base pairs with the sequence, and the color specific to that cluster is recorded by a camera. Following this, a new solution is introduced to remove the terminators, allowing nucleotides and DNA polymerase to flow again, continuing the sequencing process.
04:30 - 05:00: Paired-end Sequencing and Read Depth This chapter explains how the read cycles continue for the number of reads set on the sequencer, after which the read sequences are washed away. The first index is then sequenced and washed away. If only a single read is needed, sequencing ends here. For paired-end sequencing, the second index and the reverse strand of the library are also sequenced. Because there is no primer for the second index read, a bridge is created so that the second oligo acts as the primer.
05:00 - 05:30: Coverage and Applications of NGS The chapter titled 'Coverage and Applications of NGS' continues the sequencing process. The two index reads use unique dual indices, which allow up to 384 samples to be sequenced in a single flow cell. Next, the reverse strands are synthesized and the forward strands are cut and washed away, so that the reverse strands can be sequenced. After sequencing, bad reads are filtered out, including clusters that overlap, lead or lag in sequencing, or show low intensity.
05:30 - 06:00: Types of Sequencing and Additional Applications The chapter discusses the process involved in sequencing, starting with filtering out certain fragments from nanowells. It mentions polyclonal wells being filtered and explains the demultiplexing step where attached indexes are used to identify and sort reads from each sample. Finally, the chapter explains how reads are mapped to a reference genome, with various reads aligning and overlapping with each other on the genome.
Transcription
00:00 - 00:30 ClevaLab. The Human Genome Project uncovered
all 3.2 billion bases of the human genome. This project started in 1990 and took until 2003
to complete 85 percent of the first genome. But, in 2022, the gaps got filled and the sequence
became complete. So in total, sequencing the human genome took 32 years. Now, with Next
Generation sequencing or NGS, it takes only
00:30 - 01:00 a day to sequence a person's entire genome. One
day is a dramatic speed increase compared to 32 years! The difference is due to the number of DNA
strands sequenced at once. Billions of DNA strands get sequenced simultaneously using NGS. However,
only Sanger sequencing was available for the Human Genome Project. With Sanger Sequencing, only
one strand can get sequenced at a time. However, NGS only works because the Human Genome Project
created a human reference DNA sequence. The
01:00 - 01:30 basic principle behind NGS is that DNA can be cut
into small pieces and sequenced. The sequences of these small pieces then get assembled based on the
reference genome. NGS can be used to sequence both DNA and RNA. First, samples get collected, and
the DNA or RNA gets purified. Next, the DNA or RNA
01:30 - 02:00 gets checked to ensure it's pure and undegraded.
RNA first needs to be reverse-transcribed into DNA before it can get sequenced. A library
then gets prepared from the DNA. A library is a collection of short DNA fragments from a long
stretch of DNA. Libraries get made by cutting the DNA into short pieces of a specified size. This
cutting gets done by using high-frequency sound waves or enzymes. Then sequences of DNA called
adapters get added to each end of a DNA fragment.
02:00 - 02:30 These adapters contain the information needed for
sequencing. They also include an index to identify the sample. Finally, any non-bound adapters get
removed, and the library is complete. Depending on the application, there can be a PCR step to
increase the library amount. A successful library will be of the correct size. It will also be of
a high enough concentration for sequencing. The main sequencing instruments used in NGS are from
Illumina. These instruments use a method called
02:30 - 03:00 sequencing by synthesis. The sequencing occurs
on a glass surface of a flow cell. Short pieces of DNA, called oligonucleotides, are bound to the
surface of the flow cell. These oligonucleotides match the adapter sequences of the library. First,
the library gets denatured to form single DNA strands. Then this library gets added to the flow
cell, where it attaches to one of the two oligos. The
03:00 - 03:30 strand that attaches to the oligo is the forward
strand. Next, the reverse strand gets made, and the forward strand gets washed away. The library
is now bound to the flow cell. If sequencing started now the fluorescent signal would be too
low for detection. So each unique library fragment needs to get amplified to form clusters. This
clonal amplification is by a PCR that happens at a single temperature. Annealing, extension and
melting occur by changing the flow cell solution.
03:30 - 04:00 First, the strands bind to the second oligo on
the flow cell to form a bridge. The strands get copied. Then these double-stranded fragments
get denatured. This copying and denaturing repeats over and over. Localized clusters get
made, and finally, the reverse strands get cut. These strands get washed away, leaving the
forward strand ready for sequencing. The sequencing primer binds to the forward strands.
Next, fluorescent nucleotides G, C, T and A get
04:00 - 04:30 added to the flow cell along with DNA polymerase.
Each nucleotide has a different color fluorescent tag and a terminator. So only one nucleotide can
get sequenced at a time. First, the complementary base binds to the sequence. Then the camera reads
and records the color of each cluster. Next, a new solution flows in and removes the terminators.
The nucleotides and DNA polymerase flow in again,
04:30 - 05:00 and another nucleotide gets sequenced. These read
cycles continue for the number of reads set on the sequencer. Once complete, these read sequences get
washed away. Then the first index gets sequenced, and washed away. If only a single read is needed,
the sequencing ends here. But, for paired-end sequencing, the second index is sequenced, as
well as the reverse strand of the library. There is no primer for the second index read. Instead, a
bridge gets created so that the second oligo acts
05:00 - 05:30 as the primer. The second index is then sequenced.
These two index reads use unique dual indices. These allow the use of up to 384 samples in the
same flow cell. Next, the reverse strand gets made, and the forward strands are cut and washed
away. The reverse strands are then sequenced. Once the sequencing is complete, any bad reads
get filtered out. These include the clusters that overlap, lead or lag with sequencing or are of low
intensity. The clusters cannot overlap on a patterned
05:30 - 06:00 flow cell, but there can be more than one library
fragment per nanowell. These polyclonal wells will also get filtered out. Next, the reads passing the
filter get demultiplexed. Demultiplexing uses the attached indexes to identify and sort reads from
each sample. Finally, the reads get mapped to the reference genome. The different reads align to
the reference genome, overlapping each other.
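The demultiplexing and mapping steps just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the index sequences, sample names, and reference string are invented for the example, and mapping is done by exact string matching, whereas real aligners tolerate mismatches and use indexed references.

```python
from collections import defaultdict

# Hypothetical index-to-sample table (the indexes sit in the adapters).
SAMPLE_BY_INDEX = {"ACGT": "sample_1", "TTAG": "sample_2"}


def demultiplex(reads):
    """Group read sequences by sample using their attached index.

    `reads` is an iterable of (index_sequence, read_sequence) pairs.
    Reads with an unknown index are dropped.
    """
    reads_by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = SAMPLE_BY_INDEX.get(index_seq)
        if sample is not None:
            reads_by_sample[sample].append(read_seq)
    return dict(reads_by_sample)


def map_read(reference, read):
    """Return the 0-based start of the first exact match, or None if unmapped."""
    position = reference.find(read)
    return position if position != -1 else None


reference = "GATTACAGGCTTACCGATAGC"  # made-up reference sequence
reads = [
    ("ACGT", "TTACAGG"),
    ("TTAG", "CCGATAG"),
    ("ACGT", "AGGCTTA"),
    ("GGGG", "GATTACA"),  # unknown index, discarded during demultiplexing
]

samples = demultiplex(reads)
positions = {
    sample: [map_read(reference, read) for read in sample_reads]
    for sample, sample_reads in samples.items()
}
print(positions)
```

In this example the two reads from `sample_1` start at positions 2 and 6 and are each 7 bases long, so they overlap on the reference, which is the situation the transcript describes.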
06:00 - 06:30 Paired-end sequencing creates two sequencing reads
from the same library fragment. During sequence alignment, the algorithm knows that these reads
belong together. Longer stretches of DNA or RNA can get analyzed with greater confidence that the
alignment is correct. Read depth is an essential metric in sequencing. Read depth is the number
of reads for a nucleotide. Average read depth is the average depth across the region sequenced. For
whole genome sequencing, a 30x average read depth
06:30 - 07:00 is good. A 1500x average read depth is suitable
for detecting rare mutation events in cancer. Another essential metric is coverage. The aim is
to have no missing areas across the target DNA. NGS gets used in a wide variety of applications,
including diagnosing cancer and rare disease, treatment guidance for cancers, and many research areas from
ecology to botany to medical science. Both DNA and
07:00 - 07:30 RNA can be sequenced. It could be the whole genome
or transcriptome, just the coding regions (called exomes) of the DNA, or target genes in the DNA or
RNA. All types of RNA can be sequenced including non-coding RNAs such as microRNAs and long
non-coding RNA. In addition, cell-free DNA, single cells, as well as methylation or
protein binding sites can also get sequenced.
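The read depth and coverage metrics described earlier in the transcript can be made concrete with a small calculation. This is a minimal sketch with made-up alignments. The function name `depth_per_base` is hypothetical, and coverage is taken here to mean the fraction of target bases covered by at least one read, which is one common reading of "no missing areas" rather than a definition given in the video.

```python
def depth_per_base(region_length, alignments):
    """Count how many reads cover each position of the target region.

    `alignments` is a list of (start, read_length) pairs with 0-based starts.
    """
    depth = [0] * region_length
    for start, read_length in alignments:
        end = min(start + read_length, region_length)
        for position in range(start, end):
            depth[position] += 1
    return depth


# Three made-up reads aligned to a 10-base target region.
alignments = [(0, 5), (2, 5), (4, 4)]
depth = depth_per_base(10, alignments)

# Average read depth: mean depth across the region sequenced.
average_depth = sum(depth) / len(depth)

# Coverage (assumed definition): fraction of positions with at least one read.
coverage = sum(1 for d in depth if d > 0) / len(depth)

print(depth)          # [1, 1, 2, 2, 3, 2, 2, 1, 0, 0]
print(average_depth)  # 1.4
print(coverage)       # 0.8
```

Here the last two positions have no reads, so the region has gaps even though the average depth is above 1. A 30x whole-genome run is the same calculation with an average of about 30 reads per position.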