
The Beginner's Guide to RNA-Seq - #ResearchersAtWork Webinar Series

Jun 07, 2021
Hello everyone, thank you all for your patience; we will start the webinar now. Good morning, and thank you for taking the time to join our webinar today. Today's topic will provide an introduction to RNA-seq. For those of you who are new to our webinars, they are designed to provide ongoing, systematic training that will help you stay informed about our products and services. After the webinar, we will share a copy of the PowerPoint slides as well as the recorded webinar. I would like to point out that if you are signed in to a Google account, you can ask any questions you may have in the chat box on the right of the screen.
Please note that if you haven't registered before, there is a registration link in the video description; registering will make sure you get a copy of the slides and answers to any questions. Please post your questions in the chat during the webinar. To help frame this webinar, here is the outline of the topics we will cover. First, we will start with some background on gene expression, followed by an introduction to NGS. Then we will discuss some of the important considerations when designing your RNA-seq experiment, followed by the workflow, analysis, and interpretation of your experiment. Finally, we will take some time to review some projects that have demonstrated the power and versatility of RNA-seq. Before we start talking about RNA-seq,

I'd like to take a moment to introduce the speakers joining me today. My colleague, Dr. Christopher Mention, is a Science Applications Specialist here at ABM. After completing his PhD at UBC, where he studied stem cell regulation, he joined ABM with a passion for helping scientists achieve their goals. With nearly 10 years of experience in research and experimental design, working with researchers around the world in areas ranging from cell and developmental biology to CRISPR validation and cell line engineering, he can help with almost any project you have, from initial setup to post-sequencing data analysis. Next to him, I have myself, a product development specialist.
I have over five years of research experience in areas such as drug delivery, lipid biosynthesis, and gene therapy, gained throughout my graduate degree at SFU. Our primary goal is to work closely with clients and provide them with the support for which ABM has been recognized. Before we start with the main content of the webinar, I'd like to take a moment to talk about Applied Biological Materials and what our goals are. ABM was founded in 2004 and has been driven to catalyze scientific discoveries in the field of life sciences and drug development for the past 15 years. Headquartered in Vancouver, Canada, we are one of the fastest-growing biotechnology companies in the region and, since our inception, we have worked hard to be known as a trusted source for customers like you.
This hard work has allowed us to expand our facilities, starting with a branch in the Jiangsu province of China in 2013 and a new facility in Bellingham opening later in 2019. These expansions have put us in a position to work with each and every one of you and provide you with the world-class service you all deserve. With our team of passionate and skilled scientists, ABM is dedicated to empowering researchers with the latest innovations for all their scientific needs. Now, before we discuss the core details of this webinar, I think it is important to discuss some of the work that led us to RNA-seq, and research in general into the role of genes in the development of diseases.
There are three different levels at which we can examine genes. We can study them at the DNA level, by studying mutations to determine their effects on genes; at the RNA level, by studying gene expression; and at the protein level, by examining protein folding patterns. The first link in the chain that leads to RNA-seq is the Northern blot, developed in 1977 by James Alwine, David Kemp, and George Stark at Stanford University. This tool was extremely useful and allowed the study of gene expression through the detection of RNA. After Northern blotting came RT-qPCR, which builds on the PCR developed in the early '80s by Kary Mullis; this technique allows the detection, characterization, and quantification of RNA transcripts. Finally, the last step before RNA-seq was the microarray, developed by Patrick Brown at Stanford University.
This tool is impressive and can be used to simultaneously study the expression levels of thousands of genes. With that, we finally arrive at present-day RNA-seq, which reveals the presence and quantity of RNA in a biological sample at a given moment in time, as well as changes in gene expression over time. Now, considering the three earlier methods, we can see that while they all had strengths, such as low reagent cost and the ability to be easily performed in the comfort of your own laboratory, there are some elements that we must consider.
A key note is the fact that each of these methods requires prior knowledge of the sequence or mRNA being studied, making it impossible to use them to study new genes. With RNA-seq, on the other hand, the advantages are considerable. Although the initial cost of RNA-seq is high relative to previous methods, single-nucleotide resolution together with the ability to sequence new transcriptomes without prior data is a huge advantage. When combined with its enormous throughput, it creates a winning combination that would be an advantage for any researcher. Now, with the arrival of Illumina in 2007, we can see an almost exponential increase in the number of RNA-seq publications, demonstrating the widespread use and accessibility of this tool for researchers. We are now in a position to answer the question: what is RNA-seq?
In essence, RNA-seq is a technique that allows us to start with cells or tissues and examine the expressed genes by taking advantage of next-generation sequencing. With it, we can learn about changes in gene expression and identify novel genes and splicing events. With this brief background, I'd like to hand this over to Christopher, who will begin with a brief overview of the topics in today's talk. But first, a quick reminder to everyone listening: please post any questions you may have in the chat box during the course of the webinar. Thank you, Boshy, for that brief introduction, and thank you all for joining us today for our webinar.
I'm going to briefly go into an introduction to next-generation sequencing to give you a better idea of how the technology works before going into more detail about RNA-seq. Next-generation sequencing can be used for a variety of purposes, including whole genome sequencing, studying changes in gene expression by RNA-seq, or performing metagenomic studies with environmental samples. The basic workflow for all next-generation sequencing is the same: input material, whether DNA or RNA, is fragmented to a similar size. Sequencing adapters are ligated, which allow the fragments to attach to the sequencer, before they are subjected to high-throughput sequencing. Now, there are a couple of important terms to know for all next-generation sequencing approaches. The first is a read. A read is a sequence of nucleotides that has been sequenced. On the right you can see a double-stranded DNA molecule; when it is sequenced, it will be sequenced along one strand, giving you the sequencing readout of the nucleotides that are present.
We refer to this as single-end sequencing, since you are only sequencing one strand, in this case the sense strand. Alternatively, you can use paired-end sequencing to read a fragment from both strands of the molecule. When you are trying to decide which option to choose for your project, it is important to consider what the actual goal of your project is. For example, single-end sequencing is usually sufficient to study changes in gene expression, while paired-end sequencing is more useful for whole genome sequencing, alternative splicing studies, or de novo transcriptome studies. Next, the length of the read may vary.
Read length is the number of nucleotides that are sequenced per read. To give you an idea of typical sizes for RNA-seq: 75 nucleotides is a common read length that is excellent for studying gene expression or resequencing samples; 150 nucleotides is a longer read length that is more suitable for assembling new transcriptomes or whole genome sequencing for eukaryotes; and even longer reads, such as 300 nucleotides, are more suitable for amplicon sequencing as well as for metagenomic studies. Then, once you have the read sequence, you still need to figure out how it lines up with a reference sequence. In this example you can see a reference sequence in dark blue at the bottom of the image, and the reads in light blue being mapped to specific locations in that reference sequence.
If we look at a specific nucleotide, like this G highlighted with a red box, you can see that there are two reads in light blue that map to this G in the reference sequence. Because two reads cover it, we describe this as 2x coverage. If we look at other regions, like this C residue, you can see there are four reads that map at this specific location, which gives you 4x coverage. There are also often sites that have no reads mapped to them at all, like this A residue, and this site you would describe as having 0x coverage.
Now, when you take all of these sites and add up the coverage levels for each nucleotide, you can get an idea of the average coverage for that region. In this example there are six reads across these three bases, G, A, and C; when you divide by the number of nucleotides you get 2x coverage, so you could say that the sequencing depth is 6 reads with an average of 2x coverage for these nucleotides. Now, for most samples, you would typically sequence millions of reads to make sure that most of the transcriptome is covered. Larger genomes or transcriptomes typically require more reads. For example, bacteria, which have a genome size of about 5 million base pairs, would require fewer reads and less sequencing than mammals, which have a genome of about 3 billion base pairs, or plants, whose genomes can be 3 billion base pairs or more.
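The coverage arithmetic from the slide can be sketched in a few lines of Python. This is only an illustration: the function names and the toy read positions are assumptions chosen to reproduce the 2x, 0x, and 4x example, not output from any real aligner.

```python
def per_base_coverage(reference_length, reads):
    """Count how many reads cover each position of a reference.

    `reads` is a list of (start, length) tuples using 0-based starts.
    This is a toy representation, not a real alignment format.
    """
    coverage = [0] * reference_length
    for start, length in reads:
        end = min(start + length, reference_length)
        for position in range(start, end):
            coverage[position] += 1
    return coverage


def average_coverage(coverage):
    """Average coverage = summed per-base coverage / number of bases."""
    return sum(coverage) / len(coverage)


# Toy reference "GAC": two reads cover the G, none cover the A, four cover the C.
toy_reads = [(0, 1)] * 2 + [(2, 1)] * 4
toy_coverage = per_base_coverage(3, toy_reads)

print(toy_coverage)                    # per-base coverage for G, A, C
print(average_coverage(toy_coverage))  # average coverage across the three bases
```

With these toy reads the per-base coverage is 2, 0, and 4, and the six reads spread over three bases average out to 2x, matching the numbers in the talk.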
Some genomes are much larger still. In terms of RNA-seq, for bacteria this would look like 8 million reads per sample, versus 20 million or 40 million reads per sample for mammals and plants. Next, I'll go over a couple of important things to consider when first designing your RNA-seq experiment. Most of the RNA that is present in a cell is not actually messenger RNA, which is what most researchers want to sequence. For example, if you take the total RNA present and look at the breakdown of what it's made up of, you'll find that about 85 percent is ribosomal RNA, which most researchers typically don't want to sequence.
Then, 10 to 12 percent is transfer RNA, which most researchers are also not interested in sequencing for their projects. The mRNA itself is usually around 2 to 3 percent of the total RNA present in a cell, and an even smaller percentage is made up of long non-coding RNAs, circular RNAs, microRNAs, and other sequences. So if your starting point is this large population of total RNA, you need to find a way to enrich for what you really want to sequence in that sample. There are two basic ways we can do this. The first is poly(A) enrichment, where we use special beads that bind to the poly(A) tails on mRNA sequences in order to extract them from the total RNA population and enrich for poly(A) sequences.
Alternatively, if you want to study mRNAs and small RNAs like microRNAs, you can use a treatment called rRNA depletion, which selectively removes rRNA sequences from the total RNA population. If you are working with eukaryotic samples, you can use either of the enrichment options, poly(A) enrichment or rRNA depletion. If you are working with prokaryotic samples, however, it is absolutely necessary to use an rRNA depletion treatment, because prokaryotic cells generally do not have poly(A) tails on their mRNA transcripts. Next, you need to consider how you are designing your project and what your goal is. If your goal is to assemble a de novo transcriptome for a species that has not previously had its transcriptome sequenced, you typically want greater sequencing depth and longer reads to assist with assembly.
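The enrichment rule of thumb described a moment ago (poly(A) enrichment or rRNA depletion for eukaryotes, rRNA depletion only for prokaryotes) can be written as a small lookup. This is a hypothetical helper for illustration; the function name, its arguments, and the returned labels are assumptions, not part of any ABM tool.

```python
def enrichment_options(organism, needs_non_polya_rna=False):
    """Return the enrichment methods compatible with a sample.

    organism: "eukaryote" or "prokaryote" (illustrative labels).
    needs_non_polya_rna: True if RNAs without poly(A) tails must be kept.
    """
    organism = organism.lower()
    if organism == "prokaryote":
        # Prokaryotic mRNA generally lacks poly(A) tails, so bead capture fails.
        return ["rRNA depletion"]
    if organism == "eukaryote":
        if needs_non_polya_rna:
            # Poly(A) capture would discard transcripts without a tail.
            return ["rRNA depletion"]
        return ["poly(A) enrichment", "rRNA depletion"]
    raise ValueError(f"unknown organism type: {organism!r}")


print(enrichment_options("prokaryote"))
print(enrichment_options("eukaryote"))
```

The point of the sketch is only that the organism type constrains the choice before the project goal does.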
If you are studying gene expression, to see if a particular gene increases or decreases in response to a stimulus, single-end sequencing is usually sufficient. Finally, if you want to identify novel transcripts or new alternative splicing events, you would normally want to use greater sequencing depth and paired-end sequencing. Next, I will briefly review the general RNA-seq workflow for most projects. It is outlined here, and I don't want you to focus too much on this, but there are four basic steps: starting with library preparation before loading a sample into the sequencer, followed by bridge PCR, sequencing by synthesis, and finally analysis. At the beginning of the project, you take your starting material, assess its quality, and convert it to DNA. Typically, the first thing we ask when we start with a sample is whether the mRNA is degraded.
If it is not degraded, we can perform poly(A) selection before converting the RNA to DNA. This conversion of RNA to DNA is performed to increase the stability of the molecule and ensure sequencing success. If the mRNA transcripts are degraded, however, we can use special treatments that repair the transcripts so that they still provide sequenceable material for the project. Once the material has been converted into DNA, it will probably be of different sizes, because different mRNA transcripts generally have different lengths. So the next step is to fragment the DNA into uniform sizes to ensure that each fragment has the same probability of being sequenced.
The next step is to ligate sequencing adapters so that the DNA fragments can attach to the sequencer. Once this is done, you can load the material into the sequencer, where it goes through a step called cluster generation, or bridge PCR. Bridge PCR is one of the most important steps for sequencing, because it is not possible to sequence a single DNA molecule in the sequencer. Instead, the sequencer generates clusters of identical molecules that are then sequenced together. In the first part of this diagram you can see a DNA molecule that binds to the flow cell of the sequencer. The molecule then bends over and binds to the other adapter on the flow cell before DNA synthesis begins to form a double-stranded DNA molecule.
You can see this in part 3, where the synthesis reaction that generates the double-stranded DNA occurs. In step four, you can see that you now have two molecules in the cluster. This process is repeated many times, as in panel five, until you have enough DNA molecules to be able to sequence each cluster. Now, it is important to know the concentration of your library so you can avoid overclustering. Overclustering occurs when the cluster density is too high, leaving the machine unable to accurately read each cluster to determine what the DNA sequences are, and it generally causes the entire sequencing run to fail.
The opposite problem, which you also want to avoid, is underclustering. This happens when the cluster density is too low; it leads to lower overall sequencing output and again makes it difficult for the sequencer to read the sequences in each cluster. Next, I will briefly go over Illumina's sequencing-by-synthesis technology to give you an idea of how the sequencing process itself actually works. The template DNA, bound to the flow cell, has a primer that prepares it for DNA synthesis. You then have individual nucleotides with fluorescent dyes attached to them that are used during the synthesis reaction.
In the next panel you can see that a nucleotide has been added to the sequence. All the nucleotides that were not incorporated, because they are not complementary to the base on the other strand, are washed away. After this step there is fluorescent emission in response to light stimulation of the nucleotide that was added, which the sequencer images. After this step, the fluorescent tag is cleaved from the molecule and the process is repeated. This continues until the end of sequencing, at which point the machine can go through and read each of the fluorescent signals that were produced to reconstruct the sequence of that molecule. At each stage of sequencing, quality controls are essential, and there are three main technologies that we use to perform this quality control: Qubit, which measures the concentration of DNA; the Agilent Bioanalyzer, which evaluates your DNA library and how well it has been fragmented; and qPCR, which determines how much of the prepared library is actually sequenceable. To start, one of the most important things you can do is run an RNA gel to determine whether your material is degraded.
If you have a high-quality sample, you should see a few distinct bands in your gel. If the sample is degraded, you will usually see a large smear in that lane. If you have this degradation, we can use special kits to recover the degraded RNA, or to reverse the formaldehyde modification on the RNA if your sample was fixed before extraction, and this can help provide you with sequenceable material for RNA-seq. Next, if you have a low amount of starting material, the first thing you need to do is know what this amount is, and for that we can use Qubit to measure the concentration of nucleic acids present in a sample. This is crucial for library preparation. If you have a low amount of starting material, we can use special kits that amplify the mRNA present, using the poly(A) tail, before RNA-seq. After the material has been converted from RNA to DNA and gone through the fragmentation step, the Agilent Bioanalyzer is useful for telling you how efficient the fragmentation was and giving you an idea of the average fragment size in your sample. Then, once you've ligated on the adapters, qPCR becomes essential. I will go through these two steps next. With the Agilent Bioanalyzer, when you run your sample, the data you get is effectively a graph telling you the fragment sizes present in your sample and their overall distribution. On this graph you can see that there are two peaks, at the left and right sides of the chart, which I've highlighted in red boxes. What you really want to see is the fragment size highlighted in this green box, which tells you that your fragments are at the upper end of the fragment size range, which is a good result for sequencing.
The alternative is when you have an uneven distribution of fragment sizes with no large fragments. Typically these smaller fragments are harder to sequence and less likely to give you a high-quality sequencing result, or they may cause sequencing failure, so we generally want to avoid this. Next, qPCR gives you a useful metric that you can't measure with Qubit. Remember that Qubit can tell you the amount of nucleic acids present in a sample, but qPCR can tell you how much of that nucleic acid is actually sequenceable and has adapters ligated to it correctly.
Next, I'll go over RNA-seq analysis and data interpretation a little more. The overall workflow once next-generation sequencing is complete is that the data go from raw data to a format called FASTQ, with intermediate steps, before you can perform data analysis. The raw data from the sequencer will usually never be seen by you, because the sequencer software processes them into FASTQ. Now, FASTQ is simply the FASTA format, a text-based format for representing nucleotides, combined with the quality information for that particular sequencing read. If you want to see what real FASTQ data looks like, it is shown here on the right. You can see that it's largely lines of alphanumeric text, and it's a little hard to understand what's going on.
The important sequence information is highlighted in the red box, which is the actual sequencing result for this particular sample from this particular cluster. Next, you have to figure out how to use the data before you can do the analysis you're interested in. FASTQ is generally what we provide for every next-generation sequencing project, whether it's whole genome sequencing, RNA-seq, or some other kind of sequencing. If you wanted to work with this, you would first have to take the FASTQ data and align it with the reference sequence. We briefly reviewed the alignment of reads earlier in the presentation.
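To make the FASTQ description more concrete, here is a minimal sketch of reading the four-line records in Python. It assumes the common Phred+33 quality encoding and a simple, unwrapped file layout; the record shown is made up for illustration, and real projects would normally rely on an established parser instead.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ text handle.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(character) - 33 for character in quality]
        yield header[1:], sequence, scores


# A made-up single-read example, not data from the webinar.
example_fastq = "@read1\nGATTACA\n+\nIIIIII#\n"
records = list(parse_fastq(io.StringIO(example_fastq)))

for read_id, sequence, scores in records:
    print(read_id, sequence, scores)
```

Each record carries the read identifier, the base calls (the part highlighted on the slide), and one quality score per base.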
This is one of the crucial steps after sequencing. After you have done the alignment, which can be done using a variety of software, you will need to normalize your data before you can begin the analysis and generate beautiful figures that you can eventually publish with your manuscript. I'm going to go over a little bit about how data is normalized. It is important to normalize your data for two main reasons. First, you need to normalize the number of reads per sample, because different samples may receive different numbers of reads. Second, you need to normalize the number of reads per gene, because genes have different lengths and you must take this into account.
I'll go over this a little more in the next few slides. For example, if you have three samples, A, B, and C, you'll generally want each sample to be sequenced the same amount, but because of stochastic differences in sequencing that cannot be controlled, you generally get different sequencing output for each sample. In this example, sample A has approximately 20 million reads, sample B approximately 30 million reads, and sample C 10 million reads. If you used your data without normalizing it, you might think that sample C has one-third the gene expression of sample B, for example, or that the expression levels of sample A and sample B are different, even though they may be identical.
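The per-sample part of the normalization can be illustrated by scaling raw counts to reads per million. The library sizes below are the ones from the example (20, 30, and 10 million reads); the raw gene counts are hypothetical numbers chosen so that the underlying expression is identical in all three samples.

```python
def counts_per_million(gene_count, total_reads):
    """Scale a raw read count by library size (reads per million)."""
    return gene_count * 1_000_000 / total_reads


# Library sizes from the talk's example.
library_sizes = {"A": 20_000_000, "B": 30_000_000, "C": 10_000_000}
# Hypothetical raw counts for one gene (an assumption for illustration).
raw_counts = {"A": 200, "B": 300, "C": 100}

normalized = {
    sample: counts_per_million(raw_counts[sample], library_sizes[sample])
    for sample in library_sizes
}
print(normalized)
```

The raw counts differ threefold between samples B and C, yet after scaling by library size all three samples give the same value, which is the situation the speaker is warning about.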
Second, you need to normalize the number of reads for a gene relative to the length of the gene. In this example, you can see that gene A is about 1 kb and gene B is 2 kb, or twice that length. Gene A has 5 reads that map to it and gene B has 10 reads, but gene B is twice as long. If you didn't normalize your data, you would think that gene B is expressed twice as much, even though that isn't the case. So you have to normalize the data, and there is a useful metric we can use that accounts for the differences both between samples and between genes.
This is called fragments per kilobase of transcript per million mapped reads, or FPKM. This is a way of looking at the relative expression of a transcript, proportional to the number of cDNA fragments that originated from it. So FPKM is essentially a normalized estimate of gene expression based on RNA sequencing data. Another way to think about it is that FPKM normalizes for differences in the number of reads per sample, as well as for the number of reads relative to the length of a given gene. In this specific example, once we have normalized gene A and gene B, you can see that the FPKM value is 2 for both, indicating that both genes, despite having different lengths and different numbers of mapped reads, have a similar expression level.
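The FPKM calculation itself is short enough to write out. The gene lengths and read counts below come from the example (gene A: 1 kb, 5 reads; gene B: 2 kb, 10 reads). The total of 2.5 million mapped fragments is an assumption, chosen only because it reproduces the FPKM value of 2 shown on the slide; the talk does not state the total.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments / (gene length in kb) / (total mapped fragments in millions)
    """
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Assumed library size (not given in the talk).
total_mapped = 2_500_000

gene_a = fpkm(fragments=5, gene_length_bp=1_000, total_mapped_fragments=total_mapped)
gene_b = fpkm(fragments=10, gene_length_bp=2_000, total_mapped_fragments=total_mapped)

print(gene_a, gene_b)
```

Gene B has twice the reads but is also twice as long, so both genes end up with the same normalized value.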
If you compare this to a normalized treated sample, you can conclude whether expression has increased or decreased relative to the control using this normalized data set. Once you have this, what kind of analysis can you actually do? You can use heatmaps to look at large changes in expression between a control sample and a treated sample and to visually detect increases or decreases in gene expression. You can then perform principal component analysis to see whether any of your samples show distinct changes in gene expression that may be correlated with the treatment conditions they underwent.
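Comparing a treated sample against a control on normalized values is commonly summarized as a log2 fold change, which is one simple way to put a number on "increased or decreased". The sketch below is an illustration only: the pseudocount of 1 is an assumption added to avoid dividing by zero, the expression values are made up, and real differential expression analysis would use replicates and a statistical test.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated to control normalized expression.

    Positive values mean higher expression in the treated sample,
    negative values mean lower, and 0 means no change.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression values (e.g. FPKM), for illustration only.
up_regulated = log2_fold_change(control=3.0, treated=7.0)
down_regulated = log2_fold_change(control=7.0, treated=3.0)
unchanged = log2_fold_change(control=5.0, treated=5.0)

print(up_regulated, down_regulated, unchanged)
```

Values like these, computed per gene, are what a heatmap of control versus treated samples typically visualizes.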
Next, you can also perform functional annotation of differentially expressed genes. This allows you to look at genes whose expression has increased, decreased, or remained the same and see if there are any shared pathways or processes that they may be involved in, such as inflammation, metabolism, or even an immune response. Next, I'll hand it back to Boshy to review a couple of examples from researchers who have used RNA-seq in their studies. Thanks for that great talk, Christopher. As you just mentioned, I'm going to take a few minutes to look at some of the studies that have demonstrated the power and versatility of RNA-seq, and then we'll wrap up the webinar and move on to the Q&A session.
Now, the first of these projects is the ENCODE project, which focused on identifying genome-wide regulatory regions in different cell lines. It was followed by the model organism ENCODE (modENCODE) project, which aimed to provide the biological research community with a comprehensive encyclopedia of functional genomic elements in the model organisms C. elegans and D. melanogaster. This was a huge task and was intended to help better understand the downstream effects of regulatory regions. Then we have The Cancer Genome Atlas project, which leveraged RNA-seq to analyze thousands of samples from cancer patients to better understand the underlying mechanisms of malignant transformation and cancer progression. Finally, the use of RNA-seq in medicine has been substantial, allowing researchers to expand their work toward personalized medicine, which may have a great impact on genetic diseases. With that, we now come to the final part of the webinar. As a thank you, we would like to offer you a promo code for 25% off your RNA-seq bioinformatics package, which you can see here and which will also be included in our follow-up email. Now that we're done with the background on RNA-seq, let's take a moment to examine some of the resources that ABM offers. First, be sure to visit our website, where all of our resources are collected. This includes our knowledge base articles as well as our YouTube videos and blog posts, to help give you the tools you need for your experiments.
Now, we not only have a diverse selection of educational materials, but we also have an incredibly knowledgeable technical support team that can guide you through your experiments, as well as a dedicated customer service team to ensure you receive your items in a timely manner. If you have any questions about our materials or services, you can always contact us by phone, email, or online chat. We also have a complete FAQ section on our website, so you can always browse it and see if you can find the answers there. Thanks for taking the time to join us today.
We will have another webinar on whole genome sequencing, so stay tuned and look out for an invitation from us soon. Thank you for taking the time to join us today. Now we're going to move on to the question and answer section, so please give us a moment to go over all the questions. We'll get through as many as we can. We have one here from Ian O., and the question is: how long does it take to do the library prep as well as the quality control and sequencing? Chris? Overall, from the moment we receive your samples, through preliminary quality control, library preparation, the Agilent quality control process, and preparing your samples for sequencing, the process from the date we receive your samples until you obtain your data takes approximately four to six weeks. Now, a big chunk of that time is not reserved for the actual sequencing process, which can usually be done in about a week, but is to ensure we have enough samples to set up a sequencing run. If you have a large number of samples, we can possibly provide you with your sequencing results in a matter of a few weeks.
That seems like a good answer, so now let's find another one. We have one here from Andrew Cushman. He asks: if he has done his experiment and has the raw data, can he give us the raw data and ask us to do the analysis?
Thanks, Andrew, that's a great question. If you have raw data and don't know what to do with it, we have a dedicated in-house bioinformatics team that can perform almost any type of analysis you need for your project. We have a number of bioinformatics services listed on each of our NGS pages, including RNA-seq. If there's something you're interested in that we didn't list, simply send us an email and we can work with our bioinformatics team to set up a custom analysis package for you. We will typically have results back to you within a few weeks, depending on how challenging the analysis is and how much custom software we have to develop for you.
That's perfect. Now we have another one here from Kate: what happens if my samples fail QC? Again, Chris? Thanks, Kate, for that great question. If we receive your samples and there are any problems during the quality control process, we will contact you and ask whether you would like to send new samples, or let you know if there are steps we can take to try to address the problem, process your samples, and possibly still achieve high-quality sequencing results. If for any reason your samples do not pass additional quality control, we will contact you again and ask whether you are able to provide new samples, or offer alternative options for proceeding.
That was a great answer, Chris. We're reviewing the questions, and I think we only have time for one more. We have one here: Ann is asking if she can send one sample and use it for both RNA-seq and miRNA-seq, or if she needs to send separate samples for each. It's actually quite a popular question. When we process samples for microRNA sequencing, we need to use special kits that are specific to small RNA sequences. This is different from what we would normally do for standard RNA-seq, so if you wanted to look at both RNA-seq and microRNA, we would ask for double the starting sample amount, or two separate samples, to process this.
That was fantastic, and I hope it was helpful and made sense. Let me see if we can answer anything else. Unfortunately, I think we're a little short on time right now, so what we'll do is go over the questions, write up some answers, and email them to you along with the slides of this webinar. So once again, thank you all for joining us today, and we hope you have a great rest of your day. Thank you.
