If a gene is expressed at a level of 1/1200 compared to the average gene, how is probability 50:50 that we have a read mapped to it?

Question

I am reading a book about RNAseq analysis and it says

"To calculate the probability that a read will map to a specific gene, we can assume an average gene size of 4000 nt (100 M nt divided by 25,000 genes). At 30 M reads equivalent to 30× coverage, at single read 100 nt (or paired-end read 50 nt) length, we can expect a single read to map to the average expressed and length gene 4000 nt× 30 coverage/100 nt 1200 times. Thus, if the gene is expressed at a level of 1/1200 compared to the average gene, then we have a 50:50 probability to have a read map to it."

I understand that a single read would map to the average expressed 1200 times but I don't understand how the 50:50 probability is calculated. Please could you explain how this probability is determined.

StupidWolf · Answer

I actually disagree with that.. I guess I write this down as a discussion.
The expected value for an average gene is 1200. For this gene at 1/1200 expression, you expect 1 read.
However, because of sampling, sometimes you get 0, 1 or 2. If we assume the sampling follows a poisson distribution, we can calculate the probability of getting 0, 1 , 2 ... reads:
plot(dpois(0:10,lambda=1),ylab="Probability",xlab="No of reads")

The probabilities are like:
data.frame(n=0:5,p=dpois(0:5,lambda=1))

n           p
1 0 0.367879441
2 1 0.367879441
3 2 0.183939721
4 3 0.061313240
5 4 0.015328310
6 5 0.003065662

The probability of getting exactly 0 or 1 read should be ~0.37. The probability of getting at least 1 read would be 1 - P(n=0) = 1- 0.367879441 = 0.6321206

If a gene is expressed at a level of 1/1200 compared to the average gene, how is probability 50:50 that we have a read mapped to it?

One Answer

Add your own answers!

Ask a Question