While preparing my RNA-seq slides recently, I reread the distributions used in the analysis of RNA-seq data. Here are some notes:

**Bernoulli distribution**: If an event happens with probability p (and does not happen with probability q = 1-p), a single try is a Bernoulli trial, and its outcome X (1 if the event happens, 0 otherwise) follows a Bernoulli distribution: Pr(X=1) = 1 - Pr(X=0) = 1 - q = p.
**Binomial distribution**: If the Bernoulli trial is repeated independently multiple times (e.g. n), the number of times X=1 occurs (e.g. k) follows a Binomial distribution:

$$\Pr(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$$

where $\binom{n}{k}$ is the number of different combinations for selecting k from n (without considering the order of selection, unlike a permutation).
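
As a quick sanity check, here is a minimal sketch of the Binomial probability using only the Python standard library. The function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative choices, not anything taken from the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values (assumed): 10 trials, success probability 0.3.
n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X=3) = {probs[3]:.4f}")
print(f"sum over k = {sum(probs):.6f}")  # probabilities over k = 0..n add up to 1
```
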
**Hypergeometric distribution**: Similar to the Binomial case (which is selection with replacement), except that the hypergeometric case is selecting n from N without replacement. Its probability mass function is:

$$\Pr(X=k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

where N is the total population size, m is the total number of 'happening' events (or successes) in the population, n is the number of selections, and k is the number of successes among the n selections.
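
The hypergeometric probability can be sketched the same way. The function name `hypergeom_pmf` and the numbers below (a population of 50 with 5 successes, 10 draws) are made up for illustration only.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """Probability of k successes in n draws without replacement
    from a population of size N that contains m successes."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Illustrative values (assumed): N = 50, m = 5, n = 10.
N, m, n = 50, 5, 10
probs = [hypergeom_pmf(k, N, m, n) for k in range(min(m, n) + 1)]

for k, pr in enumerate(probs):
    print(f"P(X={k}) = {pr:.6f}")
print(f"sum over k = {sum(probs):.6f}")  # adds up to 1
```
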
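**Negative Binomial distribution**: also called the Pascal distribution, it describes the number of failures (e.g. k) before a specified number of successes (e.g. r) occurs, where each trial succeeds with probability p. The probability mass function is:

$$\Pr(X=k) = \binom{k+r-1}{k} p^r (1-p)^k$$
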
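
A small sketch of the Negative Binomial probability, assuming the "k failures before the r-th success" convention described above and a per-trial success probability p. The name `neg_binomial_pmf` and the values r = 3, p = 0.5 are illustrative.

```python
from math import comb

def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """Probability of seeing exactly k failures before the r-th success,
    where each trial succeeds with probability p."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

# Illustrative values (assumed): wait for r = 3 successes with p = 0.5.
r, p = 3, 0.5
for k in range(5):
    print(f"P(X={k}) = {neg_binomial_pmf(k, r, p):.4f}")

# The support is unbounded, so sum a long prefix to check it approaches 1.
print(f"sum over k < 200 = {sum(neg_binomial_pmf(k, r, p) for k in range(200)):.6f}")
```
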
**Poisson distribution**: Let's say you expect something to happen 4 times per day on average, but the count varies (e.g. sometimes it happens 5 times, sometimes 2 times, sometimes not at all). Then the probability that the event happens k times on a specific day is:

$$\Pr(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

where $\lambda$ is the expected count (here $\lambda = 4$).
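
For the "4 times per day" example, the Poisson probabilities can be tabulated with a few lines of standard-library Python. The function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0  # expected count per day, as in the example above
for k in range(9):
    print(f"P(X={k}) = {poisson_pmf(k, lam):.4f}")

# The support is unbounded, so sum a long prefix to check it approaches 1.
print(f"sum over k < 60 = {sum(poisson_pmf(k, lam) for k in range(60)):.6f}")
```
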

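**Relationship between the Negative Binomial and Poisson distributions:**

In the Negative Binomial distribution, k failures and r successes make N = k + r trials in total. Say the probability of the counted event (a failure, in the definition above) is t/N, so

$$
\begin{aligned}
\Pr(X=k) &= \binom{N-1}{k} \left(\frac{t}{N}\right)^k \left(1-\frac{t}{N}\right)^{N-k} \\
&= \frac{(N-1)\cdots(N-k)}{k!} \left(\frac{t}{N}\right)^k \left(1-\frac{t}{N}\right)^{N-k} \\
&= \frac{t^k}{k!} \cdot \left(1-\frac{t}{N}\right)^{N} \cdot \frac{(N-1)\cdots(N-k)}{N^k} \left(1-\frac{t}{N}\right)^{-k}
\end{aligned}
$$

When N --> infinity with k and t held fixed, the factor $(1-t/N)^N$ converges to $e^{-t}$ and the last factor converges to 1, so Pr(X=k) converges to $\frac{t^k}{k!} e^{-t}$, which is the same as the Poisson distribution with $\lambda = t$.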
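
The limit can also be checked numerically. The sketch below evaluates the expression C(N-1, k) * (t/N)^k * (1-t/N)^(N-k) for growing N and compares it with the Poisson value t^k e^(-t) / k!. The function names and the values k = 3, t = 4 are illustrative assumptions.

```python
from math import comb, exp, factorial

def nb_limit_form(k: int, N: int, t: float) -> float:
    """C(N-1, k) * (t/N)^k * (1 - t/N)^(N-k), the expression from the notes."""
    q = t / N
    return comb(N - 1, k) * q**k * (1 - q) ** (N - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with expected count lam."""
    return lam**k * exp(-lam) / factorial(k)

k, t = 3, 4.0  # illustrative values (assumed)
target = poisson_pmf(k, t)
for N in (10, 100, 1_000, 100_000):
    value = nb_limit_form(k, N, t)
    print(f"N={N:>7}  value={value:.6f}  gap={abs(value - target):.2e}")
print(f"Poisson  value={target:.6f}")
```

The gap shrinks roughly in proportion to 1/N, which matches the factors in the derivation that tend to 1.
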
