## Monday, May 07, 2012

### Several distributions: Binomial, Hypergeometric, Negative Binomial, Poisson, Bernoulli

Recently, while preparing slides on RNA-seq, I reread the probability distributions used in analyzing RNA-seq data. Here are some notes:
• Bernoulli distribution: If an event happens with probability p (and does not happen with probability q = 1-p), a single observation of it is a Bernoulli trial, and its outcome X (1 if the event happens, 0 if not) follows a Bernoulli distribution: $\Pr(X=1)=1-\Pr(X=0)=1-q=p$.
• Binomial distribution: If a Bernoulli trial is repeated independently n times, the number of times X=1 occurs, say k, follows a binomial distribution:
$$\Pr(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$
where $\binom{n}{k}$ is the number of different ways to choose k items from n, without regard to the order of selection (unlike permutations).
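• Hypergeometric distribution: Whereas the binomial distribution describes selection with replacement (every draw has the same success probability p), the hypergeometric distribution describes selecting n items from a finite population without replacement. Its probability mass function is:
$$\Pr(X=k)=\frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$
where N is the total population size, m is the total number of 'happening' items (successes) in the population, and k is the number of successes among the n selections.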
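• Negative binomial distribution: also called the Pascal distribution, it describes the number of failures k that occur before a specified number of successes r is reached, where each trial succeeds with probability p. Its probability mass function is:
$$\Pr(X=k)=\binom{k+r-1}{k}p^r(1-p)^k$$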
• Poisson distribution: Say you expect an event to happen 4 times per day on average, but the actual count varies (sometimes it happens 5 times, sometimes 2 times, sometimes not at all). The probability that the event happens k times on a given day is:
$$\Pr(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$$
where λ is the expected count (here λ = 4).
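To make the formulas above concrete, here is a minimal Python sketch (standard library only) that implements each probability mass function as written, using the same symbols. The function names are my own and hypothetical, not from any package; for real analyses a statistics library would be the more likely choice.

```python
from math import comb, exp, factorial

# Hypothetical helper names; parameters follow the notation in the notes above.

def bernoulli_pmf(x, p):
    """Pr(X=x) for x in {0, 1}, with success probability p."""
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, N, m, n):
    """k successes in n draws WITHOUT replacement from N items, m of which are successes."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def negbinom_pmf(k, r, p):
    """k failures before the r-th success, with success probability p per trial."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def poisson_pmf(k, lam):
    """k events when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)

if __name__ == "__main__":
    # Each PMF should sum to (very nearly) 1 over its support.
    print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
    print(sum(hypergeom_pmf(k, 50, 20, 10) for k in range(11)))
    # Negative binomial and Poisson have infinite support, so truncate far in the tail.
    print(sum(negbinom_pmf(k, 5, 0.4) for k in range(500)))
    print(sum(poisson_pmf(k, 4) for k in range(100)))
```

Running it should print four values very close to 1; the example parameters (n = 10, N = 50, m = 20, r = 5, and so on) are arbitrary choices for illustration.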

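• Relationship between the negative binomial and Poisson distributions:
In the negative binomial distribution, fix the mean number of failures at λ by choosing the success probability p = r/(r+λ) (the mean is r(1-p)/p = λ). Then
$$\Pr(X=k)=\binom{k+r-1}{k}\left(\frac{r}{r+\lambda}\right)^r\left(\frac{\lambda}{r+\lambda}\right)^k$$
$$=\frac{r(r+1)\cdots(r+k-1)}{k!}\cdot\frac{\lambda^k}{(r+\lambda)^k}\cdot\left(1+\frac{\lambda}{r}\right)^{-r}$$
$$=\frac{\lambda^k}{k!}\cdot\left(1+\frac{\lambda}{r}\right)^{-r}\cdot\frac{r(r+1)\cdots(r+k-1)}{(r+\lambda)^k}$$
When r → ∞, $(1+\lambda/r)^{-r}\to e^{-\lambda}$ and the last factor (k terms on top, each divided by r+λ) → 1, so Pr(X=k) converges to $\lambda^k e^{-\lambda}/k!$, which is the Poisson distribution with mean λ. Put differently, the negative binomial variance λ + λ²/r is larger than its mean (overdispersion), and it shrinks to the Poisson variance λ as r grows. The same kind of limit holds for the binomial distribution: with p = t/N and N → ∞, $\binom{N}{k}(t/N)^k(1-t/N)^{N-k}\to t^k e^{-t}/k!$.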
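The negative binomial to Poisson limit can also be checked numerically. The sketch below uses hypothetical helpers of my own (not a library API): it fixes the mean at λ = 4 by setting the success probability to p = r/(r+λ), then prints the largest gap between the negative binomial and Poisson PMFs over k = 0..30 as r grows.

```python
from math import comb, exp, factorial

def negbinom_pmf_mean(k, r, lam):
    """Negative binomial Pr(X=k) (k failures before r successes),
    parameterized by its mean lam through p = r / (r + lam)."""
    p = r / (r + lam)
    q = lam / (r + lam)  # equals 1 - p, computed directly to avoid cancellation
    return comb(k + r - 1, k) * p**r * q**k

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def max_abs_diff(r, lam, kmax=30):
    """Largest |negative binomial - Poisson| PMF gap over k = 0..kmax."""
    return max(abs(negbinom_pmf_mean(k, r, lam) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

if __name__ == "__main__":
    lam = 4.0
    for r in (1, 10, 100, 1000, 10000):
        print(r, max_abs_diff(r, lam))
```

The printed gap should shrink steadily (roughly like 1/r) as r increases, consistent with the negative binomial variance λ + λ²/r approaching the Poisson variance λ.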