## Tuesday, November 20, 2012

### T-test vs. Wilcoxon test, MA plot vs. volcano plot

The Rafa lab has made a very nice series of videos on The Statistics of Genomics. Here is the one that covers useful plots in genomics, especially for next-generation sequencing.

Among the many interesting tips, one is to replace the MA plot with a volcano plot to better show the differentially expressed genes.

Here is a description of the volcano plot from the NIH site:

> However one chooses to compute the significance values (p-values) of the genes, it is interesting to compare the size of the fold change to the statistical significance level. The ‘volcano plot’ arrange genes along dimensions of biological and statistical significance. The first (horizontal) dimension is the fold change between the two groups (on a log scale, so that up and down regulation appear symmetric), and the second (vertical) axis represents the p-value for a t-test of differences between samples (most conveniently on a negative log scale – so smaller p-values appear higher up). The first axis indicates biological impact of the change; the second indicates the statistical evidence, or reliability of the change. The researcher can then make judgements about the most promising candidates for follow-up studies, by trading off both these criteria by eye.

It mentions using a t-test to get a p-value for each gene, to see whether the means of the two groups are statistically different from each other.
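
To make the two axes concrete, here is a minimal sketch of how one gene's fold change and p-value map to volcano plot coordinates. The base-2 logarithm for the fold change is my assumption (the description only says "a log scale"), and the gene names and numbers are made up for illustration.

```python
import math


def volcano_point(fold_change, p_value):
    """Map a fold change and a p-value to volcano plot (x, y) coordinates."""
    x = math.log2(fold_change)   # horizontal axis: log fold change (base 2 assumed)
    y = -math.log10(p_value)     # vertical axis: smaller p-values land higher up
    return x, y


# Hypothetical genes: (fold change between the two groups, p-value)
genes = {
    "geneA": (2.0, 0.001),   # doubled expression
    "geneB": (0.5, 0.001),   # halved expression
    "geneC": (1.0, 0.5),     # unchanged, not significant
}

points = {name: volcano_point(fc, p) for name, (fc, p) in genes.items()}

for name, (x, y) in points.items():
    print(f"{name}: x = {x:+.2f}, y = {y:.2f}")
```

On the log scale, doubling and halving sit at the same distance on either side of zero, which is the symmetry the description refers to.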

What confused me was: what's the difference between the t-test and the Wilcoxon test? Ashamed of my poor knowledge of statistics, I read up on it a bit. Here is what I got from Vacide Avsar et al.'s paper:

> Student’s t-Test is any statistical hypothesis test in which the test statistic has a Student’s t distribution if the null hypothesis is true.
>
> Different hypothesis tests make different assumptions about the distribution of the random sample in the data. One of the assumptions for the t-test is that the data are independently sampled from a normally distributed population. This assumption about the population distribution makes the t-test be a parametric statistical test. In some cases, the data within two correlated samples may fail to meet this assumption. When this happens, an appropriate non-parametric alternative test can be found. One of these non-parametric alternative tests is called the Wilcoxon Signed-Rank Test.
>
> Like the t-test, Wilcoxon test involves comparison of the differences between measurements. On the other hand, it does not require assumptions about the form of the distribution of the measurements. It should therefore be used whenever the distributional assumptions that underlie t-test cannot be satisfied.

So, the t-test is a parametric statistical test and the Wilcoxon test is a non-parametric one: the t-test assumes the data were independently sampled from a normal distribution, while the Wilcoxon test does not. Note that the passage is about the Wilcoxon signed-rank test, which applies to two correlated (paired) samples.
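
To see what "no assumption about the form of the distribution" means in practice, here is a rough sketch of an exact Wilcoxon signed-rank test in plain Python, next to the paired t statistic. This is my own illustration, not code from the paper: the function names and the toy differences are hypothetical, and the brute-force enumeration is only practical for a handful of pairs.

```python
import math
import statistics
from itertools import product


def average_ranks(values):
    """Rank values from smallest to largest (1-based), averaging ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(diffs):
    """Two-sided exact p-value for paired differences (zeros are dropped)."""
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    # Under the null, each difference is equally likely to be + or -,
    # so enumerate all 2^n sign patterns (fine for small n only).
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


def paired_t_statistic(diffs):
    """t statistic for H0: the mean paired difference is zero."""
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Hypothetical paired differences; the second list has one extreme value.
clean = [1.2, 0.8, 1.5, 0.3, 2.1, 0.9]
with_outlier = [1.2, 0.8, 1.5, 0.3, 210.0, 0.9]

p_clean = wilcoxon_signed_rank_exact(clean)
p_outlier = wilcoxon_signed_rank_exact(with_outlier)
t_clean = paired_t_statistic(clean)
t_outlier = paired_t_statistic(with_outlier)

print("Wilcoxon p-values:", p_clean, p_outlier)
print("paired t statistics:", t_clean, t_outlier)
```

Because the Wilcoxon test only looks at the ranks and signs of the differences, the extreme value leaves its p-value unchanged, while the t statistic shrinks because the outlier inflates the standard deviation.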

At the end of the paper, the authors conclude that the t-test has slightly better power than the Wilcoxon test.