A common problem with using proportions as response variables is that the distribution of proportions may "pile up" against 0 or 1, resulting in skewed data (i.e. non-normal distributions). Additionally, this may lead to unequal variances. We are often interested in both the mean and the variance of a proportion; however, confidence intervals calculated from skewed proportion data may fall outside the logically relevant limits of [0,1].
If we are interested in using an ANOVA framework for data analysis, one solution is to use the arcsine square root transformation. Transformations are useful tools for meeting the assumptions of ANOVA; however, their use means investigators have to decide how to present data. Transformed data may be confusing because they are often in units that are hard to interpret. Two options to overcome this are 1) to present the raw data (and specify so in legends), or 2) to present data that are back-transformed.
With back-transformation, we:
1) transform the data using arcsine(sqrt(X));
2) calculate the mean (X bar) and the lower and upper confidence limits (e.g. mean ± 95% CI) of the transformed data;
3) back-transform these summaries: i.e., take the sine of the lower confidence limit, the mean, and the upper confidence limit [the value, not the length of the CI], and then square each of them.
This gives the position of the three summary statistics on the original scale of measurement. Note that the CIs will be asymmetric.
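The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample proportions are hypothetical, and the interval uses a normal-approximation critical value to stay within the standard library (a t quantile would be more usual for small samples).

```python
import math
from statistics import NormalDist, mean, stdev


def arcsine_sqrt(p):
    """Step 1: transform a proportion in [0, 1] to the arcsine square root scale."""
    return math.asin(math.sqrt(p))


def back_transform(z):
    """Step 3: naive back-transformation (take the sine, then square)."""
    return math.sin(z) ** 2


def back_transformed_summary(proportions, level=0.95):
    """Return (lower limit, mean, upper limit) on the original proportion scale."""
    z = [arcsine_sqrt(p) for p in proportions]
    centre = mean(z)
    # Assumption: normal-approximation critical value (about 1.96 for 95%).
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = crit * stdev(z) / math.sqrt(len(z))
    # Step 2: symmetric interval on the transformed scale.
    lower, upper = centre - half_width, centre + half_width
    # Step 3: back-transform the positions of the limits, not the interval length.
    return back_transform(lower), back_transform(centre), back_transform(upper)


# Hypothetical data, for illustration only.
lo, mid, hi = back_transformed_summary([0.10, 0.20, 0.15, 0.30, 0.25])
print(lo, mid, hi)
```

The interval is symmetric around the mean on the transformed scale, but the two arms differ in length after back-transformation.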
A potential problem with back-transformation of arcsine-sqrt-transformed data.
Over the range of the transformed data, the sine function is bounded by [0,1], and similarly the arcsine-transformed values are bounded by [0, π/2]. Consider transformed data with a mean of Z and a 95% CI that extends from L to U (see figure). Notice that Z-L = U-Z (i.e., the interval is symmetric around the mean). However, U can exceed π/2 (and similarly, L can be less than 0). In these cases, back-transformation is problematic because the sine function is not monotonic: it oscillates between 0 and 1 along the x-axis and decreases again beyond π/2. Notice that [sin(U)]^2 < [sin(π/2)]^2 = 1.0 (see figure). In other words, the upper confidence limit (in this example) is artificially reduced upon back-transformation. A valid back-transformation should preserve order: if U>Z then B(U)>B(Z), where B stands for the back-transformation function. But in this example, this is violated.
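A small numerical check makes the problem concrete. The values of Z and the CI half-width below are hypothetical, chosen only so that U lies beyond π/2; they are not taken from the figure.

```python
import math


def back_transform(x):
    """Naive back-transformation B(x) = sin(x)^2."""
    return math.sin(x) ** 2


half_pi = math.pi / 2

# Hypothetical summary statistics on the transformed scale.
z_mean = 1.45       # Z, close to the upper bound of pi/2 (about 1.5708)
half_width = 0.30   # half-width of the 95% CI
lower, upper = z_mean - half_width, z_mean + half_width  # L and U; here U > pi/2

b_lower = back_transform(lower)
b_mean = back_transform(z_mean)
b_upper = back_transform(upper)

print("U exceeds pi/2:", upper > half_pi)
print("B(L), B(Z), B(U):", b_lower, b_mean, b_upper)
print("B(U) < B(Z) although U > Z:", b_upper < b_mean)
```

With these numbers the back-transformed upper limit falls below the back-transformed mean, which is exactly the order violation described above.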
There are several possible solutions:
1) Modify the back-transformation so that B(X)=1.0 when X>π/2 and B(X)=0.0 when X<0.
2) Use a bootstrap approach to generate CIs.
3) Use a different statistical model (e.g. logistic regression or beta regression) that directly deals with distributional properties of proportions without requiring transformation to achieve normality.
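As an illustration of option 2, the following is a minimal sketch of a percentile bootstrap for the mean proportion. The percentile method, the number of resamples, the function name, and the sample data are all assumptions made for this example; other bootstrap variants may be preferable in practice. Because every resampled mean is an average of values in [0,1], the resulting limits cannot fall outside [0,1].

```python
import random
from statistics import mean


def bootstrap_ci(proportions, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of raw (untransformed) proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(proportions)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        mean(rng.choices(proportions, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - level
    # Approximate percentile positions in the sorted bootstrap means.
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return boot_means[lo_idx], mean(proportions), boot_means[hi_idx]


# Hypothetical proportions piled up against 1, for illustration only.
boot_lo, boot_mid, boot_hi = bootstrap_ci([0.90, 0.95, 1.00, 0.98, 1.00, 0.85])
print(boot_lo, boot_mid, boot_hi)
```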
In Boyer et al. (2009) we used approach #1 (i.e., we retained the back-transformation approach), but we corrected the error introduced when CIs overlap 0 or π/2 on the transformed scale.
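A minimal sketch of the modified back-transformation in approach #1 might look like the following. The function name is hypothetical, and this is not the code used in Boyer et al. (2009); it only implements the rule that B(X)=1.0 when X>π/2 and B(X)=0.0 when X<0.

```python
import math


def clamped_back_transform(x):
    """Back-transform x, truncating at the bounds of the transformed scale."""
    if x <= 0.0:
        return 0.0               # limits at or below 0 map to a proportion of 0
    if x >= math.pi / 2:
        return 1.0               # limits at or beyond pi/2 map to a proportion of 1
    return math.sin(x) ** 2      # ordinary back-transformation inside the bounds


# Hypothetical limits on the transformed scale, two of them out of bounds.
for value in (-0.10, 0.50, 1.45, 1.75):
    print(value, clamped_back_transform(value))
```

With this version the back-transformation never decreases as X increases, so an upper limit beyond π/2 is reported as 1.0 rather than as an artificially reduced value.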
This document was constructed by Adrian Stier and Shane Geange with edits and discussion with Ben Bolker and Craig Osenberg.