Week 6 began statistical analysis using SPSS, specifically non-parametric tests. Non-parametric tests are used for data that cannot be assumed to follow a normal distribution. A simple example is ranked data such as movie reviews (0 to 5 stars). A major limitation of non-parametric tests is that they generally have less statistical power than the equivalent parametric tests, so a larger sample is needed to reach sufficient significance to reject a null hypothesis.
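To make the ranked-data example concrete, here is a minimal sketch in Python (not SPSS, and not from the lecture) that summarizes a set of star ratings with the median and interquartile range alongside the mean and standard deviation. The ratings are made up for illustration, and the "inclusive" quartile method is an assumption; other quartile definitions give slightly different values.

```python
import statistics

# Hypothetical star ratings (0 to 5) for one film; invented for illustration.
ratings = [5, 4, 4, 3, 5, 0, 4, 2, 4, 5, 1]

# Parametric-style summary: treats the stars as interval measurements.
mean = statistics.mean(ratings)
sd = statistics.stdev(ratings)

# Rank-based summary: depends only on the ordering of the ratings.
median = statistics.median(ratings)
q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"median = {median}, IQR = {iqr} (Q1 = {q1}, Q3 = {q3})")
```

With a few very low ratings in the sample, the mean is pulled below the median, which is one reason the median is the usual summary for ranked data.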
A good summary of which statistical test suits which type of data, covering both parametric and non-parametric tests, was found at http://www.graphpad.com/www/book/choose.htm:
| Goal / Type of Data | Measurement (from Gaussian Population) | Rank, Score, or Measurement (from Non-Gaussian Population) | Binomial (Two Possible Outcomes) | Survival Time |
| --- | --- | --- | --- | --- |
| Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher’s test (chi-square for large samples) | Log-rank test or Mantel-Haenszel* |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar’s test | Conditional proportional hazards regression* |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression** |
| Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochrane Q** | Conditional proportional hazards regression** |
| Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** | |
| Predict value from another measured variable | Simple linear regression or Nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazard regression* |
| Predict value from several measured or binomial variables | Multiple linear regression* or Multiple nonlinear regression** | | Multiple logistic regression* | Cox proportional hazard regression* |
All of the tests described in the table above can be applied via SPSS. Note that “Gaussian population” refers to normally distributed data. Not featured in the table above is the sign test, perhaps because it is described as lacking the statistical power of the paired t-test or the Wilcoxon test.
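As a rough illustration of why the sign test has less power, here is a minimal Python sketch of a two-sided exact sign test (my own illustration, not the SPSS procedure). It uses only the direction of each paired difference and discards the size of the difference, which the paired t-test and the Wilcoxon test both take into account. The function name `sign_test` and the before/after scores are hypothetical.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired samples.

    Returns (number of positive differences, number of negative
    differences, p-value). Tied pairs are dropped, as is conventional.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg          # ties (zero differences) are excluded
    k = min(n_pos, n_neg)
    # Under the null hypothesis each sign is equally likely, so the
    # count of the rarer sign follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)  # double the tail for a two-sided test
    return n_pos, n_neg, p_value


# Hypothetical paired ratings for ten subjects; invented for illustration.
before = [3, 2, 4, 1, 3, 2, 3, 4, 2, 3]
after = [4, 4, 5, 3, 3, 4, 2, 5, 4, 4]

n_pos, n_neg, p_value = sign_test(before, after)
print(f"{n_pos} increases, {n_neg} decreases, p = {p_value:.4f}")
```

Note that a difference of 1 star and a difference of 2 stars count identically here; only the sign matters.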
One question that immediately comes to mind is how the process of normalization could be applied so that non-parametric data can be compared with normally distributed data.
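One partial angle on this question, offered as my own sketch rather than anything from the lecture: rank-based methods sidestep the distribution by replacing each value with its rank before computing a familiar statistic. For example, the Spearman correlation is the Pearson correlation computed on ranks. The helper names and data below are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but with one extreme value.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 400]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ranks ignore how far apart the values are, the extreme value lowers the Pearson correlation but leaves the Spearman correlation at its maximum.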
The lecture went on to describe important assumptions and the rationale behind several test methods. I will await further practical testing with SPSS before going into more detail on them.