PCAtest uses random permutations to build null distributions for several statistics of a principal component analysis (PCA): Psi (Vieira 2012), Phi (Gleason and Staelin 1975), the rank-of-roots (ter Braak 1988), the index of the loadings (Vieira 2012), and the correlations of the PCs with the variables (Jackson 1991). By comparing these null distributions with the observed values of the statistics, the function tests: (1) the hypothesis that there is more correlational structure among the observed variables than expected by random chance, (2) the statistical significance of each PC, and (3) the contribution of each observed variable to each significant PC. The function also estimates the sampling variance around the mean observed statistics based on bootstrap replicates.
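As a rough illustration of the two global statistics, the sketch below computes Psi and Phi from the eigenvalues of a correlation matrix. It assumes the definitions Psi = sum((lambda_i - 1)^2) and Phi = sqrt((sum(lambda_i^2) - p) / (p * (p - 1))), where p is the number of variables; these formulas are not stated on this page, and the helper names psi_stat and phi_stat are hypothetical, not part of PCAtest. Applied to the rounded eigenvalues printed for the ex05 example in the Examples section, they agree with the reported Psi and Phi to four decimals.

```r
# Hypothetical helpers (not exported by PCAtest): Psi and Phi from eigenvalues,
# under the assumed definitions given in the lead-in.
psi_stat <- function(lambda) sum((lambda - 1)^2)

phi_stat <- function(lambda) {
  p <- length(lambda)                       # number of variables
  sqrt((sum(lambda^2) - p) / (p * (p - 1)))
}

# Eigenvalues copied (rounded) from the ex05 output in the Examples section
lambda <- c(3.08098, 0.63526, 0.52603, 0.43234, 0.3254)

psi_val <- psi_stat(lambda)
phi_val <- phi_stat(lambda)
print(round(c(Psi = psi_val, Phi = phi_val), 4))
```

Both statistics are zero when all eigenvalues equal 1, that is, when the variables are uncorrelated, and they grow as variance concentrates in the leading PCs.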

PCAtest(
  x,
  nperm = 1000,
  nboot = 1000,
  alpha = 0.05,
  indload = TRUE,
  varcorr = FALSE,
  counter = TRUE,
  plot = TRUE
)

Arguments

x

A matrix or dataframe with variables in the columns and the observations in the rows.

nperm

Number of random permutations to build null distributions of the statistics.

nboot

Number of bootstrap replicates to build 95%-confidence intervals of the observed statistics.

alpha

Nominal alpha level for statistical tests.

indload

A logical indicating whether to calculate the index loadings of the variables with the significant PCs.

varcorr

A logical indicating whether to calculate the correlations of the variables with the significant PCs.

counter

A logical specifying whether to show the progress of the random sampling (bootstrap and permutations) on the screen.

plot

A logical specifying whether to plot the null distributions, observed statistics, and 95%-confidence intervals of statistics based on random permutation and bootstrap resampling.

Value

An object of class “list” with the following elements:

psiobs

The observed Psi statistic.

phiobs

The observed Phi statistic.

psi

The null distribution of Psi values.

phi

The null distribution of Phi values.

pervarobs

The percentage of variance explained by each PC based on the observed data.

pervarboot

The percentage of variance explained by each PC based on the bootstrapped data.

pervarperm

The percentage of variance explained by each PC based on the permuted data.

indexloadobs

The index of the loadings of the observed data.

indexloadboot

The index of the loadings of the bootstrapped data.

indexloadperm

The index of the loadings of the permuted data.

corobs

If varcorr=TRUE, the correlations of the observed variables with each significant PC.

corboot

If varcorr=TRUE, the correlations of the observed variables with each significant PC based on the bootstrapped data.

corperm

If varcorr=TRUE, the correlations of the observed variables with each significant PC based on permuted data.
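For the correlation elements, a helpful identity is that, for a PCA on centered and scaled variables, the correlation between a variable and a PC equals the variable's loading on that PC multiplied by the PC's standard deviation (the square root of its eigenvalue). The sketch below checks this numerically on simulated data. It does not call PCAtest, and all object names are hypothetical.

```r
set.seed(7)

# Simulated data (hypothetical): 200 observations, 4 variables, two of them correlated
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + 0.5 * x[, 2]

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Correlations computed directly between variables (rows) and PC scores (columns)
cor_direct <- cor(x, pca$x)

# Same correlations obtained from loadings times the PC standard deviations
cor_from_loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

max_diff <- max(abs(cor_direct - cor_from_loadings))
print(max_diff)  # numerically negligible
```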

Details

PCAtest uses the function stats::prcomp to run a PCA with the arguments scale = TRUE and center = TRUE. PCAtest plots four types of graphs on a single page: (1) a histogram showing the null distribution and the observed value of the Psi statistic, (2) a histogram showing the null distribution and the observed value of the Phi statistic, (3) a bar plot of the percentage of variance explained by each PC (PC1, PC2, ...), showing the sampling variance based on bootstrap replicates and random permutations with 95%-confidence intervals, and (4) a bar plot of the index of the loadings of each observed variable for PC1, showing the sampling variance based on bootstrap replicates and random permutations with 95%-confidence intervals. If more than one PC is significant, additional plots of the index of the loadings are drawn on as many new pages as the number of significant PCs requires. If the PCA is not significant according to the Psi and Phi tests, only histograms (1) and (2) are shown.
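The PCA call and the two kinds of resampling mentioned above can be sketched in base R. This is a minimal illustration under stated assumptions, not PCAtest's actual code: it assumes that each permutation shuffles every column independently (which removes correlations among variables), that each bootstrap replicate resamples rows with replacement, that Psi is sum((lambda - 1)^2), and that the p-value is the proportion of null values at least as large as the observed one. The simulated data and all object names are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): 100 observations of 5 variables sharing a common factor
n <- 100
p <- 5
common <- rnorm(n)
x <- sapply(seq_len(p), function(j) common + rnorm(n))

# Eigenvalues of a PCA on centered and scaled variables
eigenvalues <- function(m) prcomp(m, center = TRUE, scale. = TRUE)$sdev^2

# Percentage of variance explained by each PC
pervar <- function(m) {
  lambda <- eigenvalues(m)
  100 * lambda / sum(lambda)
}

# Psi statistic (assumed definition)
psi_of <- function(m) sum((eigenvalues(m) - 1)^2)

nperm <- 200
nboot <- 200

# Null distribution: shuffle each column independently (assumed permutation scheme)
psi_null <- replicate(nperm, psi_of(apply(x, 2, sample)))

# Bootstrap: resample rows with replacement, preserving the correlation structure
boot <- replicate(nboot, pervar(x[sample(n, replace = TRUE), ]))

psi_obs <- psi_of(x)
p_value <- mean(psi_null >= psi_obs)                # simple empirical p-value
ci_pc1 <- quantile(boot[1, ], c(0.025, 0.975))      # percentile 95% interval for PC1

print(c(psi_obs = psi_obs, p_value = p_value))
print(ci_pc1)
```

The formal argument of prcomp is named scale.; writing scale = TRUE reaches it through R's partial argument matching.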

References

  • Gleason, T. C. and Staelin, R. (1975) A proposal for handling missing data. Psychometrika, 40, 229–252.

  • Jackson, J. E. (1991) A User’s Guide to Principal Components. John Wiley & Sons, New York, USA.

  • Ringnér, M. (2008) What is principal component analysis? Nature Biotechnology, 26, 303–304.

  • ter Braak, C. J. F. (1990) Update notes: CANOCO (version 3.1). Agricultural Mathematics Group, Report LWA-88-02, Wageningen, Netherlands.

  • Vieira, V. M. N. C. S. (2012) Permutation tests to estimate significances on Principal Components Analysis. Computational Ecology and Software, 2, 103–123.

  • Wong, M. K. L. and Carmona, C. P. (2021) Including intraspecific trait variability to avoid distortion of functional diversity and ecological inference: Lessons from natural assemblages. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13568.

Author

Arley Camargo

Examples

# PCA analysis of five uncorrelated variables
data("ex0")
result <- PCAtest(ex0, 100, 100, 0.05, varcorr=FALSE, counter=FALSE, plot=TRUE)
#>
#> Sampling bootstrap replicates... Please wait
#>
#> Calculating confidence intervals of empirical statistics... Please wait
#>
#> Sampling random permutations... Please wait
#>
#> Comparing empirical statistics with their null distributions... Please wait
#>
#> ========================================================
#> Test of PCA significance: 5 variables, 100 observations
#> 100 bootstrap replicates, 100 random permutations
#> ========================================================
#>
#> Empirical Psi = 0.1691, Max null Psi = 0.4754, Min null Psi = 0.0522, p-value = 0.62
#> Empirical Phi = 0.0920, Max null Phi = 0.1542, Min null Phi = 0.0511, p-value = 0.62
#>
#> PCA is not significant!

# PCA analysis of five correlated (r=0.25) variables
data("ex025")
result <- PCAtest(ex025, 100, 100, 0.05, varcorr=FALSE, counter=FALSE, plot=TRUE)
#>
#> Sampling bootstrap replicates... Please wait
#>
#> Calculating confidence intervals of empirical statistics... Please wait
#>
#> Sampling random permutations... Please wait
#>
#> Comparing empirical statistics with their null distributions... Please wait
#>
#> ========================================================
#> Test of PCA significance: 5 variables, 100 observations
#> 100 bootstrap replicates, 100 random permutations
#> ========================================================
#>
#> Empirical Psi = 1.1327, Max null Psi = 0.3715, Min null Psi = 0.0408, p-value = 0
#> Empirical Phi = 0.2380, Max null Phi = 0.1363, Min null Phi = 0.0452, p-value = 0
#>
#> Empirical eigenvalue #1 = 1.88693, Max null eigenvalue = 1.43511, p-value = 0
#> Empirical eigenvalue #2 = 1.01454, Max null eigenvalue = 1.22877, p-value = 0.98
#> Empirical eigenvalue #3 = 0.8287, Max null eigenvalue = 1.08534, p-value = 1
#> Empirical eigenvalue #4 = 0.79299, Max null eigenvalue = 0.95169, p-value = 0.97
#> Empirical eigenvalue #5 = 0.47684, Max null eigenvalue = 0.90942, p-value = 1
#>
#> PC 1 is significant and accounts for 37.7% (95%-CI:30.4-45.7) of the total variation
#>
#> Variables 2, and 3 have significant loadings on PC 1
#>

# PCA analysis of five correlated (r=0.5) variables
data("ex05")
result <- PCAtest(ex05, 100, 100, 0.05, varcorr=FALSE, counter=FALSE, plot=TRUE)
#>
#> Sampling bootstrap replicates... Please wait
#>
#> Calculating confidence intervals of empirical statistics... Please wait
#>
#> Sampling random permutations... Please wait
#>
#> Comparing empirical statistics with their null distributions... Please wait
#>
#> ========================================================
#> Test of PCA significance: 5 variables, 100 observations
#> 100 bootstrap replicates, 100 random permutations
#> ========================================================
#>
#> Empirical Psi = 5.4655, Max null Psi = 0.4744, Min null Psi = 0.0527, p-value = 0
#> Empirical Phi = 0.5228, Max null Phi = 0.1540, Min null Phi = 0.0513, p-value = 0
#>
#> Empirical eigenvalue #1 = 3.08098, Max null eigenvalue = 1.51758, p-value = 0
#> Empirical eigenvalue #2 = 0.63526, Max null eigenvalue = 1.25918, p-value = 1
#> Empirical eigenvalue #3 = 0.52603, Max null eigenvalue = 1.06865, p-value = 1
#> Empirical eigenvalue #4 = 0.43234, Max null eigenvalue = 0.97229, p-value = 1
#> Empirical eigenvalue #5 = 0.3254, Max null eigenvalue = 0.84649, p-value = 1
#>
#> PC 1 is significant and accounts for 61.6% (95%-CI:54.6-68.6) of the total variation
#>
#> Variables 1, 2, 3, 4, and 5 have significant loadings on PC 1
#>

# PCA analysis of seven morphological variables from 29 ant species (from
# Wong and Carmona 2021, https://doi.org/10.1111/2041-210X.13568)
data("ants")
result <- PCAtest(ants, 100, 100, 0.05, varcorr=FALSE, counter=FALSE, plot=TRUE)
#>
#> Sampling bootstrap replicates... Please wait
#>
#> Calculating confidence intervals of empirical statistics... Please wait
#>
#> Sampling random permutations... Please wait
#>
#> Comparing empirical statistics with their null distributions... Please wait
#>
#> ========================================================
#> Test of PCA significance: 7 variables, 29 observations
#> 100 bootstrap replicates, 100 random permutations
#> ========================================================
#>
#> Empirical Psi = 10.9186, Max null Psi = 3.0303, Min null Psi = 0.7742, p-value = 0
#> Empirical Phi = 0.5099, Max null Phi = 0.2686, Min null Phi = 0.1358, p-value = 0
#>
#> Empirical eigenvalue #1 = 3.84712, Max null eigenvalue = 2.33045, p-value = 0
#> Empirical eigenvalue #2 = 1.52017, Max null eigenvalue = 1.65875, p-value = 0.25
#> Empirical eigenvalue #3 = 0.70634, Max null eigenvalue = 1.4709, p-value = 1
#> Empirical eigenvalue #4 = 0.41356, Max null eigenvalue = 1.10378, p-value = 1
#> Empirical eigenvalue #5 = 0.34001, Max null eigenvalue = 0.91453, p-value = 1
#> Empirical eigenvalue #6 = 0.14515, Max null eigenvalue = 0.77857, p-value = 1
#> Empirical eigenvalue #7 = 0.02765, Max null eigenvalue = 0.60171, p-value = 1
#>
#> PC 1 is significant and accounts for 55% (95%-CI:43.3-64.6) of the total variation
#>
#> Variables 1, 2, 3, 4, 5, and 7 have significant loadings on PC 1
#>