Dimensionality reduction and ordinations in ecology 1: introduction

Ecological research usually deals with large numbers of variables, such as the composition of a community with many species or biological responses to many environmental factors.  To recognize ecological patterns, we usually start by looking at correlations and regressions in 2-d or 3-d space.  But what if you have many variables, say more than a hundred?  There is no way to plot a 100-d space for pattern recognition.  A biotic community can easily contain more than a hundred species, so when you want to compare the species compositions of 30 plots in your study, you need some way to summarize each composition.  In those cases, you can use dimensionality reduction methods such as principal component analysis (PCA), non-metric multidimensional scaling (NMS), correspondence analysis (CA), detrended correspondence analysis (DCA), and canonical correspondence analysis (CCA).


Fig. 1  An example showing the concept of dimensionality reduction. Species composition data (left) of seed banks at three different successional stages (Wet Meadow, Young forest, Mature forest) were reduced to dots in a non-metric multidimensional scaling (NMS) plot.  Source: Cho et al. Journal of Ecology and Environment (2018) 42:12

Figure 1 shows an example of dimensionality reduction using non-metric multidimensional scaling (NMS).  Cho et al. (2018) studied the species compositions and seed banks of 36 plots at three different successional stages.  Their survey included 59 species, making comparison by eye impossible.  After dimensionality reduction using NMS, however, we can easily find ecological patterns among the 36 plots from the three successional stages.  In the right panel of Figure 1, each dot represents the species composition of one plot, so the panel should contain 36 points, one per plot.

Fig. 2 Process of building a data matrix from relative abundance data of four different plots. ©S Park

Figure 2 explains how to build a data matrix from species relative abundance data.  The species composition data (relative abundances) collected from one plot produce one row of values.  Species composition data from 4 plots therefore produce 4 rows, which together form a data matrix (number of plots x number of species).  The data matrix should be built with observations (plots) as rows and variables (species) as columns.  After dimensionality reduction, the relative abundance data from each plot can be expressed as a point in an ordination space such as a PCA or NMS ordination (score plot).  Now you can see the similarity among the 4 plots: plots 1 and 2 lie close together, as do plots 3 and 4.  We can therefore interpret that plots 1 and 2 have similar species compositions and cluster together, and so do plots 3 and 4.
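To make the matrix layout concrete, here is a minimal sketch in Python with NumPy.  The species names and abundance values are hypothetical (they are not the data behind Figure 2), and PCA is computed here by singular value decomposition of the column-centered matrix, which is one common way to do it rather than the exact procedure used for the figure.

```python
import numpy as np

# Hypothetical relative abundances (each row sums to 1); not data from the figure.
species = ["sp_A", "sp_B", "sp_C", "sp_D", "sp_E"]
plots = {
    "plot1": [0.50, 0.30, 0.10, 0.05, 0.05],
    "plot2": [0.45, 0.35, 0.10, 0.05, 0.05],
    "plot3": [0.05, 0.05, 0.10, 0.40, 0.40],
    "plot4": [0.05, 0.10, 0.05, 0.45, 0.35],
}

# Data matrix: observations (plots) as rows, variables (species) as columns.
plot_names = list(plots)
X = np.array([plots[name] for name in plot_names])  # shape (4 plots, 5 species)

# PCA via singular value decomposition of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]            # plot coordinates on PC1 and PC2
explained = s**2 / np.sum(s**2)      # proportion of variance per component

# Euclidean distances between plots in the 2-d ordination space.
diff = scores[:, None, :] - scores[None, :, :]
dist = np.sqrt((diff**2).sum(axis=2))

for name, (pc1, pc2) in zip(plot_names, scores):
    print(f"{name}: PC1={pc1:+.3f}, PC2={pc2:+.3f}")
print("variance explained by PC1, PC2:", explained[:2].round(3))
```

With these made-up values, the distance between plots 1 and 2 (and between plots 3 and 4) in the score space is much smaller than the distance between the two pairs, which is the kind of clustering you would read off a score plot.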

Let's learn more about ordination results using principal component analysis (PCA).  First, I will show the most typical presentation of PCA results, a score plot.

Fig. 3. A score plot of PCA on the fatty acid composition of the sesarmid crab, Sesarma dehaani, in three different wetlands along a salinity gradient in the Han River estuary, Korea. Source: Yang DW et al. Journal of Ecology and Environment, 2019 43:6

Yang et al. (2019) analyzed the fatty acid compositions of the sesarmid crab, Sesarma dehaani, from three different wetlands along a salinity gradient.  The PCA score plot shows that fatty acid compositions differ between Sites 1/2 and Site 3, and between size groups I/II (younger) and size groups III/IV (older), suggesting that these crabs feed on different diets depending on their habitat and size.

From the two examples above, we can learn that it is possible to reduce the dimensionality of any multivariate data, such as species compositions with relative abundances of many species or fatty acid compositions with peak areas of many individual fatty acids.  Other examples of multivariate data are T-RFLP (terminal restriction fragment length polymorphism) profiles, microarray gene expression data, and metabolic profiles obtained using GC-MS or LC-MS.
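Whatever the data source, the input has the same shape: one row per sample and one column per variable.  As a small sketch, the snippet below turns a table of peak areas into relative compositions by dividing each row by its total.  The numbers are hypothetical, and normalizing to row proportions is an assumption made here for illustration, not necessarily the preprocessing used in the studies cited above.

```python
import numpy as np

# Hypothetical peak areas: rows are samples, columns are individual fatty acids.
peak_areas = np.array([
    [120.0, 60.0, 20.0],
    [80.0, 80.0, 40.0],
    [30.0, 90.0, 180.0],
])

# Divide each row by its total so every sample becomes a composition summing to 1.
composition = peak_areas / peak_areas.sum(axis=1, keepdims=True)
print(composition.round(3))
```

The resulting matrix can then be passed to an ordination method in the same way as a plots x species matrix.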

Sometimes we have a matrix of environmental factors in addition to the species composition matrix, which gives us a chance to interpret differences in community composition in terms of environmental factors.



Fig. 4. Data matrices for simple ordinations such as PCA or CA and canonical ordinations such as RDA or CCA. Source: Legendre P and Legendre L. 1998. Canonical Analysis in Numerical Ecology. Developments in Environmental Modelling 20: 575-635


Canonical ordinations such as redundancy analysis (RDA) and canonical correspondence analysis (CCA) combine a dependent (response) matrix, such as species composition, with an independent (explanatory) matrix of environmental variables.
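One way to see how the two matrices are combined is to write RDA as two steps: a multivariate linear regression of the response matrix on the explanatory matrix, followed by a PCA of the fitted values.  The sketch below follows that textbook description using NumPy only.  The data are randomly generated and purely hypothetical, the variable names are my own, and dedicated ordination software applies additional scaling options that are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 12 plots, 2 environmental variables, 5 species.
n_plots = 12
env = rng.normal(size=(n_plots, 2))      # explanatory (environmental) matrix
effects = rng.normal(size=(2, 5))        # made-up species responses
community = env @ effects + 0.5 * rng.normal(size=(n_plots, 5))  # response matrix

# Center both matrices column by column.
Y = community - community.mean(axis=0)
X = env - env.mean(axis=0)

# Step 1: multivariate linear regression of Y on X gives the fitted values.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_fit = X @ B

# Step 2: PCA (via SVD) of the fitted values gives the canonical axes.
U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
n_axes = int(np.sum(s > 1e-10 * s[0]))   # number of non-trivial canonical axes
site_scores = U[:, :n_axes] * s[:n_axes]  # fitted plot coordinates on the RDA axes
species_scores = Vt[:n_axes].T            # species loadings on the RDA axes

# Share of the total community variance explained by the environmental variables.
constrained = (Y_fit**2).sum() / (Y**2).sum()
print("canonical axes:", n_axes)
print("constrained proportion:", round(float(constrained), 3))
```

Because the fitted values are linear combinations of the environmental variables, the number of canonical axes cannot exceed the number of environmental variables (two in this sketch).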

Fig. 5. (a) Ordination result based on the community structure and environmental characteristics using redundancy analysis (RDA). (b) Schematic diagram of the distribution between Phacelurus latifolius (PL) and Phragmites australis (PA). Source: Nam et al. Journal of Ecology and Environment (2018) 42:25


The RDA results above show the vegetation compositions of tidal marsh communities dominated by Phacelurus and by Phragmites along the salinity gradient.  The salinity gradient is represented by the Na+ and EC (electrical conductivity) axes.



In conclusion, dimensionality reduction and ordination approaches such as PCA, NMS, and RDA can reduce high-dimensional data to points in 2-d or 3-d space, sometimes together with environmental factor axes (RDA and CCA).  These ordination tools are therefore a very efficient way to find ecological patterns.  However, it is always good to remember that recognizing ecological patterns is only the first step in ecological research.  Just as regression patterns only suggest causal relationships without proving them, ordination results are a starting point for further mechanistic research in ecology.


by Sangkyu Park (Ajou University)





References

Cho Y-C et al.  Floristic composition and species richness of soil seed bank in three abandoned rice paddies along a seral gradient in Gwangneung Forest Biosphere Reserve, South Korea. Journal of Ecology and Environment (2018) 42:12

Nam BE et al. Soil factors determining the distribution of Phragmites australis and Phacelurus latifolius in upper tidal zone. Journal of Ecology and Environment (2018) 42:25


Yang DW et al. Intraspecific diet shifts of the sesarmid crab, Sesarma dehaani, in three wetlands in the Han River estuary, South Korea. Journal of Ecology and Environment (2019) 43:6
