Commit 992f6ff2 authored by GAREL Marc's avatar GAREL Marc
Browse files

add envfit

parent b9bb97c7
---
title: 'PCA using envfit...triplot!'
author: "Armougom"
date: "9 Fev 2022"
output:
html_notebook:
theme: united
toc: yes
toc_depth: 3
toc_float: yes
html_document:
df_print: paged
toc: yes
toc_depth: '3'
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
- ### **<span style="color: steelblue;"> I-load Packages</span>**
```{r}
pacman::p_load("factoextra","stats","graphics","shape","adespatial")
```
- ### **<span style="color: steelblue;">II- Select and transform data</span>**
*Here We use the top 50 ASVs, but you can use others taxa ranks (i.e. Family, genus..)*
```{r}
#Selection 50 top ASVs (remove rare ASVs with many zeros)
top50ASV = names(sort(taxa_sums(Final2_rar), TRUE)[1:50])
selectASV50 = prune_taxa(top50ASV,Final2_rar)
```
```{r}
#Hellinger transformation for PCA (double zeros pb)
Hellinger <- transform_sample_counts(selectASV50, function(x) sqrt(x/sum(x)))
```
- ### **<span style="color: steelblue;">III- PCA & result manipulation</span>**
- #### **Run PCA method**
```{r}
PCA<- prcomp((otu_table(Hellinger)), scale = TRUE)
```
- #### **Manipulation of the PCA object**
*Going inside the PCA object to collect information to build, in a customized way,*
*our triplot PCA*.*We Need to collect variable (ASV) coordinates, site coordinates and environmental factors coordinates.*
- #### **Variables (ASV) selection which have the best contribution to PC1 and PC2 axes**
```{r}
#See all variables
var <- get_pca_var(PCA)
var
```
```{r}
#selection of ASV variables with their contribution to PC1 and PC2
contrib=data.frame(var$contrib[,1:2])
contrib
```
```{r}
# Addition of contribution from PC1 and PC2 for each ASV variable
scoreC=contrib$Dim.1+contrib$Dim.2
scoreC
```
```{r}
#Add name of ASV variable to scoreC
names(scoreC)<-rownames(contrib)
```
```{r}
#Using Contrib score, How many to select?
#all above the threshold (red line)
fviz_contrib(PCA, choice = "var", axes = 1:2, top = 25)
```
```{r}
#In fact 23 have to be selected...but for this example we keep 14.
#Create a list with the names of selected ASV variables (here 14) and decreasing sort
list=names(sort(scoreC,decreasing = TRUE)[1:14])
list
```
- #### **Coordinates extraction for the selected ASV variables (14)**
```{r}
#Coordinates of variables are located in PCA$rotation
#See
PCA$rotation
```
```{r}
#dataframe Transformation keeping only coordinates of PC1 and PC2
rotationnew=data.frame(PCA$rotation[,1:2])
rotationnew
```
```{r}
#Extract the coordinates only for our selected ASVs (remember list)
myselection=rotationnew[list,]
myselection
```
```{r}
#Separate coordinates x,y of ASV variables, add a multiplicator (10) and stored for the end
l.x <- myselection$PC1*10
l.y <- myselection$PC2*10
```
- ### **<span style="color: steelblue;">IV- Significant environmental factors and their coordinates</span>**
- #### **Selection and transformation of env factors**
```{r}
#Select the environmental factors to test
M<-select(meta,SiOH4:PO4,T)
```
```{r}
#Standardization (i.e. put data in "same unit")
env.z <- decostand(M, method="standardize")
```
- #### **Select forward process : for using only the best env factors**
```{r}
#Need to select the variables to test before using too much envrionmental variables?
#select forward #remember that this process is criticized.
sel.fs <- forward.sel(Y = otu_table(Hellinger), X = env.z)
#see
sel.fs
```
*We will compare this proposal with the result of multiple regression of*
*envfit using all env factors.*
- #### **Run envfit function**
*function fits environmental vectors or factors onto an ordination*
```{r}
ef <- envfit (PCA ~ NH4 +SiOH4 + T+ NO2 + NO3 + PO4, data = env.z, perm = 999)
ef
```
- #### **Extraction of coordinates of significant env factors**
```{r}
#ef$vectors$arrows stored the coordinates
#ef$vectors$pvals stored the pvalues
efcp=cbind(ef$vectors$arrows,ef$vectors$pvals)
efcp
```
```{r}
#Keep only significant env factors with pvalues < 0.05
efcp=data.frame(efcp)
res_final=efcp[(efcp$V3 <=0.05),]
```
```{r}
#Extract x (Vx) and y (Vx) coordinates of significant env factors and add a multiplicator
#For x axis
Vx=res_final$PC1
#See
Vx
names(Vx) = rownames(res_final)
#See
Vx
Vx=Vx*5
#See
Vx
#For y axis
Vy=res_final$PC2
names(Vy)= rownames(res_final)
Vy=Vy*5
```
- ### **<span style="color: steelblue;">V- TriPlot PCA: Sites, ASV variables and env factors</span>**
- #### **Shape and color for the sites**
*Should be modified according your data*
```{r}
pch.group1 <- c(rep(15, times=1), rep(16, times=9), rep(15, times=8))
col.group1 <- c(rep("skyblue2", times=1), rep("gold", times=9), rep("skyblue2", times=8))
```
- #### **Extract variance explained for the 2 axes**
```{r}
#variance is stored here
variance=summary(PCA)[6]
variance
#Dataframe and retained variance for 2 PC
variance=data.frame(variance)[,1:2]
variance
#Variance PC1
variance$importance.PC1[2]
#Variance PC2
variance$importance.PC2[2]
```
- #### **Plot**
```{r}
#coordinates of sites are stored in PCA$x
plot(PCA$x[,1], PCA$x[,2],
xlim=c(-7,10),ylim=c(-5,7.5),
xlab=paste("PCA1 (", round(x$importance.PC1[2]*100, 1), "%)", sep = ""),ylab=paste("PCA2 (", round(x$importance.PC2[2]*100, 1), "%)", sep = ""),cex=1, las=1, asp=1, pch=pch.group1, col=col.group1)
# Add grid lines
abline(v=0, lty=2, col="grey50")
abline(h=0, lty=2, col="grey50")
# Add legend
legend("topleft", legend=c("South", "North"),
col=c("skyblue2", "gold"),
pch=c(15, 17), pt.cex=1,cex=0.5)
#SITES
# Add labels on Site coordinates
text(PCA$x[,1], PCA$x[,2], labels=row.names(PCA$x), pos=c(1,3,4,2), font=2, cex=0.5, col="grey50")
#ASV VARIABLES
# Draw arrows for ASV Variables using coordinates
Arrows(x0=0, x1=l.x, y0=0, y1=l.y, col="lightcoral", lwd=1, arr.length=0.2, arr.type = "triangle")
#Display ASV Variables (labels)
text(l.x, l.y, labels=row.names(myselection), col="lightcoral", cex=0.5,pos=2)
#ENV FACTORS
# Draw arrows for env factors using coordinates
Arrows(x0=0, x1=Vx, y0=0, y1=Vy, col="blue", arr.length=0.2, lwd=1, arr.type = "triangle",lty = 3)
#Display env factors (labels)
text(Vx, Vy, labels=names(Vx), col="blue",cex=0.7,pos=c(1,2))
####
```
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment