Commit b9bb97c7 authored by GAREL Marc's avatar GAREL Marc
Browse files

add betadiv

parent 1ccbc14d
---
title: 'Part Phyloseq-betadiv'
date: "12-16 Mars 2021"
output:
html_notebook:
theme: united
toc: yes
toc_depth: 3
toc_float: yes
html_document:
df_print: paged
toc: yes
toc_depth: '3'
authors: Armougom/Garel
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
- ## **<span style="color: steelblue;">I- Different Transformations</span>**
*Be careful when using transformation as it can distort the data. Use transformation*
*especially for performing PCA to avoid sensitivity of double zeros (e.g Hellinger or chord transformation).*
- #### **TSS (Total Sum Scaling) Transformation**
```{r}
TSS <- transform_sample_counts(Final2_rar, function(x){x / sum(x)})
```
- #### **Log Transformation**
```{r}
Tlog <- transform_sample_counts(Final2_rar, function(x) log(x+1))
```
- #### **Square root Transformation**
```{r}
Tsqr <- transform_sample_counts(Final2_rar, function(x) sqrt(x))
```
- #### **Hellinger transformation**
*convert the data to proportion and then take the square root*
*double zero issue, PCA*
```{r}
Hellinger <- transform_sample_counts(Final2_rar, function(x) sqrt(x/sum(x)))
```
- ## **<span style="color: steelblue;">II- Distances & Ordination </span>**
- #### **Unifrac Distance AND PCoA**
```{r}
gp <-ordinate(Final2_rar,"PCoA","unifrac")
```
- #### **Ordination plot: plot ONLY samples AND color using Treatment column**
```{r}
plot_ordination(Final2_rar,gp, type="samples", color="Treatment")+ geom_text(mapping = aes(label = Treatment1), size = 2,vjust = 2.1)
```
- #### **Ordination biplot: plot samples AND ASV**
```{r}
Unif= plot_ordination(Final2_rar,gp, type="biplot", color="Phylum")+ geom_text(mapping = aes(label = Treatment1), size = 2,vjust = 2.1)
Unif
```
*Why if I'm using Final2_rar (so no transformation) rather than Hellinger*
*I will obtain the exactly same ordination plot using Unifrac distance?*
- #### **Bray Curtis Distance AND PCoA**
```{r}
gp2 <-ordinate(Final2_rar,"PCoA","bray")
```
- #### **Ordination biplot: plot samples AND ASV**
```{r}
bra= plot_ordination(Final2_rar,gp2, type="biplot", color="Phylum")+ geom_text(mapping = aes(label = Treatment1), size = 2,vjust = 2.1)
bra
```
*Can you conclude about the results obtained using Bray curtis or Unifrac distance for your ordination?*
- #### **Bray curtis distance AND NMDS ordination**
```{r}
gp3<-ordinate(Final2_rar,"NMDS","bray")
```
- #### See Stress
```{r}
gp3
```
*Interpret Stress??*
- #### **Ordination Plot : Biplot**
```{r}
pbray=plot_ordination(Final2_rar,gp3, type="biplot", color="Phylum")+ geom_text(mapping = aes(label = Treatment1), size = 2,vjust = 2.1)
pbray
```
*Why the plot does not show any % score on the axes?*
- #### **Split Biplot in two windows**
```{r}
plot_ordination(Final2_rar,gp3, type="split", color="Phylum")+ geom_text(mapping = aes(label = Treatment1), size = 2, vjust = 2.1)
```
- #### **Color change**
- #### **Manual**
```{r}
cols=c("black","yellow","blue","chartreuse3","orange","green","orchid1","cyan1","cornsilk1","sienna2","hotpink")
```
- #### **Automatic**
```{r}
mycolors1 = c(brewer.pal(name="Set2", n = 8), brewer.pal(name="Paired", n = 3))
```
- #### **Apply**
```{r}
pbray + scale_color_manual(values = cols)
```
```{r}
pbray + scale_color_manual(values = mycolors1)
```
- #### **Filter before Ordination**
*Trimming low abundance ASVs to remove noise & to improve sample separation.*
*Do this with caution check that you do not loose too much data!!!*
- #### **Selection of the top rank 50 (ASV)**
*NB: From inside to outside: Function4(Function3(Function2(Function1())))*
```{r}
filtre2 <- prune_taxa(names(sort(taxa_sums(TSS),TRUE)[1:50]), TSS)
```
*What is the "Function1"??*
- #### **Check for the loss of data**
```{r}
sample_sums(filtre2)
```
*What do you think?*
- ##### **Distance matrix with bray curtis & PCoA Ordination**
```{r}
gp50=ordinate(filtre2,"PCoA","bray")
```
- ##### **Ordination Plots, Samples only**
```{r}
plot_ordination(filtre2,gp50, type="samples", color="Treatment",shape="Treatment") + geom_text(mapping = aes(label = Treatment1), size = 3,vjust = 2.1)
```
- #### **Ordination Plot with Ellipse**
```{r}
plot_ordination(filtre2,gp50, type="samples", color="Treatment",shape="Treatment") + geom_text(mapping = aes(label = Treatment1), size = 3,vjust = 2.1)+ stat_ellipse()
```
- ## **<span style="color: steelblue;">III- Differential Abundance</span>**
- ### **<span style="color: steelblue;">A- Permanova/Adonis : North-South Group comparison</span>**
- #### **Using Genus abundance**
*You can do it at any taxonomic level*
```{r}
genus_object <- aggregate_taxa(Final2_rar,'Genus')
```
```{r}
#Extract the genus abundance table
ab_genus <- abundances(genus_object)
```
- #### **Run Permanova**
*Non parametric multivariate statistical test. Use to compare*
*groups... test the H0 hypothesis that the groups have similar centroid and/or dispersion.*
*The similarity is based on a chosen distance measure!!PERMANOVA tests whether distance differ between groups. For your information: ANOSIM tests whether distances between groups are greater than within groups. Regarding MANOVA, its correct application needs normal and homocedastic data, number of variables be much smaller than the number of individuals/sites, rarely the case for ecological data. To extend the application to this data PERMANOVA was developped, is insensitive to many zero, you need balanced groups*. *Other good test : Mantel test.*
```{r}
permanova <- adonis(t(ab_genus) ~ Treatment,data = meta, permutations=999, method = "bray")
permanova
```
*What do you conclude between North and South?*
*if you have more than 2 groups, you need a post-hoc test :* *https://github.com/pmartinezarbizu/pairwiseAdonis*
- #### **Does the significance is due to group Dispersion rather than centroid?**
*You want to show that your groups have different centroid or not... but dispersion can biais*
*your conclusion/result*
```{r}
#Distance matrix with bray curtis
dist_bray <- vegdist(t(ab_genus),method="bray")
```
```{r}
#Dispersion calcul
homogeneity <-betadisper(dist_bray, meta$Treatment)
```
```{r}
#Statistical test permutest
#H0 is no difference in dispersion between groups
perm_res <-permutest(homogeneity, pairwise = TRUE, permutations = 999)
perm_res
```
*Conclusion?*
```{r}
#a little visualization
plot(homogeneity, ellipse = TRUE, hull = FALSE)
```
- #### **Show coefficients for the top taxa separating the two groups**
```{r}
coef <- coefficients(permanova)["Treatment1",]
top.coef <- coef[rev(order(abs(coef)))[1:20]]
par(mar = c(3, 14, 2, 1))
barplot(sort(top.coef), horiz = T, las = 1, main = "Top taxa")
```
- ### **<span style="color: steelblue;">B- Differential Abundance with Deseq2</span>**
- #### **Transformation into deseq object**
*Transformation from phyloseq to deseq object with the column to compare: here, column "Treatment" has two values North and South: you compare South and North groups for ASV abundance*
*Keep in mind that you give non normalized data, as deseq2 has it own normalization process!!!*
```{r}
#Do it using genus abundance to compare to PERMANOVA results
G1=aggregate_taxa(Final1,'Genus')
```
```{r}
#Tranformation in DeseqDataSet
dds = phyloseq_to_deseq2(G1, ~Treatment)
```
```{r}
#Data transformation, correction of library size (size factor)
dsf = DESeq(dds, fitType="local")
```
```{r}
#Result
res <- results(dsf)
res
```
- #### **Order by p-values adjusted (Benjamini-hochberg)**
*#na.last = NA means remove NA from file*
```{r}
res = res[order(res$padj, na.last=NA), ]
```
```{r}
#Keep p-values < 0.05
sig_tab = res[(res$padj <= 0.05), ]
```
```{r}
#Combine statistic results to the full taxonomy
sig_tab = cbind(as(sig_tab, "data.frame"), as(tax_table(G1)[rownames(sig_tab), ], "matrix"))
```
```{r}
#Keep columns that you want
posigtab = sig_tab[, c("baseMean", "log2FoldChange","padj", "Phylum","Class", "Family")]
posigtab
```
```{r}
#Save
write.table(posigtab,"./deseq2_otudifferential_ASV_NorthvsSouth.txt")
```
- ### **<span style="color: steelblue;">C-Differential abundance with ANCOM</span>**
*Probably the (one of) best method at this time. No normalization before use it, give the raw abundance.*
- #### **Raw abundance**
```{r}
genus_object <- aggregate_taxa(Final1,'Genus')
```
```{r}
#Run ANCOM BC
out = ancombc(
phyloseq = genus_object,
formula = "Treatment",
p_adj_method = "BH",
zero_cut = 0.9,
lib_cut = 0,
group = "Treatment",
struc_zero = TRUE,
neg_lb = TRUE,
tol = 1e-5,
max_iter = 100,
conserve = TRUE,
alpha = 0.05,
global = TRUE
)
```
```{r}
#Show result with p values adjusted
cbind(out$res$diff_abn,out$res$q_val)
```
- ## **<span style="color: steelblue;"> IV-Linear (PCA/RDA) vs. Unimodal (CA/CCA)</span>**
*DCA >4 (unimodal), <3 (Linear) . between 3 and 4 : anyone (Linear or unimodal)*
*Look the axis length value for DCA1*
```{r}
#Run DCA
DCA <- decorana (log1p (otu_table(Final2_rar)))
DCA
```
*Conclusion?*
- ### **<span style="color: steelblue;">A- Canonical correspondence Analysis </span>**
- #### **Selected abundance data : example with the top50 ASV**
```{r}
top50ASV = names(sort(taxa_sums(Final2_rar), TRUE)[1:50])
```
- #### **Extract the top50from phyloseq object**
```{r}
selectASV50 = prune_taxa(top50ASV,Final2_rar)
```
```{r}
#Get otu_table
abunds1 =otu_table(selectASV50)
```
```{r}
#Transform abunds1 in a dataframe format
abunds1 <-as.data.frame(abunds1)
```
```{r}
#Transform data log(x+1)
abundantlog1 = log1p(abunds1)
```
```{r}
#Select what columns -env variables- you want to integrate in the CCA**
#See
meta
```
```{r}
#Apply your choice with select
se<-select(meta,SiOH4:PO4,T)
```
- #### **Transform environmental data (Z-score)**
```{r}
env.z <- decostand(se, method="standardize")
```
- #### **Make CCA**
```{r}
cca_model<-cca( abundantlog1~ .,data=env.z)
cca_model<-cca( abundantlog1~ NH4 +SiOH4 + T+ NO2 + NO3 + PO4,data=env.z)
```
```{r}
#See
cca_model
```
- #### **Plot CCA eigenvalue (inertia explained)**
```{r}
plot_scree(cca_model)
```
- #### **Plot CCA**
```{r}
plot(cca_model, display=c('cn','wa'), scaling=2)
```
- #### **Test the significance of environmental variables**
*anova.cca= anova like, Permutation Test for CCA, Redundancy Analysis
*separate significance test for each term (constraining variable).*
```{r}
#Global
anova.cca(cca_model)
```
```{r}
#Significant Env variable
anova.cca(cca_model, by="term")
```
- ### **<span style="color: steelblue;"> B- Redundancy Analysis: db-RDA</span>**
- #### **Which measure?**
```{r}
rankindex(env.z, otu_table(Final2_rar), indices = c("man", "gow", "bra", "kul"),stepacross= FALSE, method = "spearman")
```
- #### **dbRDA**
```{r}
dbRDA <- capscale(otu_table(Final2_rar)~ NH4 +SiOH4 + T+ NO2 + NO3 + PO4, data=env.z, distance = "bray")
```
- #### **See**
```{r}
dbRDA
```
- #### **Plot**
```{r}
plot(dbRDA,display=c('cn','wa'), scaling=2)
```
- #### **Statistics**
```{r}
anova.cca(dbRDA)
```
- #### **Env variables**
```{r}
anova.cca(dbRDA, by="term")
```
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment