Commit 962eb728 authored by GAREL Marc's avatar GAREL Marc
Browse files

Upload New File

parent a7dbca23
---
title: 'Part I: DADA2'
date: "12-16 Mars 2021"
output:
html_notebook:
theme: united
toc: yes
toc_depth: 3
toc_float: yes
html_document:
df_print: paged
toc: yes
toc_depth: '3'
authors: Armougom/Garel
---
## <span style="color: steelblue;">I-Packages/Libraries</span>
```{r}
################### Bioconductor Install & bioconductor packages
#Install Bioconductor packages
#if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
#BiocManager::install()
if (!requireNamespace("dada2", quietly = TRUE)) BiocManager::install("dada2")
if (!requireNamespace("phangorn", quietly = TRUE)) BiocManager::install("phangorn")
if (!requireNamespace("DECIPHER", quietly = TRUE)) BiocManager::install("DECIPHER")
if (!requireNamespace("phyloseq", quietly = TRUE)) BiocManager::install("phyloseq")
if (!requireNamespace("Biobase", quietly = TRUE)) BiocManager::install("Biobase")
if (!requireNamespace("DESeq2", quietly = TRUE)) BiocManager::install("DESeq2")
if (!requireNamespace("microbiome", quietly = TRUE)) BiocManager::install("microbiome")
if (!requireNamespace("ComplexHeatmap", quietly = TRUE)) BiocManager::install("ComplexHeatmap")
if (!requireNamespace("ANCOMBC", quietly = TRUE)) BiocManager::install("ANCOMBC")
# Install CRAN packages
#install.packages("pacman")
###### Load libraries from Bioconductor packages
library(phyloseq)
library(dada2)
library(DECIPHER)
library(phangorn)
library(Biobase)
library(microbiome)
library(DESeq2)
library(ComplexHeatmap)
library(ANCOMBC)
#######
# Install & load libraries from CRAN package with pacman
pacman::p_load("plotly","corrplot","picante","vegan","scales","gplots","ggplot2","permute","dplyr","tibble","ape","RColorBrewer","reshape2","FSA","gridExtra","devtools","factoextra","stats","graphics","shape","adespatial")
##Attached script to my session
source("./Fonctions_dada3.R")
#
```
## **<span style="color: steelblue;">II-Path- set working directory</span>**
<span style="color: steelblue;">__Rstudio Working Directory: Location of where you are going to work.__</span>
__Move to the directory which contains your data files, follow:__
__Menu ->Session -> Set working directory -> Choose Directory <span style="color: red;">(which is TP_METABARCODING)</span>__
__Your sequencing data is in the directory data so:__
```{r}
#Put the path directory of your data files in a « variable » named path
path="./data"
```
## <span style="color: steelblue;">III- Read the sequencing files</span>
###<span style="color: steelblue;">__a/list of sequencing files__</span>
The « forward » & « reverse » files are in fastq format with the label:
<span style="color: red;">SAMPLENAME_R1.fastq</span> for the Forward files and <span style="color: red;">SAMPLENAME_R2.fastq</span> for the reverses files.
R1 is Forward, R2 is Reverse
Function : list.files
https://stat.ethz.ch/R-manual/R-devel/library/base/html/list.files.html
```{r}
fnFs <- sort(list.files(path, pattern="_R1.fastq", full.names=TRUE))
fnRs <- sort(list.files(path, pattern="_R2.fastq", full.names=TRUE))
```
__To understand: What is the contain of FnFs & FnRs files??__
```{r}
fnFs
```
[1] "./data/S11B_R1.fastq" "./data/S1B_R1.fastq"
"./data/S2B_R1.fastq"
"./data/S2S_R1.fastq"
"./data/S3B_R1.fastq"
"./data/S3S_R1.fastq"
"./data/S4B_R1.fastq"
etc
###<span style="color: steelblue;">__b/ Extract the names of the samples__</span>
```{r}
sample.names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)
```
__To understand:__
__strsplit__ : split character chain according a defined pattern
https://www.rdocumentation.org/packages/base/versions/3.6.0/topics/strsplit
__basename__: simplify the « PATH »
https://www.rdocumentation.org/packages/base/versions/3.6.0/topics/basename
```{r}
strsplit(basename(fnFs), "_")
```
“strsplit” function?
```{r}
sapply(strsplit(basename(fnFs), "_"), `[`, 2)
sapply(strsplit(basename(fnFs), "_"), `[`, 3)
```
« Sapply » function?
## <span style="color: steelblue;">IV-Sequence Quality Check</span>
Function: plotQualityProfile :
https://www.rdocumentation.org/packages/dada2/versions/1.0.3/topics/plotQualityProfile
We use a function implemented in the Fonctions_dada3.R script, which takes the list of R1 files and R2 files and put all quality plot result in one pdf file and save in your current directory TP_METABARCODING.
```{r}
qualityprofile(fnFs,fnRs,'qualityplot.pdf')
```
<span style="color: steelblue;">__open the qualityplot.pdf__</span>
## <span style="color: steelblue;">V-Sequence Trimming</span>
###<span style="color: steelblue;">__a/ Prepare the directory for the Trimming process__</span>
__Function : file.path :__
https://www.rdocumentation.org/packages/R.utils/versions/2.8.0/topics/filePath
On crée un dossier nommé « Filtered », et à l’intérieur on y met les fichiers nommés « nom du sample” pour R1 et R2. Ces fichiers sont vides pour le moment… c’est une préparation pour l’étape de filtrage.
```{r}
filtFs <- file.path(path, "Filtered", basename(fnFs))
filtRs <- file.path(path, "Filtered", basename(fnRs))
names(filtFs) <- sample.names
names(filtRs) <- sample.names
```
###<span style="color: steelblue;">__b/ Trimming process (important !!!!)__</span>
__Function filterAndTrim__: https://rdrr.io/bioc/dada2/man/filterAndTrim.html
```{r}
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, trimLeft=20, trimRight=20, minLen=150, maxN=0, maxEE=c(2,2), truncQ=2, compress=FALSE, multithread=TRUE)
```
__To see your results__
```{r}
out
```
__Details:__
__fnFs__ : Input, your row Forward list (path) of sequencing files.
__filtFs__ : Output, the empty files created at the previous stage that you are now going to fill with this command.
__fnRs, filRs__ : Same as above, but for the Reverse data
__TruncLen__ : Truncate reads after truncLen bases. Reads shorter than this are discarded.
Exple : TruncLen=c(200,150), means R1 forward are cut at 200 pb & the R2 reverse at 150 pb.
__TrimLeft__ : The number of nucleotides to remove from the start of each read, from the left
__Trimright__ : The number of nucleotides to remove from the start of each read, from the right
__maxN__ : Max number of ambiguous bases accepted
__maxEE__ : Quality system to remove low quality read. Standard maxEE=c(2,2).
If you want to be more flexible (accept more low quality), increase the value, maxEE=c(2,5).
2 is for the forward, 5 for le reverse.
__TruncQ=2__. Truncate reads at the first instance of a quality score less than or equal to truncQ.
## <span style="color: steelblue;">VI-Learn « Errors »</span>
__Function: learnErrors__ https://rdrr.io/bioc/dada2/man/learnErrors.html
Parametric model to distinguish sequencing artefacts from true biological variations
```{r}
errF <- learnErrors(filtFs, randomize =TRUE, multithread=TRUE)
errR <- learnErrors(filtRs, randomize =TRUE, multithread=TRUE)
plotErrors(errF, nominalQ=TRUE)
```
## <span style="color: steelblue;">VII- Dereplication (identical sequences)</span>
->“remove” the redundancy …
__Function: derepFastq__
https://www.rdocumentation.org/packages/dada2/versions/1.0.3/topics/derepFastq
```{r}
derepFs <- derepFastq(filtFs, verbose=TRUE)
derepRs <- derepFastq(filtRs, verbose=TRUE)
names(derepFs) <- sample.names
names(derepRs) <- sample.names
```
## <span style="color: steelblue;">VIII-Amplicon Sequence Variant =ASV</span>
<span style="color: steelblue;">__Error Corrections.__</span>
```{r}
dadaFs <- dada(derepFs, err=errF, multithread=TRUE)
dadaRs <- dada(derepRs, err=errR, multithread=TRUE)
```
## <span style="color: steelblue;">IX- Assembly</span>
__Function: mergePairs :__
https://www.rdocumentation.org/packages/dada2/versions/1.0.3/topics/mergePairs
*Merge the forwards & reverses together
*maxMismatch : how many mismatch do you accept in the overlapping region?
```{r}
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, maxMismatch= 0, verbose=TRUE)
```
## <span style="color: steelblue;">Make OTU table: dada2 format with sequence as identifier!</span>
```{r}
seqtab <- makeSequenceTable(mergers)
```
## <span style="color: steelblue;">X- remove Chimera</span>
__Function: removeBimeraDenovo__
https://www.rdocumentation.org/packages/dada2/versions/1.0.3/topics/removeBimeraDenovo
```{r}
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE)
```
## <span style="color: steelblue;">XI – Summarize the pre-processing</span>
```{r}
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN),sapply(mergers, getN),rowSums(seqtab.nochim),rowSums(seqtab.nochim)/out[,1]*100)
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim","%retained")
rownames(track) <- sample.names
track
```
## <span style="color: steelblue;">XII- Taxonomic Assignment using Dada2</span>
__<span style="color: steelblue;">Function “assignTaxonomy”</span>__: __Assignment from Phylum to Genus level__
__<span style="color: steelblue;">Function “addSpecies”</span>__: __Assignment to the species level if possible (100% identity).__
```{r}
taxa <- assignTaxonomy(seqtab.nochim, "silva_nr_v132_train_set.fa", taxLevels= c("Kingdom", "Phylum","Class", "Order", "Family","Genus", "Species"), multithread=TRUE, minBoot=60)
taxa <- addSpecies(taxa, "silva_species_assignment_v132.fa.gz",allowMultiple=FALSE)
```
__<span style="color: steelblue;">FCaution: when using silva_nr_v132_train_set.fa.gz file, you MUST be located in the directory which contains this file (set working directory).</span>__
## <span style="color: steelblue;">XIII- Phylogenetic Tree</span>
NB: This step can be skipped if you have some problems to perfom it or to many ASVs.
#### __Get the ASV sequences (equivalent to representative OTUs)__
```{r}
seqs<-getSequences(seqtab.nochim)
```
#### __Alignment by DECIPHER__
```{r}
aln <- AlignSeqs(DNAStringSet(seqs), anchor=NA)
#See alignment
BrowseSeqs(aln, highlight=0)
```
#### __Transformed by Phydat__
```{r}
phang.align <- phyDat(as(aln, "matrix"), type="DNA")
```
#### __#Distance Matrix__
```{r}
dm <- dist.ml(phang.align)
treeNJ <- NJ(dm)
```
#### __Change label (leaves) of the tree (important)__
```{r}
treeNJ$tip.label<-rownames(taxa)
```
## <span style="color: steelblue;">XIV- Build the Phyloseq Object including the phylogenetic tree</span>
#### __Mapfile__
Must contain sample names identically as you defined it at the beginning of the TP
```{r}
#Check it, if necessary:
names(filtFs)
```
You need to have correspondence between Mapfile SampleID and name(filtFs). Adapt your Mapfile file if it’s not the case.
#### __Import the mapfile in Rstudio__
```{r}
MAP= "mapfileFA.txt"
MAPFILE <-import_qiime_sample_data(MAP)
```
#### __create the phyloseq object (OTU_table, Tax_table, tree)__
```{r}
Final<- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE) ,sample_data(MAPFILE),tax_table(taxa),treeNJ)
```
## <span style="color: steelblue;">XV- Add the ASV sequences to your Phyloseq object</span>
```{r}
dna <- DNAStringSet(taxa_names(Final))
names(dna) <- taxa_names(Final)
```
#### __Add the Class sequence (refseq) to the Object Final1__
```{r}
Final1 <- merge_phyloseq(Final, dna)
```
#### __see__
```{r}
Final1
```
#### __Change name in ASV__
```{r}
taxa_names(Final1)
```
#### __The name of ASV is the sequence!!! You need to change this to have ASV1, ASV2 etc_
```{r}
taxa_names(Final1) <- paste0("ASV", seq(ntaxa(Final1)))
#see
taxa_names(Final1)
```
#### __See your final object for the next analyses__
```{r}
Final1
```
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment