[R] Average of x normalizations
Guillaume Tahon
Guillaume.Tahon at UGent.be
Sat Nov 19 12:50:27 CET 2016
Dear all,
I am currently using R to normalize data (an OTU table). This all happens by using the Vegan package. An example of the script that I'm using is posted below.
What happens is the following:
I first read a text file in which the OTU table is stored. In this table every column represents an OTU, whereas rows represent the samples (see example below). The data from this table is pasted to a txt file from an excel data sheet.
Afterwards, I want to normaize this table, i.e. rarefy it to the sample with the least number of sequences. Since there can be a lot of variation between normalizations, I chose to perform a certain number of iterations (100 in the example below).
The rarefied OTU_TABLE is stored in OTU_sub. This table is identical to the original table (same number of rows and columns, same header), but the numbers will vary (for example, instead of 25 in the Sample1/OTU1 field, this will have changed to 14).
The loop performs the iterations and rarefies the data 100 times. Using these results, a boxplot is generated that gives an overview of the results of all the iterations (in this case the recovered richness/number of OTUs left after normalization).
This works fine, but I would like to step up my game.
What I would like to do is perform x iterations. However, Out of all the iterations, I want to make a final OTU_sub table. This table contains, for each field, the average of all the iterations performed.
Suppose I do 3 iterations and for Sample1/OTU1 I get the numbers 14, 5 and 20 in the different iterations, the final table should give me (14+5+20)/3 = 13.
The same thing should be done for all the other fields. Obviously, nothing should change to the fields belonging to the sample with the least number of reads.
If possible, I would also like to get the standard deviation for each field, preferably in a seperate table. This would allow me to thoroughly analyze the normalization data.
If anyone would be able to assist me with this, I would be very grateful!
Sincerely,
Guillaume Tahon
Example OTU_TABLE:
? OTU1? OTU2? OTU3?
?Sample1 ?25 ?5 0?
?Sample2 ?1 ?18 ?2
?Sample3 ?0 ?1 ?20
?Sample4 ?185 4? ?25
Currently used script:
library(vegan)
OTU_TABLE<-read.table(file="OTU_TABLE.txt", header=T)
N=100 # No. of iterations
richness_mat <- matrix(nrow = N, ncol = 4) # 4 Samples
colnames(richness_mat)<-c("sample1","sample2", "sample3", "sample4")
for(i in 1:N){
OTU_sub <- rrarefy(OTU_TABLE, min(rowSums(OTU_TABLE))) # rarefy to the sample with the least number of sequences
OTU_vector <- rowSums(OTU_sub > 0) # Count the number of OTUs per sample
richness_mat[i,1] <- OTU_vector[1]
richness_mat[i,2] <- OTU_vector[2]
richness_mat[i,3] <- OTU_vector[3]
richness_mat[i,4] <- OTU_vector[4]
}
par(mar = c(3,4,2,2))
boxplot(richness_mat, ylab = "number of OTUs")
paste(round(colMeans(as.matrix(richness_mat)),2),"±",
round(apply(richness_mat, 2, sd),2))
[[alternative HTML version deleted]]
More information about the R-help
mailing list