[BioC] MCS & ChemineR

Fri Mar 15 22:35:37 CET 2013

Dear David,

Below is some code that addresses most of your needs as I understand
them, especially under (B). Please note, the alternative solutions an
MCS/FMCS search can return are alternative optimal solutions of
identical size, and not suboptimal solutions of variable size. All those
solutions are stored in the MCS object class and their structures
(SDFset connection tables) can be returned with the mcs2sdfset function.
The usage of the MCS class is explained in section 5.3 of the fmcsR
vignette. 

An important consideration of your approach is the number of compounds
you are trying to analyze. It should work fine if you deal with a few
hundred to a few thousand compounds but it would be computationally very
challenging if it were many thousands compounds, which is mainly due to
the computational complexity of the MCS/FMCS search process (NP-complete
problem).

In addition, before you construct your network you may want think 
about a reasonable way to reduce the redundancy in the MCS/FMCS 
results which will be quite pronounced. Clustering based on 
atom pair fingerprint or MCS similarity may be an option as shown
under (B2).

Best,

Thomas

library(ChemmineR)
library(fmcsR)
## Import sample compound set provided by ChemmineR library
## To import your own compounds use the read.SDFset function
data(sdfsample)
sdf <- sdfsample[1:6] # Sample set with 6 compound only 
## (A.1) Compute similarity matrix with 
sim <- sapply(cid(sdf), function(x) fmcsBatch(sdf[x], sdf, au=0, bu=0)[,"Tanimoto_Coefficient"])
sim
          CMP1      CMP2      CMP3      CMP4      CMP5      CMP6
CMP1 1.0000000 0.2291667 0.2040816 0.1607143 0.3333333 0.3333333
CMP2 0.2291667 1.0000000 0.3000000 0.3181818 0.2564103 0.2500000
CMP3 0.2040816 0.3000000 1.0000000 0.1836735 0.2250000 0.3636364
CMP4 0.1607143 0.3181818 0.1836735 1.0000000 0.1956522 0.2439024
CMP5 0.3333333 0.2564103 0.2250000 0.1956522 1.0000000 0.3548387
CMP6 0.3333333 0.2500000 0.3636364 0.2439024 0.3548387 1.0000000

## (A.2) Hierarchical clustering with MCS scores as similarity/distance matrix
d <- 1-sim
hc <- hclust(as.dist(1-d), method="complete")
plot(as.dendrogram(hc), edgePar=list(col=4, lwd=2), horiz=TRUE)

## (B.1) Function to obtain MCS/FMCS for all compound pairs in a library
allpairMCS <- function(sdf) {
	sdfset <- SDFset()
	for(i in cid(sdf)){
		for(j in cid(sdf)){
			f <- mcs2sdfset(fmcs(sdf[j], sdf[i]))[[1]]
			cid(f) <- paste(i, j, gsub("^.*_fmcs", "fmcs", cid(f)), sep="_")
			sdfset <- c(sdfset, f)
		}
	}
	return(sdfset)
}
allmcs <- allpairMCS(sdf=sdf)
cid(allmcs)[1:12]

 [1] "CMP1_CMP1_fmcs_1" "CMP1_CMP2_fmcs_1" "CMP1_CMP3_fmcs_1" "CMP1_CMP3_fmcs_2"
 [5] "CMP1_CMP3_fmcs_3" "CMP1_CMP4_fmcs_1" "CMP1_CMP4_fmcs_2" "CMP1_CMP4_fmcs_3"
 [9] "CMP1_CMP4_fmcs_4" "CMP1_CMP4_fmcs_5" "CMP1_CMP4_fmcs_6" "CMP1_CMP4_fmcs_7"

## (B.2) Cluster MCS/FMCS result to identify/eliminate identical or very similar ones
## This could be done again using the FMCS algorithm or atom pair fingerprints
sim <- sapply(cid(allmcs), function(x) fmcsBatch(allmcs[x], allmcs, au=0, bu=0)[,"Tanimoto_Coefficient"])
as.dist(sim[1:4,1:4])

                  MP1_CMP1_fmcs_1 CMP1_CMP2_fmcs_1 CMP1_CMP3_fmcs_1
CMP1_CMP2_fmcs_1        0.3333333                                  
CMP1_CMP3_fmcs_1        0.3030303        0.4000000                 
CMP1_CMP3_fmcs_2        0.3030303        0.4000000        1.0000000

On Fri, Mar 15, 2013 at 03:14:29PM +0000, David Covell wrote:
> Dear Thomas,
> 
> As a follow-up to my previous e-mail, I would like to ask your advice about
> using ChemmineR to complete this task. The goal is to generate all-to-all
> MCS that include all alternative MCS's for a set of compounds. In order to do
> this correctly, I have attached a test set of 50 compounds.
> 
> First, from the ChemmineR  Manual, I believe the all-to-all MCS 
> calculation should
> have a form similar to that used for calculating a distance matrix
> 
>  > dist <- sapply(cid(sdf), function(x)
> + fmcsBatch(sdf[x], sdf, au=0, bu=0)[,"Overlap_Coefficient"])
> 
> However, I am not able to determine the proper syntax for this 
> calculation. Is there
> a compact way to do this?
> 
> Second, I am not sure how to get the alternative MCS solutions. Can you
> provide some suggestions. I believe the best MCS is often selected, however,
> all alterantives would be needed here.
> 
> Note that my goal is to generate a map that associates all MCSs to each test
> compound. These associations will be used to generate a network interaction
> map with the edges weighted according to compound bioactivity in a molecular
> screen.
> 
> Thanks for your help,
> Sincerely,
> David
> 
>   
>