[BioC] Query regarding SomaticSignatures Bioconductor package

Julian Gehring julian.gehring at embl.de
Wed Jun 11 05:30:51 CEST 2014

Hi Anand,

> 1. I have data from a single study (AML) with mutations obtained from 14 patients. In this case, how do I group the data? If I group the data by ‘study’ as in the vignette, I get an error while running the nmfSignatures function. (I guess it’s because the matrix (sca_occurance) has only one column, corresponding to the single study performed.) Can I group it based on patients (sampleNames) instead?

You can group your variants by any variable that is present in the
'VRanges' object that contains your calls.  The object behaves much
like a data frame, so you could add a column with

    x$sample = ... ## your 14 samples ##

and then group by it with

    motifMatrix(x, group = "sample")

If your samples are already stored in the column 'sampleNames', you can
also group by that column directly (see '?mutationContext' for an
example).
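
To make the effect of the grouping variable concrete, here is a minimal
sketch in base R.  It uses a plain data frame as a stand-in for the
'VRanges' object, so it is only an analogy for what 'motifMatrix' does,
not its implementation.  The data frame 'calls', its column names, the
patient labels and the motif strings are all made up for illustration.

```r
## Toy stand-in for a 'VRanges' object: one row per variant call.
## Every name and value below is hypothetical.
calls <- data.frame(
    patient = c("P01", "P01", "P02", "P02", "P02", "P03"),
    motif   = c("CA A.G", "CT T.G", "CA A.G", "CA A.G", "TG C.T", "CT T.G"),
    stringsAsFactors = FALSE
)

## Add a grouping column, analogous to 'x$sample = ...' above.
calls$sample <- calls$patient

## Grouping by a single study label gives a one-column matrix,
## which leaves nothing to decompose into several signatures.
by_study <- unclass(table(calls$motif, rep("AML", nrow(calls))))

## Grouping by sample gives one column per patient
## (rows are motifs, columns are samples).
by_sample <- unclass(table(calls$motif, calls$sample))

## Optional: turn the counts in each column into frequencies.
freq <- sweep(by_sample, 2, colSums(by_sample), "/")

print(dim(by_study))
print(dim(by_sample))
```

With 14 patients, the per-sample matrix would have 14 columns instead
of one.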

> 2. How do I choose the number R (number of signatures to obtain)? I guess it should be less than the number of columns of sca_occurances? In a recent publication (Niccolo Bolli et al., 2013, Nat. Commun.) involving a single study (multiple myeloma with 52 patients), they mention that they have found two signatures. Does it mean they have set the number of signatures (R argument in nmfSignatures()) to 2?

There are several approaches for estimating the number of signatures.
Whether and how well they perform depends largely on the input data,
and none of them works reliably in all cases.  For this reason, I
haven't implemented an estimation for the number of signatures so far:
I want to avoid giving a false sense of security/certainty.

From a practical point of view, most information will be contained in
the first few signatures; increasing the number of signatures further
will add little information.  From a biological point of view, each
signature should result from a different mutation-generating process.
In your setting with 14 patients suffering from the same type of
cancer, one would expect a low number of such processes.
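
One common heuristic is to fit the decomposition for several values of
R and look at where the fit stops improving.  The sketch below shows
this on simulated data with a small hand-written NMF (multiplicative
updates for the squared error) in base R.  It is not the code behind
'nmfSignatures'; the helper 'toy_nmf', its parameters and the simulated
matrix are assumptions made purely for illustration.

```r
## Minimal NMF via multiplicative updates (squared-error objective).
## 'toy_nmf' is a hypothetical helper, not part of SomaticSignatures.
toy_nmf <- function(V, r, n_iter = 500, eps = 1e-9) {
    W <- matrix(runif(nrow(V) * r), nrow(V), r)
    H <- matrix(runif(r * ncol(V)), r, ncol(V))
    for (i in seq_len(n_iter)) {
        ## Update the sample weights, then the signatures.
        H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
        W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
    }
    list(W = W, H = H, rss = sum((V - W %*% H)^2))
}

set.seed(1)

## Simulate 96 motifs x 14 samples driven by 2 underlying processes,
## plus a small amount of non-negative noise.
W_true <- matrix(runif(96 * 2), 96, 2)
H_true <- matrix(runif(2 * 14), 2, 14)
V <- W_true %*% H_true + matrix(runif(96 * 14, 0, 0.01), 96, 14)

## Residual sum of squares for R = 1, ..., 5.
rss <- sapply(1:5, function(r) toy_nmf(V, r)$rss)
print(round(rss, 4))
```

In this simulation the residual drops sharply up to the true number of
processes and changes little afterwards.  On real data the bend is
rarely this clean, so the result is a hint rather than an answer.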

I hope this made things a bit clearer.

Best wishes
