[BioC] Help on PLGEM R Package Usage

Norman Pavelka normanpavelka at gmail.com
Tue Sep 27 06:24:33 CEST 2011


Hi,

Thanks for sending the file. This explains everything. The raw
spectral counts (SC) are very small and span only one order of
magnitude. This suggests that something did not work well on the
mass-spec side. We normally see SC ranging from a few spectra up to
several hundred (or even thousands of spectra for the most highly
abundant proteins). In your dataset, because of the low dynamic range
of the SC, the NSAF values have a low dynamic range too, and there is
not enough information to capture the power-law relationship between
the standard deviation and the mean. I'm afraid you cannot use this
dataset with PLGEM. To be honest, I think you cannot use this dataset
for anything, because the information content is just so poor. But
this is just my opinion.
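
For illustration, here is a minimal sketch of how the dynamic range of a
set of spectral counts can be expressed in orders of magnitude. It uses
base R only; the helper name dynamic_range_log10 and the example counts
are made up for this sketch and are not part of plgem or of your data.

```r
# Hypothetical helper (not part of plgem): orders of magnitude spanned by
# the positive, finite values of a numeric vector or matrix.
dynamic_range_log10 <- function(x) {
  x <- as.numeric(x)
  x <- x[is.finite(x) & x > 0]   # drop zeros, NA and Inf before taking logs
  if (length(x) == 0) return(NA_real_)
  diff(range(log10(x)))
}

# Made-up spectral counts, for illustration only
sc_narrow <- c(1, 2, 3, 5, 8, 9)          # less than one order of magnitude
sc_wide   <- c(1, 4, 20, 150, 900, 3000)  # more than three orders

dynamic_range_log10(sc_narrow)
dynamic_range_log10(sc_wide)
```

Applied to a real SC matrix, a value around 1 would correspond to the
situation described above.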

I suggest you look for a better dataset, one where you know that PLGEM
will fit well. You are welcome to use our data from this publication:
http://www.ncbi.nlm.nih.gov/pubmed/20962780
Supplementary Data 1 contains the full data with SC, coverage, NSAF
etc. There is also the hash to download the raw MS data from Tranche
if you want to use a different method to extract the SC from the MS
files.

Finally, my comment about using the number of identified DEG as a
criterion to judge two data analysis methods also holds for comparing
two normalization methods, two data-summarization methods, etc. Unless
you have a benchmark dataset where you know what should be called a DEG
and what should not, you cannot say that one method is better than
another because it selects more DEG.
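
A small sketch of why the size of the DEG list alone is not informative.
The benchmark below is entirely made up (10 proteins, the first 4 truly
changed), and deg_rates is a hypothetical helper written for this
example, not a plgem function.

```r
# Hypothetical helper: number of calls, true-positive rate and
# false-positive rate of a DEG call against a known truth.
deg_rates <- function(called, truth) {
  stopifnot(is.logical(called), is.logical(truth),
            length(called) == length(truth))
  c(n_called = sum(called),
    TPR = sum(called & truth) / sum(truth),
    FPR = sum(called & !truth) / sum(!truth))
}

# Made-up benchmark: the first 4 of 10 proteins are truly changed
truth    <- rep(c(TRUE, FALSE), times = c(4, 6))
method_A <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
method_B <- c(TRUE, TRUE, TRUE, FALSE, TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE)

rbind(A = deg_rates(method_A, truth), B = deg_rates(method_B, truth))
```

In this toy case method B calls twice as many DEG as method A, but both
find the same true positives and all of B's extra calls are false
positives.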

Good luck!
Norman

On Tue, Sep 27, 2011 at 10:02 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
> Dear Norman,
>
> You are right, the raw SC is rather small. The attachment is SC data from
> one run.
> Besides, I'm sorry to have given you the impression that I'm comparing
> PLGEM with the t-test; I never meant to do so. I'm trying to compare
> different NSAF dataset handling protocols using PLGEM as a benchmark.
> Hope you have a good day. Thanks very much for your help.
>
> Regards,
> Qi Wu
>
> -----Original Message-----
> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
> Sent: Monday, September 26, 2011 2:36 PM
> To: Wu Qi
> Cc: bioconductor at r-project.org
> Subject: Re: Help on PLGEM R Package Usage
>
> Hi Qi,
>
> If the model does not fit the data, there is no justification to use the
> model, hence the results cannot be trusted. I wonder why this is happening,
> though, as this is the first time I have seen it. Could you please look at the raw
> spectral count data of this dataset? I suspect that the runs only returned a
> few spectra per protein. This would explain the low dynamic range of the
> NSAF values and the bad fit of the PLGEM.
>
> On a separate note, I'm not sure I agree with your strategy "to illustrate one
> method outperforms another because of its larger DEG list". Are you
> referring to DEG identification methods (e.g. t-test vs. plgem)? In that
> case, a larger number of identified DEG does not necessarily mean a better
> method. The DEG selection method could simply be selecting more false positives. A
> better way to compare two methods is to run them on a benchmark dataset for which
> the true positives are known, and to compare their false positive and
> false negative rates, e.g. by means of ROC curves.
>
> HTH,
> Norman
>
> On Sun, Sep 25, 2011 at 7:48 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>> Dear Norman,
>>
>> If the parameters (slope, r^2 and Pearson correlation coefficient)
>> look terrible, does this mean the DEG list I got cannot be trusted?
>> And can I compare two DEG lists with very different parameters? My
>> point is to illustrate that one method outperforms another because of
>> its larger DEG list, but the parameters of these two datasets vary a lot.
>> Thanks for your help.
>>
>> Regards,
>> Qi Wu
>>
>> -----Original Message-----
>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>> Sent: Saturday, September 24, 2011 11:39 PM
>> To: Wu Qi
>> Cc: bioconductor at r-project.org
>> Subject: Re: Help on PLGEM R Package Usage
>>
>> You will have to set plotFile=FALSE if you want to override the
>> default png file.
>>
>> Also, given the relatively small dataset you are using (~500
>> proteins), I recommend increasing the number of iterations of the
>> permutation step. The default Iterations="automatic" only uses 500
>> iterations in your case.
>> However I would suggest setting it to at least 1000 or even more. This
>> will make p-values more stable from run to run. I don't know if you
>> noticed, but each time you run PLGEM you get slightly different
>> p-values. This is because the permutation step is based on random
>> resampling of your data and could be different from run to run. Using
>> a larger number of iterations stabilizes the empirical distribution of
>> resampled STN ratios, and makes p-values more stable.
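
A generic illustration of this effect in base R. This is not how plgem
computes its p-values internally: the null distribution here is simply a
standard normal (an assumption made for the sketch), and empirical_p is
a hypothetical helper. It only shows that a resampling-based p-value
varies less from run to run when more iterations are used.

```r
set.seed(1)  # only so that the illustration is reproducible

# Empirical two-sided p-value of an observed statistic against n_iter
# random draws from a stand-in null distribution (standard normal).
empirical_p <- function(observed, n_iter) {
  null_stats <- rnorm(n_iter)
  mean(abs(null_stats) >= abs(observed))
}

# Repeat the whole procedure 200 times per iteration count and measure
# how much the resulting p-value moves from run to run.
spread <- sapply(c(500, 5000), function(n_iter) {
  sd(replicate(200, empirical_p(observed = 2, n_iter = n_iter)))
})
names(spread) <- c("500 iterations", "5000 iterations")
spread
```

The run-to-run standard deviation of the p-value is smaller with the
larger number of iterations.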
>>
>> That said, if your data do not fit well to the PLGEM, then there is
>> little chance you can improve the results by tweaking these other
>> parameters.
>>
>> Hope this helps!
>> Norman
>>
>> On Sat, Sep 24, 2011 at 4:19 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>> Dear Norman,
>>>
>>> The dataset is downloaded from Tranche website
>>> https://proteomecommons.org/dataset.jsp?!=73694 . I haven't gone
>>> through the experimental details yet.
>>> When I try to produce high-quality figures following your
>>> instructions, I get a plot whose parameters are quite different when
>>> using the following commands. I guess this plot is generated with
>>> default arguments:
>>>
>>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>>> pdf()
>>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>>> writeFiles=FALSE,
>>> Verbose=TRUE)
>>> dev.off()
>>>
>>> With these commands I still only get a fittingEval.png, which is
>>> very small. How can I write the fittingEval plot generated with my
>>> own arguments to other file formats?
>>>
>>>
>>> -----Original Message-----
>>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>>> Sent: Saturday, September 24, 2011 1:23 AM
>>> To: Wu Qi
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: Help on PLGEM R Package Usage
>>>
>>> Dear Qi,
>>>
>>> Thank you for the data and the plots. I think the problem might
>>> reside in your data. If you do a boxplot of your data you will notice
>>> that they do not span many orders of magnitude. Here's how you can
>>> see for yourself:
>>>
>>> test <- log10(exprs(NSAFSet))  # log-transform your data
>>> test[test == -Inf] <- NA       # to remove -Inf values coming from log10(0)
>>> boxplot(test)
>>>
>>> PLGEM fits best when data span several orders of magnitude, whereas
>>> in your case the NSAF values only span two orders of magnitude. May I
>>> ask you which proteomics technology you used to generate these data?
>>> Is this a whole-cell extract or a subproteome?
>>>
>>> Cheers,
>>> Norman
>>>
>>> On Sat, Sep 24, 2011 at 12:02 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>>> Dear Norman,
>>>>
>>>> Thanks for your quick response. Please find my files and plot attached.
>>>> I really don't understand how to optimize the arguments for every
>>>> step, and I have more than one dataset that also needs evaluation. So
>>>> could you possibly give me some advice on choosing arguments?
>>>> The commands for generating this plot are as follows:
>>>>
>>>> library(plgem)
>>>>
>>>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>>>>
>>>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>>>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>>>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>>>> writeFiles=FALSE,
>>>> Verbose=TRUE)
>>>>
>>>> plgem.write.summary(NSAFdegList, prefix="NSAF", verbose=TRUE)
>>>>
>>>> Kind Regards,
>>>> Qi Wu
>>>>
>>>> -----Original Message-----
>>>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>>>> Sent: Friday, September 23, 2011 11:38 PM
>>>> To: Wu Qi
>>>> Cc: bioconductor at r-project.org
>>>> Subject: Re: Help on PLGEM R Package Usage
>>>>
>>>> Hi Qi,
>>>>
>>>> These fitting values look far outside the optimal range. Do you
>>>> actually get a straight line in the ln(sd) vs. ln(mean) plot? If
>>>> not, something might be wrong with how the data were normalized.
>>>> You may e-mail me your data and/or the fitting evaluation plots
>>>> offline and I might be able to diagnose the problem.
>>>>
>>>> The slope is one of the most important parameters to look at, and it
>>>> usually should be between 0.5 and 1. The r^2 and Pearson correlation
>>>> coefficients should be as close to 1 as possible.
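
A rough sketch of what these numbers measure, using simulated data and
plain least squares in base R. This is a simplification and not plgem's
own fitting procedure; the data, the true exponent of 0.8 and all
variable names are assumptions made for the example.

```r
set.seed(42)  # only so that the illustration is reproducible

# Simulated data: 1000 "proteins", 4 replicates, sd = 0.01 * mean^0.8
n     <- 1000
mu    <- 10^runif(n, min = -5, max = -1)  # means spanning four orders of magnitude
sigma <- 0.01 * mu^0.8
reps  <- matrix(rnorm(n * 4, mean = mu, sd = sigma), nrow = n)  # row i uses mu[i]

m    <- rowMeans(reps)
s    <- apply(reps, 1, sd)
keep <- m > 0 & s > 0                     # logs need strictly positive values

# Straight-line fit of ln(sd) against ln(mean)
fit     <- lm(log(s[keep]) ~ log(m[keep]))
slope   <- unname(coef(fit)[2])
adj_r2  <- summary(fit)$adj.r.squared
pearson <- cor(log(m[keep]), log(s[keep]))

c(slope = slope, adj.r2 = adj_r2, Pearson = pearson)
slope >= 0.5 && slope <= 1                # the range mentioned above
```

On data like these, which span several orders of magnitude, the fitted
slope comes out close to the simulated exponent and the points fall on a
straight line in the ln(sd) vs. ln(mean) plot.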
>>>>
>>>> In order to capture the plots in another file format you can call
>>>> pdf() prior to run.plgem() to generate a high-quality
>>>> vector-graphics PDF file. Example:
>>>>
>>>> library(plgem)
>>>> data(LPSeset)
>>>> pdf()      # this will open a new PDF file called 'Rplots.pdf'
>>>>            # in your current working directory
>>>> plgemOutput <- run.plgem(LPSeset)
>>>> dev.off()  # this will close the PDF file
>>>>
>>>> Instead of pdf() above you can try bmp(), jpeg(), tiff() or the
>>>> device for virtually any other major image file format. Under
>>>> Windows there is also win.metafile(), which generates the EMF image
>>>> file format.
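
The same device mechanism with an explicit file name and size, shown
with a base R plot so that it runs without plgem. The file name and the
toy data are made up for this example. With run.plgem() in place of
plot(), plotFile would need to stay FALSE, as noted elsewhere in this
thread, so that the plots go to the open device.

```r
# Write a plot to a PDF file through an explicit graphics device.
out_file <- file.path(tempdir(), "fitting_eval_example.pdf")

pdf(out_file, width = 7, height = 7)   # size in inches
x <- seq(-10, -2, length.out = 50)     # toy ln(mean) values
plot(x, 0.8 * x - 1, xlab = "ln(mean)", ylab = "ln(sd)")
dev.off()                              # the file is complete only after this call

file.exists(out_file)
```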
>>>>
>>>> Hope this helps!
>>>> Norman
>>>>
>>>> On Fri, Sep 23, 2011 at 11:06 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>>>> Dear Norman,
>>>>>
>>>>>
>>>>>
>>>>> Thanks for your further advice.
>>>>>
>>>>> After applying the arguments you recommended, the parameters for my
>>>>> NSAF dataset are: slope=0.291, intercept=-5.35, adj.r2=0.636,
>>>>> Pearson=0.464. Are they horrible?
>>>>>
>>>>> Could you tell me which is the most important parameter to assess
>>>>> my dataset quality?
>>>>>
>>>>> And how can I export a high-quality figure (EMF format) with these
>>>>> parameters? I could only find it in the simplest wrapper mode.
>>>>> When I append "plotFile=TRUE" to the run.plgem call, I only get a
>>>>> PNG figure whose resolution is really poor.
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Qi Wu
>>>>
>>>
>>>
>>
>>
>


