[BioC] Help on PLGEM R Package Usage

Norman Pavelka normanpavelka at gmail.com
Mon Sep 26 08:35:47 CEST 2011


Hi Qi,

If the model does not fit the data, there is no justification for using
the model, and hence the results cannot be trusted. I wonder why this is
happening, though, as this is the first time I have seen it. Could you
please look at the raw spectral count data of this dataset? I suspect
that the runs only returned a few spectra per protein. This would
explain the low dynamic range of the NSAF values and the bad fit of
the PLGEM.
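
One way to check this could look like the sketch below. It assumes the
raw spectral counts are available as a numeric matrix with proteins in
rows and runs in columns. The `counts` object holds made-up numbers and
the `dynamic_range` helper is hypothetical, not part of plgem.

```r
# Hypothetical helper: orders of magnitude spanned by the non-zero values
dynamic_range <- function(x) {
  x <- x[is.finite(x) & x > 0]          # drop zeros, NA and Inf
  if (length(x) == 0) return(NA_real_)
  log10(max(x)) - log10(min(x))
}

# Toy spectral counts (made-up numbers): 5 proteins x 3 runs
counts <- matrix(c(1, 2, 0,
                   3, 1, 2,
                   0, 1, 1,
                   4, 5, 3,
                   2, 2, 1),
                 ncol = 3, byrow = TRUE)

summary(rowMeans(counts))   # typical number of spectra per protein
dynamic_range(counts)       # orders of magnitude covered by the counts
```

If most proteins have only a handful of spectra and the values cover
less than a couple of orders of magnitude, that would be consistent
with the poor fit.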

On a separate note, I'm not sure I agree with your strategy "to
illustrate one method outperforms another because of its larger DEG
list". Are you referring to DEG identification methods (e.g. t-test
vs. plgem)? In that case, a larger number of identified DEG does not
necessarily mean a better method: the DEG selection method could simply
be selecting more false positives. A better way to compare two methods
is to run both on a benchmark dataset for which the true positives are
known, and to compare their false positive and false negative rates,
e.g. by means of ROC curves.
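
A minimal sketch of such a comparison in base R follows. It assumes
each method returns one p-value per protein and that the true DEG
status is known. The data are simulated, and `roc_points`, `pvalsA`
and `pvalsB` are hypothetical names used only for illustration.

```r
# Hypothetical helper: ROC points from p-values and known truth
roc_points <- function(pvals, truth) {
  thr <- sort(unique(pvals))
  tpr <- sapply(thr, function(t) sum(pvals <= t & truth) / sum(truth))
  fpr <- sapply(thr, function(t) sum(pvals <= t & !truth) / sum(!truth))
  data.frame(threshold = thr, fpr = fpr, tpr = tpr)
}

# Simulated benchmark (made-up data): 20 true DEG out of 200
set.seed(1)
truth  <- rep(c(TRUE, FALSE), c(20, 180))
pvalsA <- ifelse(truth, rbeta(200, 0.2, 1), runif(200))  # stand-in for method A
pvalsB <- ifelse(truth, rbeta(200, 0.6, 1), runif(200))  # stand-in for method B

rocA <- roc_points(pvalsA, truth)
rocB <- roc_points(pvalsB, truth)
plot(rocA$fpr, rocA$tpr, type = "s", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate")
lines(rocB$fpr, rocB$tpr, type = "s", lty = 2)
abline(0, 1, col = "grey")   # chance line
```

The method whose curve lies closer to the top-left corner detects more
true positives at the same false positive rate, regardless of how long
its DEG list is at any single threshold.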

HTH,
Norman

On Sun, Sep 25, 2011 at 7:48 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
> Dear Norman,
>
> If the parameters (slope, r^2 and Pearson correlation coefficient) look
> terrible, does this mean the DEG list I got cannot be trusted?
> So can I compare two DEG lists with very different parameters? My point is
> to illustrate one method outperforms another because of its larger DEG list,
> but the parameters of these two datasets vary a lot.
> Thanks for your help.
>
> Regards,
> Qi Wu
>
> -----Original Message-----
> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
> Sent: Saturday, September 24, 2011 11:39 PM
> To: Wu Qi
> Cc: bioconductor at r-project.org
> Subject: Re: Help on PLGEM R Package Usage
>
> You will have to set plotFile=FALSE if you want the plot to go to the
> graphics device you opened (e.g. with pdf()) instead of the default png
> file.
>
> Also, given the relatively small dataset you are using (~500 proteins), I
> recommend increasing the number of iterations of the permutation step. The
> default Iterations="automatic" only uses 500 iterations in your case.
> However I would suggest setting it to at least 1000 or even more. This will
> make p-values more stable from run to run. I don't know if you noticed, but
> each time you run PLGEM you get slightly different p-values. This is because
> the permutation step is based on random resampling of your data and could be
> different from run to run. Using a larger number of iterations stabilizes
> the empirical distribution of resampled STN ratios, and makes p-values more
> stable.
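
The effect of the iteration count on p-value stability can be
illustrated with a generic resampling sketch. This is not plgem's
actual implementation: the null statistics are simply drawn from a
normal distribution as a stand-in for resampled STN ratios, and
`empirical_p` and `observed` are hypothetical names.

```r
# Generic empirical p-value: fraction of null statistics at least as extreme
# as the observed one (a stand-in, not plgem's actual implementation)
empirical_p <- function(observed, n_iter) {
  null_stats <- rnorm(n_iter)            # stand-in for resampled statistics
  mean(abs(null_stats) >= abs(observed))
}

set.seed(42)
observed <- 2.5                          # made-up observed statistic
p_500  <- replicate(200, empirical_p(observed, 500))
p_5000 <- replicate(200, empirical_p(observed, 5000))

sd(p_500)    # run-to-run spread with 500 iterations
sd(p_5000)   # run-to-run spread with 5000 iterations
```

The spread of the empirical p-value shrinks roughly with the square
root of the number of iterations, which is why more iterations give
more reproducible p-values.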
>
> That said, if your data do not fit well to the PLGEM, then there is little
> chance you can improve the results by tweaking these other parameters.
>
> Hope this helps!
> Norman
>
> On Sat, Sep 24, 2011 at 4:19 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>> Dear Norman,
>>
>> The dataset is downloaded from Tranche website
>> https://proteomecommons.org/dataset.jsp?!=73694 . I haven't gone
>> through the experimental details yet.
>> When I try to produce high quality figures following your
>> instructions, I get a plot whose parameters are quite different when
>> using the following commands. I guess this plot is generated with
>> default arguments:
>>
>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>> pdf()
>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>> writeFiles=FALSE,
>> Verbose=TRUE)
>> dev.off()
>>
>> With these commands, I still only get a fittingEval.png, which is
>> very small. How can I write the fittingEval plot generated with my own
>> arguments to other file formats?
>>
>>
>> -----Original Message-----
>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>> Sent: Saturday, September 24, 2011 1:23 AM
>> To: Wu Qi
>> Cc: bioconductor at r-project.org
>> Subject: Re: Help on PLGEM R Package Usage
>>
>> Dear Qi,
>>
>> Thank you for the data and the plots. I think the problem might reside
>> in your data. If you do a boxplot of your data you will notice that
>> they do not span many orders of magnitude. Here's how you can see for
>> yourself:
>>
>> test <- log10(exprs(NSAFSet))  # log-transform your data
>> test[test == -Inf] <- NA       # to remove -Inf values coming from log10(0)
>> boxplot(test)
>>
>> PLGEM fits best when data span several orders of magnitude, whereas in
>> your case the NSAF values only span two orders of magnitude. May I ask
>> you which proteomics technology you used to generate these data? Is
>> this a whole-cell extract or a subproteome?
>>
>> Cheers,
>> Norman
>>
>> On Sat, Sep 24, 2011 at 12:02 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>> Dear Norman,
>>>
>>> Thanks for your quick response. Please find my files and plot attached.
>>> I really don't understand how to optimize the arguments for every
>>> step, and I have more than one dataset that needs evaluation. So
>>> could you possibly give me some advice on choosing arguments?
>>> The commands for generating this plot are as follows:
>>>
>>> library(plgem)
>>>
>>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>>>
>>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>>> baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>>> zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>>> writeFiles=FALSE,
>>> Verbose=TRUE)
>>>
>>> plgem.write.summary(NSAFdegList, prefix="NSAF", verbose=TRUE)
>>>
>>> Kind Regards,
>>> Qi Wu
>>>
>>> -----Original Message-----
>>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>>> Sent: Friday, September 23, 2011 11:38 PM
>>> To: Wu Qi
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: Help on PLGEM R Package Usage
>>>
>>> Hi Qi,
>>>
>>> These fitting values look well outside the optimal range. Do you
>>> actually get a straight line in the ln(sd) vs. ln(mean) plot? If not,
>>> something might be wrong with how the data were normalized. You may
>>> e-mail me your data and/or the fitting evaluation plots offline and I
>>> might be able to diagnose the problem.
>>>
>>> The slope is one of the most important parameters to look at, and it
>>> usually should be between 0.5 and 1. The r^2 and Pearson correlation
>>> coefficients should be as close to 1 as possible.
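
For a rough look at these three numbers outside the package, a
simplified check is to regress ln(sd) on ln(mean) across rows. This is
only a sketch on simulated data that follow a power law by
construction. plgem's own fitting procedure differs, so the values will
not match its fitting evaluation plot exactly, and all object names
below are hypothetical.

```r
# Simulated data (made up) that follow a power law sd = a * mean^b
set.seed(7)
n_rows <- 300
true_slope <- 0.8
mu <- 10^runif(n_rows, -3, 1)                 # means spanning 4 orders of magnitude
sigma <- 0.05 * mu^true_slope
x <- t(sapply(seq_len(n_rows), function(i) rnorm(4, mu[i], sigma[i])))

row_mean <- rowMeans(x)
row_sd   <- apply(x, 1, sd)
fit <- lm(log(row_sd) ~ log(row_mean))        # straight line in ln-ln space

coef(fit)[2]                       # slope, ideally between 0.5 and 1
summary(fit)$adj.r.squared         # adjusted r^2, ideally close to 1
cor(log(row_mean), log(row_sd))    # Pearson correlation, ideally close to 1
```

Replacing `x` with a replicate matrix from real data, after dropping
rows with zero mean or zero sd, gives a quick impression of whether a
straight line is plausible at all.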
>>>
>>> In order to capture the plots in another file format you can call
>>> pdf() prior to run.plgem() to generate a high-quality vector-graphics
>>> PDF file. Example:
>>>
>>> library(plgem)
>>> data(LPSeset)
>>> pdf()      # this will open a new PDF file called 'Rplots.pdf'
>>>            # in your current working directory
>>> plgemOutput <- run.plgem(LPSeset)
>>> dev.off()  # this will close the PDF file
>>>
>>> Instead of pdf() above you can try bmp(), jpeg(), tiff() or virtually
>>> any other major image file format. Under Windows there is also
>>> win.metafile(), which generates files in the EMF image format.
>>>
>>> Hope this helps!
>>> Norman
>>>
>>> On Fri, Sep 23, 2011 at 11:06 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>>> Dear Norman,
>>>>
>>>>
>>>>
>>>> Thanks for your further advice.
>>>>
>>>> After applying the arguments you recommended, the parameters for my
>>>> NSAF dataset are: slope=0.291, intercept=-5.35, adj.r2=0.636,
>>>> Pearson=0.464. Are they horrible?
>>>>
>>>> Could you tell me which is the most important parameter to assess my
>>>> dataset quality?
>>>>
>>>> And how can I export a high quality figure (emf format) with these
>>>> parameters?
>>>> I could only find it in the simplest wrapper mode. When I append
>>>> "plotFile=TRUE" in the run.plgem function, I only get a png figure
>>>> whose resolution is really poor.
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>>
>>>> Qi Wu


