[BioC] Help on PLGEM R Package Usage

Sun Sep 25 13:48:46 CEST 2011

Dear Norman,

If the parameters (slope, r^2 and Pearson correlation coefficient) look
terrible, does this mean the DEG list I got cannot be trusted?
And can I compare two DEG lists whose parameters are very different? My aim
is to show that one method outperforms another because it yields a larger
DEG list, but the parameters of the two datasets differ a lot.
Thanks for your help.

Regards,
Qi Wu

-----Original Message-----
From: Norman Pavelka [mailto:normanpavelka at gmail.com] 
Sent: Saturday, September 24, 2011 11:39 PM
To: Wu Qi
Cc: bioconductor at r-project.org
Subject: Re: Help on PLGEM R Package Usage

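You will have to set plotFile=FALSE if you want to override the default PNG
file. The fitting evaluation plot is then drawn on whichever graphics device
is currently open, e.g. the one you opened with pdf().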
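
A minimal sketch of that approach, using the LPSeset example data that ships
with plgem rather than your own files. The output file name is only an
illustration, and the requireNamespace() guard is there just so the snippet
can be pasted on a machine where plgem is not installed:

```r
# Sketch: send the fitting evaluation plot to a PDF instead of the default PNG.
out_file <- "fittingEval.pdf"   # hypothetical file name, pick any path you like

if (requireNamespace("plgem", quietly = TRUE)) {
  library(plgem)
  data(LPSeset)                 # example dataset shipped with plgem

  pdf(out_file)                 # open the PDF device first
  plgemOutput <- run.plgem(LPSeset,
                           fitting.eval = TRUE,
                           plotFile = FALSE)  # draw on the open device
  dev.off()                     # close the device so the file is written
}
```

For your own data, replace LPSeset with NSAFSet and keep the remaining
arguments of your call unchanged.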

Also, given the relatively small dataset you are using (~500 proteins), I
recommend increasing the number of iterations of the permutation step. The
default Iterations="automatic" only uses 500 iterations in your case; I
would suggest setting it to at least 1000. I don't know if you noticed, but
each time you run PLGEM you get slightly different p-values. This is because
the permutation step is based on random resampling of your data, which
differs from run to run. A larger number of iterations stabilizes the
empirical distribution of resampled STN ratios, and hence the p-values.
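
To see the effect without running PLGEM, here is a toy illustration in base
R. It is not the PLGEM resampling scheme: it uses a plain two-group
permutation test on made-up numbers, and perm_p() is a hypothetical helper.
It only shows that the run-to-run spread of an empirical p-value shrinks as
the number of iterations grows:

```r
set.seed(1)

# Made-up data: two groups of 20 values, shifted by 0.6 (purely illustrative)
y <- qnorm(ppoints(20))
x <- y + 0.6
pooled <- c(x, y)
obs <- abs(mean(x) - mean(y))   # observed test statistic

# Empirical p-value from n_iter random relabelings of the pooled values
perm_p <- function(n_iter) {
  null_stats <- replicate(n_iter, {
    idx <- sample(length(pooled), length(x))
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  mean(null_stats >= obs)
}

# Repeat the whole procedure 40 times and measure the spread of the p-values
sd_500  <- sd(replicate(40, perm_p(500)))
sd_5000 <- sd(replicate(40, perm_p(5000)))

cat("SD of p-values with  500 iterations:", signif(sd_500, 3), "\n")
cat("SD of p-values with 5000 iterations:", signif(sd_5000, 3), "\n")
```

The spread is expected to shrink roughly with the square root of the number
of iterations, so ten times more iterations should cut it by about a factor
of three.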

That said, if your data do not fit well to the PLGEM, then there is little
chance you can improve the results by tweaking these other parameters.
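
As a rough sanity check of the kind of fit I mean, here is a sketch in base R
on simulated data. It is an assumption-laden stand-in, not what plgem does
internally: it regresses ln(sd) on ln(mean) over all rows with lm(), and the
simulated matrix, the power-law constants and all variable names are invented
for illustration. The rule of thumb quoted further down this thread (slope
between 0.5 and 1, r^2 and Pearson close to 1) gives the values to look for:

```r
set.seed(7)

# Simulated abundance matrix: 500 "proteins" x 4 replicates, with SDs that
# follow a power law of the mean (slope 0.8 on the log-log scale)
n_prot <- 500
n_rep  <- 4
true_mean <- 10^runif(n_prot, min = 0, max = 4)   # spans ~4 orders of magnitude
true_sd   <- 0.2 * true_mean^0.8
mat <- matrix(rnorm(n_prot * n_rep, mean = true_mean, sd = true_sd),
              nrow = n_prot)

# Row-wise mean and standard deviation; keep rows where both logs are defined
m <- rowMeans(mat)
s <- apply(mat, 1, sd)
keep <- m > 0 & s > 0
log_m <- log(m[keep])
log_s <- log(s[keep])

# Straight-line fit of ln(sd) versus ln(mean)
fit     <- lm(log_s ~ log_m)
slope   <- unname(coef(fit)["log_m"])
adj_r2  <- summary(fit)$adj.r.squared
pearson <- cor(log_m, log_s)

# Dynamic range of the row means, in orders of magnitude
span <- diff(range(log10(m[keep])))

cat(sprintf("slope = %.3f, adj. r2 = %.3f, Pearson = %.3f, span = %.2f\n",
            slope, adj_r2, pearson, span))
```

To try this on real data, replace mat with exprs(NSAFSet) restricted to the
replicates of a single condition. The numbers will not match the fittingEval
output exactly, but a slope far below 0.5 or a span of only two orders of
magnitude should show up here as well.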

Hope this helps!
Norman

On Sat, Sep 24, 2011 at 4:19 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
> Dear Norman,
>
> The dataset was downloaded from the Tranche website
> https://proteomecommons.org/dataset.jsp?!=73694 . I haven't gone
> through the experimental details yet.
> When I try to produce high-quality figures following your
> instructions, using the commands below, I get a plot whose parameters
> are quite different. I guess this plot is generated with default
> arguments:
>
> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
> pdf()
> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>     baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>     zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>     writeFiles=FALSE, Verbose=TRUE)
> dev.off()
>
> With these commands I still only get a fittingEval.png, which is
> very small. How can I write the fittingEval plot generated with my own
> arguments to other file formats?
>
>
> -----Original Message-----
> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
> Sent: Saturday, September 24, 2011 1:23 AM
> To: Wu Qi
> Cc: bioconductor at r-project.org
> Subject: Re: Help on PLGEM R Package Usage
>
> Dear Qi,
>
> Thank you for the data and the plots. I think the problem might reside 
> in your data. If you do a boxplot of your data you will notice that 
> they do not span many orders of magnitude. Here's how you can see for
> yourself:
>
> test <- log10(exprs(NSAFSet))  # log-transform your data
> test[test == -Inf] <- NA       # to remove -Inf values coming from log10(0)
> boxplot(test)
>
> PLGEM fits best when data span several orders of magnitude, whereas in 
> your case the NSAF values only span two orders of magnitude. May I ask 
> you which proteomics technology you used to generate these data? Is 
> this a whole-cell extract or a subproteome?
>
> Cheers,
> Norman
>
> On Sat, Sep 24, 2011 at 12:02 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
>> Dear Norman,
>>
>> Thanks for your quick response. Please find my files and plot attached.
>> I really don't understand how to optimize the arguments for every
>> step, and I have more than one dataset that needs evaluation. So
>> could you possibly give me some advice on choosing arguments?
>> The commands for generating this plot are as follows:
>>
>> library(plgem)
>>
>> NSAFSet<-readExpressionSet("exprs_NSAF.txt","phenoDataFile.txt")
>>
>> NSAFdegList<-run.plgem(NSAFSet, signLev=0.01, rank=100, covariate=1,
>>     baselineCondition="E", Iterations="automatic", trimAllZeroRows=TRUE,
>>     zeroMeanOrSD="trim", fitting.eval=TRUE, plotFile=TRUE,
>>     writeFiles=FALSE, Verbose=TRUE)
>>
>> plgem.write.summary(NSAFdegList, prefix="NSAF", verbose=TRUE)
>>
>> Kind Regards,
>> Qi Wu
>>
>> -----Original Message-----
>> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
>> Sent: Friday, September 23, 2011 11:38 PM
>> To: Wu Qi
>> Cc: bioconductor at r-project.org
>> Subject: Re: Help on PLGEM R Package Usage
>>
>> Hi Qi,
>>
>> These fitting values look far outside the optimal range. Do you
>> actually get a straight line in the ln(sd) vs. ln(mean) plot? If not,
>> something might be wrong with how the data were normalized. You may
>> e-mail me your data and/or the fitting evaluation plots offline and I
>> might be able to diagnose the problem.
>>
>> The slope is one of the most important parameters to look at, and it 
>> usually should be between 0.5 and 1. The r^2 and Pearson correlation 
>> coefficients should be as close to 1 as possible.
>>
>> In order to capture the plots in another file format you can call
>> pdf() prior to run.plgem() to generate a high-quality vector-graphics 
>> PDF file. Example:
>>
>> library(plgem)
>> data(LPSeset)
>> pdf()      # this will open a new PDF file called 'Rplots.pdf'
>>            # in your current working directory
>> plgemOutput <- run.plgem(LPSeset)
>> dev.off()  # this will close the PDF file
>>
>> Instead of pdf() above you can try bmp(), jpeg(), tiff() or virtually
>> any other major image file format. Under Windows there is also
>> win.metafile(), which writes files in the EMF image format.
>>
>> Hope this helps!
>> Norman
>>
>> On Fri, Sep 23, 2011 at 11:06 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
>>> Dear Norman,
>>>
>>>
>>>
>>> Thanks for your further advice.
>>>
>>> After applying the arguments you recommended, the parameters for my
>>> NSAF dataset are: slope=0.291, intercept=-5.35, adj.r2=0.636,
>>> Pearson=0.464. Are they horrible?
>>>
>>> Could you tell me which is the most important parameter to assess my 
>>> dataset quality?
>>>
>>> And how can I export a high-quality figure (EMF format) with these
>>> parameters? I could only find it in the simplest wrapper mode. When I
>>> append "plotFile=TRUE" to the run.plgem function, I only get a PNG
>>> figure whose resolution is really poor.
>>>
>>>
>>>
>>> Best Regards,
>>>
>>> Qi Wu
>>
>
>