[BioC] Package plgem for analysis of spectral counts, was 'limma for spectral counts'

Sun Oct 24 22:36:00 CEST 2010

Hi Norman,

Thank you for your eagerness to help.
I reran the plgem wrapper and obtained the attached plot.

> LPSdegList <- run.plgem(esdata = exampleSet, signLev = 0.05,
+                         rank = 5000, trimAllZeroRows = TRUE)
Warning messages:
1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
  PLGEM slope is lower than 0.5
2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
  Adjusted r^2 is lower than 0.95
3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
  Pearson correlation coefficient is lower than 0.85

I was wondering why the adjusted R-squared is so small. The Pearson
correlation improved a little but is still below 0.85. The ln(mean)
and ln(sd) values are all negative, which means small differences
between the two conditions and small variation. The residual histogram
is positively skewed.
The data have a substantial number of missing values. I used na.roughfix
followed by rfImpute from the randomForest package to impute the missing
spectral counts. I think the imputation skewed the output.
Do you know of any appropriate method to impute missing spectral counts
from mass spectrometry?
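
For reference, here is a minimal sketch of how the three quantities named in the warnings could be checked by hand in base R. It is only an approximation: it runs an ordinary least-squares fit of ln(sd) on ln(mean) over all rows, which is not the fitting procedure plgem itself uses, and the matrix `x` is simulated, hypothetical data standing in for the replicates of one condition.

```r
set.seed(1)

# Hypothetical data: 200 proteins x 4 replicates of ONE condition.
# Means are below 1 (as with NSAF-like values), so ln(mean) < 0.
mu <- exp(runif(200, min = -9, max = -3))
x  <- matrix(mu, nrow = 200, ncol = 4) *
      exp(matrix(rnorm(200 * 4, sd = 0.2), nrow = 200))

rowMean <- rowMeans(x)
rowSd   <- apply(x, 1, sd)

# Logs are undefined for zeros, so keep strictly positive rows only
keep   <- rowMean > 0 & rowSd > 0
lnMean <- log(rowMean[keep])
lnSd   <- log(rowSd[keep])

# Plain least-squares fit of ln(sd) against ln(mean)
fit <- lm(lnSd ~ lnMean)
r   <- residuals(fit)

diagn <- c(slope     = unname(coef(fit)["lnMean"]),
           adjR2     = summary(fit)$adj.r.squared,
           pearson   = cor(lnMean, lnSd),
           residSkew = mean((r - mean(r))^3) / sd(r)^3)
print(round(diagn, 3))
```

On real data, `x` would be replaced by the replicate columns of a single condition. The slope, adjusted r^2 and Pearson values can then be compared with the 0.5, 0.95 and 0.85 thresholds quoted in the warnings, and a clearly positive `residSkew` corresponds to a right-skewed residual histogram.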

Yolande

On Sun, Oct 24, 2010 at 12:49 PM, Pavelka, Norman <NXP at stowers.org> wrote:
> Hi Yolande,
>
> The warning messages are telling you that the model is not fitting the data in the expected way. From the diagnostic plots it is clear that something's not right: you can see two populations of values, one of which is clustered around close-to-zero values. In proteomics data, unlike microarray data, a large majority of the data points in a dataset are often either missing or zero. This causes problems in the fitting of the model, as you experienced. For analyzing proteomics data, I strongly recommend using option 'trimAllZeroRows=TRUE' to remove all rows that contain only zero values in a given condition.
>
> Also, depending on the size of your proteomics dataset (in terms of how many proteins were identified by the mass-spec), you may experience some instability in the results using the default number of iterations. Try running the plgem wrapper a few times one after the other, and if you notice that the number of selected proteins is very variable from run to run, then try increasing the number of 'Iterations' in command 'run.plgem' to 1000, 2000 or even 5000. The runs will take a bit longer, but you should get more stable results.
>
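
The run-to-run check described above could be sketched roughly as follows. The helper `selectionStability` and the mock `mockRun` are hypothetical names made up for illustration; the helper just calls a selection function several times and summarises how many identifiers come back. The commented-out `realRun` assumes that the list returned by run.plgem has a `significant` element holding the selected identifiers, which should be checked against the package help before use.

```r
# Call a selection procedure n_runs times and summarise the number of
# selected identifiers. `select_fun` is a hypothetical zero-argument
# function returning the identifiers selected in one run.
selectionStability <- function(select_fun, n_runs = 5) {
  sizes <- vapply(seq_len(n_runs),
                  function(i) length(select_fun()),
                  integer(1))
  c(min = min(sizes), median = median(sizes), max = max(sizes))
}

# Mock run standing in for a real analysis: selects a random number of
# made-up protein identifiers each time it is called.
set.seed(42)
mockRun <- function() {
  sample(paste0("prot", 1:100), size = rbinom(1, 100, 0.1))
}
print(selectionStability(mockRun, n_runs = 5))

# With plgem installed, one run might look like this (not executed here;
# the structure of the return value is an assumption):
# realRun <- function() {
#   res <- run.plgem(esdata = exampleSet, signLev = 0.05, rank = 5000,
#                    Iterations = 1000, trimAllZeroRows = TRUE)
#   unlist(res$significant, use.names = FALSE)
# }
# selectionStability(realRun, n_runs = 5)
```

A wide gap between `min` and `max` would be the sign to raise 'Iterations' and try again.
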
> Let me know how it works!
>
> Norman
>
> -----Original Message-----
> From: Yolande Tra [mailto:yolande.tra at gmail.com]
> Sent: Saturday, October 23, 2010 1:54 PM
> To: Pavelka, Norman
> Subject: Re: [BioC] limma for spectral counts
>
> Hi Norman,
>
> I also ran the wrapper mode and obtained the attached diagnostic plots.
> No proteins were reported as differentially expressed in the output. The diagnostics look totally different from those of the tutorial example data set. What do you think?
>
> LPSdegList <- run.plgem(esdata = exampleSet)
> Warning messages:
> 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
>  PLGEM slope is lower than 0.5
> 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
>  Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition,  :
>  Pearson correlation coefficient is lower than 0.85
>
> Yolande
> On Fri, Oct 22, 2010 at 7:49 PM, Pavelka, Norman <NXP at stowers.org> wrote:
>> Hi Yolande,
>>
>> You can try normalizing your spectral counts following the NSAF (Normalized Spectral Abundance Factor) approach, and then you can use package 'plgem' to detect your differentially abundant proteins. You can have a look at this publication to get an idea and then let me know if you need any help:
>>
>> http://www.ncbi.nlm.nih.gov/pubmed/18029349
>>
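
As a rough illustration of the NSAF idea mentioned above: each spectral count is divided by the protein length, and the result is rescaled so that every run sums to 1. The sketch below uses base R only; the count matrix `spc`, the length vector `len` and the helper `nsaf` are hypothetical, made up for illustration, and zero-count handling is left out.

```r
# Hypothetical toy data: spectral counts (rows = proteins, columns = runs)
spc <- matrix(c(10, 0, 5,
                20, 4, 0),
              nrow = 3,
              dimnames = list(c("protA", "protB", "protC"),
                              c("run1", "run2")))

# Hypothetical protein lengths in amino acids, same row order as spc
len <- c(protA = 100, protB = 200, protC = 50)

nsaf <- function(counts, lengths) {
  stopifnot(nrow(counts) == length(lengths), all(lengths > 0))
  saf <- counts / lengths            # each row divided by its protein length
  sweep(saf, 2, colSums(saf), "/")   # rescale so that each run sums to 1
}

nsafMat <- nsaf(spc, len)
print(round(nsafMat, 4))
print(colSums(nsafMat))
```

A run with no counts at all would give NaN here, so such columns need to be dropped first. To go on to plgem, the resulting matrix would still have to be wrapped in an ExpressionSet together with the sample annotation.
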
>> Thanks and good luck!
>> Norman
>>
>>
>> On 20 October 2010 14:20, Yolande Tra <yolande.tra at gmail.com> wrote:
>>> Hello list members,
>>>
>>> I was wondering whether the limma method can be used for spectral counts of
>>> proteins from mass spectrometry. If yes, is there a function in
>>> Bioconductor that normalizes these counts before running limma?
>>>
>>> Thank you for your help,
>>>
>>> Yolande
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> Norman Pavelka, Ph.D.
>> Postdoctoral Research Associate
>> Rong Li lab
>> Stowers Institute for Medical Research
>> 1000 E. 50th St.
>> Kansas City, MO 64110
>> U.S.A.
>>
>> phone: +1 (816) 926-4103
>> fax: +1 (816) 926-4658
>> e-mail: nxp at stowers.org
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diagnostic.png
Type: image/png
Size: 20849 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20101024/c72f9bd8/attachment.png>