[BioC] Help on PLGEM R Package Data Import

Fri Sep 23 13:48:27 CEST 2011

Dear Qi,

As I already suggested, please direct your questions to the
mailing list and I will be happy to reply to you there. You are
welcome to put me in CC for a more rapid response.

The parameter 'delta' is related to the false positive rate. You can
get a more comprehensive answer in the original paper about PLGEM:
http://www.biomedcentral.com/1471-2105/5/203

The short answer is: Yes, for proteomics datasets in which you only
survey a few hundred proteins at most, you can increase the false
positive rate ('delta') up to a level that you feel comfortable with.
For example, if you only have 500 proteins in a MudPIT dataset, then
choosing 'delta'=0.01 will result in roughly 5 false positive
identifications. If you run such an analysis and you find 50 proteins
differentially expressed, then your FDR will be roughly 10%.
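
To make the arithmetic explicit, here is a small sketch in plain R. It
only reproduces the rough estimate from the example above; the variable
names are made up for illustration and are not part of the plgem package.

```r
# Rough FDR estimate using the numbers from the example above.
# All variable names are illustrative, not plgem arguments or outputs.
n_proteins <- 500    # proteins surveyed in the dataset
delta      <- 0.01   # chosen false positive rate
n_selected <- 50     # proteins called differentially expressed

# Expected number of false positives among all proteins tested
expected_fp <- n_proteins * delta

# Rough FDR: expected false positives divided by the number of selected proteins
est_fdr <- expected_fp / n_selected

cat("Expected false positives:", expected_fp, "\n")
cat("Estimated FDR:", est_fdr, "\n")
```

With your own data, you would replace the three input numbers with the
number of proteins in your dataset, the 'delta' you chose, and the number
of proteins you actually selected.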

Please note that proteomics datasets also have a few other special
features compared to microarrays, the most important of which is the
presence of many missing observations. I highly recommend using the
parameters trimAllZeroRows=TRUE and zeroMeanOrSD="trim" for proteomics
data, in order to improve the fitting of the model when there are many
missing values. Out of curiosity, what values do you get for the
fitting (slope, r.squared, etc.)?
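
Regarding the missing observations: to get a feeling for how many
proteins are affected, you could count the rows that are zero in every
run. The following is only a sketch in plain R on a made-up matrix; it
illustrates the idea of an "all-zero row" and is not how plgem performs
the trimming internally. With a real ExpressionSet you would take the
matrix from exprs(eset) instead.

```r
# Toy abundance matrix (made-up values): rows are proteins, columns are MS runs.
# Missing observations have been replaced with zero.
nsaf <- matrix(c(0.02, 0.03, 0.00, 0.01,
                 0.00, 0.00, 0.00, 0.00,
                 0.00, 0.05, 0.04, 0.00),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("protA", "protB", "protC"),
                               c("ctrl_1", "ctrl_2", "treat_1", "treat_2")))

# TRUE for proteins that were never detected in any run
all_zero <- rowSums(nsaf != 0) == 0
n_all_zero <- sum(all_zero)

# Matrix without the all-zero rows
trimmed <- nsaf[!all_zero, , drop = FALSE]

cat("All-zero rows:", n_all_zero, "of", nrow(nsaf), "\n")
```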

Hope this helps!
Norman

On Fri, Sep 23, 2011 at 3:16 PM, Wu Qi <qwu at dicp.ac.cn> wrote:
> Dear Norman,
>
> Thanks for your kind help, it worked just fine, and I have subscribed to the
> Bioconductor mailing list.
> I have two other questions:
> I found that if I use the default delta=0.001, I get no significantly
> changed proteins. Does delta=0.001 mean p=0.001? So perhaps for proteomics
> data, should I choose a bigger delta like 0.01 or 0.05?
> And how can I evaluate the FDR of the data?
>
> Best Regards,
> Qi Wu
>
> -----Original Message-----
> From: Norman Pavelka [mailto:normanpavelka at gmail.com]
> Sent: Thursday, September 22, 2011 10:29 AM
> To: Wu Qi
> Cc: bioconductor at stat.math.ethz.ch; mattia.pelizzola at gmail.com
> Subject: Re: Help on PLGEM R Package Data Import
>
> Dear Qi,
>
> Thank you for your interest in PLGEM!
>
> I am CC'ing my reply to the Bioconductor mailing list, as this is the best
> forum to address your question. I strongly recommend that you subscribe and
> send further queries there. (You may always CC me to get a more rapid
> response.)
>
> Your question is about how to load your data into R/Bioconductor.
> Since the object that PLGEM needs as an input is of type 'ExpressionSet',
> you'll have to learn how to build such an object in R. Doing it from scratch
> is a bit cumbersome, but you could use function 'readExpressionSet' from
> package Biobase to make your life easier. Type the following in your R
> prompt to get the help page:
>
> library(Biobase)
> ?readExpressionSet
>
> For PLGEM, you will only need a single 'exprsFile' and a single
> 'phenoDataFile':
>
> * The 'exprsFile' is going to be a tab-delimited text file in which the
> first column contains your protein identifiers and the subsequent columns
> contain NSAF values from the various MS runs you performed. Be sure to put a
> meaningful header on top of each column (except for the first column). Do
> not use any spaces or special characters in your column headers, though,
> because it will cause some problems. For those proteins that were not
> identified in all your runs, replace the missing values with a zero.
>
> * The 'phenoDataFile' instead is going to be a description of your columns
> in your 'exprsFile', i.e. a description of your experimental design. Note
> that the row names of the 'phenoDataFile' need to exactly match the column
> names of the 'exprsFile'.
>
> To make it easier, I'm attaching an example with some random numbers.
> Copy these two files into your working directory and run the following
> code:
>
> library(plgem)
> eset <- readExpressionSet("example-exprsFile.txt",
>                           "example-phenoDataFile.txt")
> plgemResult <- run.plgem(eset)
>
> (Of course the results are going to look awful, because I just put in some
> random numbers...) Please direct further queries directly to the
> Bioconductor mailing list. Good luck and let me know how it worked!
>
> Cheers,
> Norman
>
> On Wed, Sep 21, 2011 at 10:37 AM, Wu Qi <qwu at dicp.ac.cn> wrote:
>> Dear Norman,
>>
>>
>>
>> My name is Qi Wu. I'm a Chinese student working on quantitative
>> proteomics, and recently your PLGEM algorithm caught my interest. It
>> seems a better choice than the conventional t-test.
>>
>> I'm a beginner in statistics. After installing the PLGEM R package, I
>> followed the instructions in "An introduction to PLGEM, Mattia
>> Pelizzola and Norman Pavelka, April 13, 2011", ran the wrapper mode
>> and got the sample figures. But I don't know how to import my own
>> data. I couldn't open the sample data named "LPSeset" using Excel or
>> UltraEdit, so I had no idea how the data was organized. I can now
>> generate replicate Excel or plain text files containing protein
>> abundance values for the different conditions. Could you tell me how
>> I can import such data into the PLGEM R package and get a list of the
>> significantly changed proteins? I searched the internet for quite a long
>> time and found nothing.
>>
>> Thanks very much for contributing your wonderful algorithm. Your reply
>> is highly appreciated.
>>
>>
>>
>> Best Regards,
>>
>> Qi Wu
>>
>>
>
>