[BioC] Tigre Package question 3

Fri Feb 14 15:08:49 CET 2014

Dear Anisha,

Sorry for the delayed response.

There is a different method, but it is more complicated and 
example-specific. I will try to work on making the package more RNA-seq 
friendly for the next release.

I understand your specific problem was already solved off-list, but 
anyone else facing similar problems is welcome to get in touch.

Antti

On 2014-02-11 14:30 , Solanki, Anisha wrote:
> Dear Antti,
>
> Thanks for your reply. I would like you to know that the data I have
> doesn't have any replicates; it's just a time series with 7 samples.
>
> Is there a different method of calculating the variances?
>
> Please advise.
>
> Thanks
>
> Anisha
>
> On 11/02/2014 10:32, "Antti Honkela" <antti.honkela at hiit.fi> wrote:
>
>> Dear Anisha,
>>
>> (FYI: taking the discussion back to the list.)
>>
>> mmgmos is only meant for processing Affymetrix microarray data so it
>> cannot be used with RNA-seq data.
>>
>> It is possible to obtain similar error bars for expression estimates as
>> mmgmos provides from RNA-seq using suitable analysis tools such as
>> BitSeq, but unfortunately using those in tigre does not currently work
>> out of the box. If you need this, e.g. if you have data with very few or
>> no replicates of your time series, please let me know so we can try to
>> work it out.
>>
>> If you have sufficiently many replicate time series, you should be OK
>> without any pre-specified variances - the model will fit a simple
>> variance model in this case. To trigger this behaviour, you should use
>> processRawData() to process your expression data matrix and pass the
>> resulting ExpressionTimeSeries object to GP...() functions.
>>
>>
>> Antti
>>
>>
>> On 2014-02-11 00:16 , Solanki, Anisha wrote:
>>> Dear Antti,
>>>
>>> Thanks for your reply. The information you have given me has been very
>>> useful. I had another quick question regarding the mmgmos command. I
>>> understand that the command accepts data as an AffyObject. However, I
>>> have data from RNA-seq and not from Affymetrix microarrays, so I cannot
>>> create an AffyObject from my data, as the object requires CEL files.
>>> Is there any alternative to using an AffyObject? I have tried passing
>>> a matrix with the expression values of every sample from my data, but
>>> mmgmos doesn't seem to accept this as a valid object.
>>>
>>> Please advise.
>>>
>>> Thanks
>>>
>>> Anisha
>>>
>>>
>>> On 10/02/2014 09:38, "Antti Honkela" <antti.honkela at hiit.fi> wrote:
>>>
>>>> On 2014-02-09 18:49 , Solanki, Anisha wrote:
>>>>
>>>> Dear Anisha,
>>>>
>>>>> I have now solved the previous error by adding variances
>>>>> independently to the expression dataset.
>>>>
>>>> The error variances are critical to the accuracy of the method, so you
>>>> should never just impute values there without careful consideration.
>>>> More on a better way to fix this below.
>>>>
>>>>> I just had another quick question. The targets are
>>>>> ranked by the log-likelihood. Does this mean that the higher the
>>>>> log-likelihood, the greater the probability of the gene being a
>>>>> target, or vice versa? Also, what does null log-likelihood stand for?
>>>>
>>>> Our method is based on comparing log-likelihoods over different data
>>>> sets (time series for different genes), which is slightly trickier than
>>>> the usual comparison of log-likelihoods over the same data.
>>>>
>>>> The log-likelihood measures how well the data fit a model assuming
>>>> regulation, so a higher log-likelihood should be counted as
>>>> evidence for being a target.
>>>>
>>>> That said, some time series are easy to fit and get a high likelihood
>>>> under practically any model. To catch these, we fit the baseline or
>>>> null model (which is just a time-independent Gaussian). We can then
>>>> filter out genes that the null model fits as well as or better than
>>>> the real model.
>>>>
>>>> Finally, even though one might consider the likelihood ratio of real
>>>> vs. null a useful statistic, it is actually not good for ranking.
>>>> This is because the range of null model likelihoods is much larger
>>>> than that of the real model, so the ranking would be determined by how
>>>> badly the null model fits instead of how well the real model fits,
>>>> and would tell us nothing about the regulation.
>>>>
>>>> In summary, you should:
>>>> 1. *Filter* by likelihood ratio real/null: only keep genes where
>>>>    log-likelihood > null-log-likelihood
>>>> 2. *Rank* remaining genes by log-likelihood
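
The two steps above can be sketched as follows. This is only a minimal,
language-neutral illustration in Python, not tigre code: the GeneFit
record, the filter_and_rank helper, the gene names and all numeric
values are hypothetical and stand in for the per-gene log-likelihood
and null log-likelihood values produced by the ranking.

```python
from typing import List, NamedTuple


class GeneFit(NamedTuple):
    """Hypothetical per-gene result: fit of the regulation model vs. the null model."""
    gene: str
    log_likelihood: float
    null_log_likelihood: float


def filter_and_rank(fits: List[GeneFit]) -> List[GeneFit]:
    # Step 1: filter. Keep only genes where the regulation model
    # fits strictly better than the time-independent null model.
    kept = [f for f in fits if f.log_likelihood > f.null_log_likelihood]
    # Step 2: rank. Order the survivors by the regulation-model
    # log-likelihood itself, not by the real/null ratio.
    return sorted(kept, key=lambda f: f.log_likelihood, reverse=True)


# Made-up values for illustration only.
fits = [
    GeneFit("geneA", -12.0, -30.0),
    GeneFit("geneB", -5.0, -4.0),    # null model fits better: filtered out
    GeneFit("geneC", -8.0, -9.0),
    GeneFit("geneD", -20.0, -80.0),  # largest ratio, but poorest real-model fit
]

ranked = [f.gene for f in filter_and_rank(fits)]
print(ranked)
```

With these made-up numbers, geneD would come first if the genes were
ranked by the real/null ratio, purely because its null model fits very
badly; ranking by log-likelihood after filtering puts it last.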
>>>>
>>>>>> I think this means that my data lacks calculated variances. As I
>>>>>> understand from your user guide, you process Affymetrix datasets
>>>>>> using the mmgmos command from the puma package, which automatically
>>>>>> calculates the variances for you. However, when I try to run my
>>>>>> expression value matrix through this mmgmos command, it doesn't
>>>>>> work and gives me this error:
>>>>>> unable to find an inherited method for function ‘probeNames’ for
>>>>>> signature ‘"ExpressionTimeSeries"’
>>>>
>>>> You should run mmgmos on the original AffyBatch object, not on an
>>>> ExpressionTimeSeries object.
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Antti
>>>>
>>>> --
>>>> Antti Honkela
>>>> antti.honkela at hiit.fi   -   http://www.hiit.fi/u/ahonkela/
>>>>
>>>
>>
>

-- 
Antti Honkela
antti.honkela at hiit.fi   -   http://www.hiit.fi/u/ahonkela/