[BioC] applicability of tilingArray package

Mon Nov 10 22:26:57 CET 2008

wolfgang,

thanks again for the thoughtful response. i think i have a much better 
understanding of what you've said.

more info...
> I think that, for the data you have, there are *two* different 
> segmentation tasks.
> (i) segmentation of what is transcribed (at all) in each of the 
> conditions
> (ii) identification of what is *differentially* transcribed between 
> the conditions,
>
> The method in the tilingArray package was designed for (i). Perhaps it 
> would be helpful if you could clarify which one you are after. 
> Personally, I think that solving (ii) without at least giving some 
> shot at (i) will leave you with biological interpretation problems, 
> and underuses the data. 
i am trying to determine (ii). more precisely - i want to segment what 
is differentially transcribed; and, to say that in a different way: i'd 
like to identify the locations along the plot of differential 
transcription where change points occur. i think this wording works 
better when thinking of a two sample TAS analysis - where the signal 
result *is* a measure of the differential transcription. what i've said doesn't 
make as much sense in relation to your suggestion of using the 
tilingArray package and getting two sets of transcription levels 
(although, what you've said makes perfect sense):

> [...snip...] And, more precisely, you could get two separate 
> collections of expressed segments (one for wt and one for mutant); 
> then you need to find some sort of consensus segmentation that is a 
> compromise and superset of them both, and then you can ask:
> - which segments have different expression levels between the two 
> conditions
> - which segments change size (transcription start and stop sites) 
> between the conditions
> But for this there is no readymade software that I am aware of. 
what you've described above seems thorough and a very good approach.
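
to make sure i follow, here is a minimal python sketch of one possible reading of the "consensus segmentation" idea: take the union of the wt and mut breakpoints, then compare per-segment mean levels. the function names, the min_gap parameter and the toy numbers are all hypothetical, and this is not something the tilingArray package or TAS provides.

```python
def consensus_breaks(breaks_wt, breaks_mut, min_gap=0):
    """Union of two breakpoint lists (probe indices).

    Breakpoints that lie within min_gap probes of the previously kept
    breakpoint are treated as the same boundary and dropped.
    min_gap is an assumed tuning knob, not taken from any package.
    """
    merged = []
    for b in sorted(set(breaks_wt) | set(breaks_mut)):
        if merged and b - merged[-1] <= min_gap:
            continue  # same boundary, keep the earlier one
        merged.append(b)
    return merged


def segment_means(signal, breaks):
    """Mean of signal within each segment defined by breaks.

    Assumes every breakpoint is strictly inside (0, len(signal)),
    so no segment is empty.
    """
    edges = [0] + list(breaks) + [len(signal)]
    return [sum(signal[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


# toy data: the transcript ends at probe 8 in wt but at probe 10 in mut
wt = [0] * 4 + [2] * 4 + [0] * 4
mut = [0] * 4 + [2] * 6 + [0] * 2

breaks = consensus_breaks([4, 8], [4, 10])
diff = [m - w for w, m in zip(segment_means(wt, breaks),
                              segment_means(mut, breaks))]
print(breaks, diff)
```

in this toy case the only consensus segment with a non-zero difference is probes 8-9, ie, the stretch where the transcript end has moved. real data would need a proper test per segment rather than a raw difference of means.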

re: TAS "doing it all" - when we've mentioned TAS doing the segmentation 
of the differentially expressed regions, i've been thinking of the 
'interval analysis' function in TAS. it just occurred to me that the 
interval analysis is really just a thresholding sort of thing. all it 
does is identify regions above/below some user defined value. this is 
very different from the type of segmentation used in Huber, et al, which 
is more like change point identification. is there some other function 
in TAS that i'm not aware of that performs this more complex 
segmentation? (i suppose this is the wrong place to ask that question).
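
to illustrate the difference i mean, here is a small python sketch under my own assumptions. threshold_intervals is a plain cutoff on the signal, and piecewise_constant_breaks is a generic least-squares piecewise-constant fit by dynamic programming with a fixed number of segments. both names are hypothetical; neither is the actual TAS interval analysis nor the tilingArray implementation.

```python
def threshold_intervals(signal, cutoff):
    """Return (start, end) index pairs, end exclusive, where signal > cutoff."""
    intervals, start = [], None
    for i, x in enumerate(signal):
        if x > cutoff and start is None:
            start = i
        elif x <= cutoff and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(signal)))
    return intervals


def piecewise_constant_breaks(signal, n_segments):
    """Breakpoints of the least-squares fit with n_segments constant levels."""
    n = len(signal)
    s1, s2 = [0.0], [0.0]          # prefix sums of x and x^2
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations of signal[i:j] from its mean (i < j)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting signal[:j] into k segments
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    # walk back from the end to recover the segment starts
    breaks, j = [], n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        breaks.append(j)
    return sorted(breaks)


signal = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0]
print(threshold_intervals(signal, 0.5))       # one interval covering probes 4..11
print(piecewise_constant_breaks(signal, 3))   # level changes at probes 4 and 8
```

with a cutoff of 0.5 the thresholding reports a single region and misses the step from level 1 to level 2, while the change point fit recovers both boundaries. the catch is that the number of segments has to be chosen somehow, which this sketch simply takes as given.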

tia,
mike palumbo

Wolfgang Huber wrote:
> Dear Michael,
>
>> thanks for your thoughts. i have to say i'm afraid i only sort of 
>> follow what you've said. in an effort to clarify, it sounds like 
>> you've said the methods in the tilingArray package probably aren't a 
>> good approach to do the segmentation given the data i have.
>
> I think that, for the data you have, there are *two* different 
> segmentation tasks.
>
> (i) segmentation of what is transcribed (at all) in each of the 
> conditions
>
> (ii) identification of what is *differentially* transcribed between 
> the conditions,
>
> The method in the tilingArray package was designed for (i). Perhaps it 
> would be helpful if you could clarify which one you are after. 
> Personally, I think that solving (ii) without at least giving some 
> shot at (i) will leave you with biological interpretation problems, 
> and underuses the data.
>
>> you've said TAS's approach to the segmentation might be good, but 
>> finding the best parameters might be difficult. you've also said that 
>> for (i) i could use the methods of David et al and Huber et al using 
>> MM probes. if i do that, i'll be left with two separate collections 
>> (wt and mut) of normalized data, from which i'll then need to find (ii), 
>> ie, the differentially transcribed regions and then segment those 
>> results.
>
> Yes. And, more precisely, you could get two separate collections of 
> expressed segments (one for wt and one for mutant); then you need to 
> find some sort of consensus segmentation that is a compromise and 
> superset of them both, and then you can ask:
> - which segments have different expression levels between the two 
> conditions
> - which segments change size (transcription start and stop sites) 
> between the conditions
> But for this there is no readymade software that I am aware of.
>
>> the confusing part for me is connecting what you've said about TAS to 
>> using the MM normalizing methods. i don't see how i could use the MM 
>> normalizing methods and get 2 data sets of expression levels and then 
>> use TAS to find the differentially transcribed data and segments. 
>
> Me neither.
>
>> maybe you're suggesting one or the other, ie, stick with TAS to do it 
>> all, 
>
> Yes, that is an option.
>
>> or use huber et al, MM for normalizing and then find some other 
>> method to find the differentially transcribed regions and segmentation?
>
> Yes, that is another option. See above. It seems that the second 
> option might turn out to be more flexible to adapt to your biological 
> questions, and possibly more sensitive, but it's also more work for you.
>
> Best wishes
>  Wolfgang
>
> ----------------------------------------------------
> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>
>
>
>>> Hi Michael,
>>>
>>> there are two separate issues:
>>> (i) finding the transcribed regions, separately in each of the samples
>>> (wt, mut).
>>> (ii) finding the differentially transcribed regions.
>>>
>>> For (i), you could use an approach similar to that in the David et al.
>>> and Huber et al. papers. Since you don't have the DNA reference hybes,
>>> you could use the MM probes. This is described in Section 4.2 of the
>>> vignette
>>> http://www.bioconductor.org/packages/2.3/bioc/vignettes/tilingArray/inst/doc/assessNorm.pdf 
>>>
>>> and as the benchmarks in Section 5 show, it is not quite as good, but
>>> still pretty good.
>>>
>>> Don't think of this in terms of "normalising" the mutant against the
>>> "wt" type, that doesn't make much sense.
>>>
>>> For (ii), if you want to segment e.g. a probe-wise (moderated)
>>> t-statistic, the piecewise constant model used in the tilingArray
>>> package is not useful. A running window approach (like in TAS) makes
>>> sense, the hard part is of course tuning its parameters.
>>>
>>> AfaIk, there are methods for (i) and (ii) separately, and to join /
>>> align them, the approaches are ad hoc. It would be nice if there were a
>>> clean method that does (i) and (ii) jointly - maybe someone else has
>>> insights in this?
>>>
>>> Best wishes
>>>  Wolfgang
>>>
>>> ------------------------------------------------------------------
>>> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>
>>>
>>> 04/11/2008 16:42 Michael Palumbo scripsit
>>>  
>>>> hello,
>>>>
>>>> i have general questions regarding the applicability of the tilingArray
>>>> package to my problem/data. i've used bioconductor in the past, but by
>>>> no means am i an expert.
>>>>
>>>> i have data from affy yeast tiling arrays - 3 mut and 3 wild type. i've
>>>> run affy's TAS program on the CEL files - as a two sample analysis, ie,
>>>> comparing wt to mut and viewed the results in IGB. my initial goal is to
>>>> segment the results as was done in David et al, PNAS 2006. it seems to
>>>> me there are fundamental differences in my data and the data of David et
>>>> al. e.g., the normalization step described in tilingArray doc uses DNA
>>>> hybridized to the chips as a reference - i don't have that, although i
>>>> do have the wt data. a colleague thought i might be able to use the wt
>>>> data in the normalization step, but that doesn't seem quite right to me.
>>>> it is also described that normalization can occur by MM probes - maybe i
>>>> can normalize the mut chip data w/ MM probes and completely ignore the
>>>> wt data? i realize that if i did that, the result would no longer be a
>>>> comparison of mut and wt and what i would 'see' would be different from
>>>> what i currently see in IGB of the two sample TAS analysis. this also
>>>> seems like it's not the best approach.
>>>>
>>>> on the other hand, again, all i really want to do is segment the
>>>> two-sample analysis that i've done. is there anything wrong with using
>>>> the results of TAS's analysis? TAS does a normalization and has
>>>> bandwidth averaging - as a non-expert, these are convenient and seem
>>>> good to me.
>>>>
>>>> thanks in advance for any and all responses/thoughts,
>>>> mike palumbo
>>>>

-- 
Michael Palumbo					palumbo at wadsworth.org
Bioinformatics Core				voice  (518) 402-4587
Wadsworth Center				fax    (518) 402-4623
Center for Medical Science
New York State Dept of Health
150 New Scotland Ave
Albany, NY 12208
