[BioC] ChIPpeakAnno Strandedness and distance calculation

Zhu, Julie Julie.Zhu at umassmed.edu
Fri May 14 04:40:46 CEST 2010


Hi Dario,

You can create the annotation with strand = c("+"). For example,

AnnotationRangedData = RangedData(
    IRanges(start = c(967659, 2010898, 2496700, 3075866, 3123260),
            end = c(967869, 2011108, 2496920, 3076166, 3123470),
            names = c("t1", "t2", "t3", "t4", "t5")),
    space = c("1", "2", "3", "1", "2"),
    strand = c("+"))
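The same idea carries over to an unstranded table such as the CpG island table in your message. As a minimal sketch in base R only (the data frame copies the first three rows of featTable2; the column names `name` and `strand` are my own choice for illustration, not something the package requires), a placeholder "+" strand can be added to every row:

```r
# Illustrative base-R sketch; the column names and the placeholder value
# are assumptions made for this example, not ChIPpeakAnno requirements.
featTable2 <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(18598, 124987, 317653),
  end   = c(19673, 125426, 318092),
  name  = c("CpG:_116", "CpG:_30", "CpG:_29"),
  stringsAsFactors = FALSE
)

# CpG islands are unstranded, so give every row the same placeholder strand.
featTable2$strand <- "+"

print(featTable2)
```

The start, end, name and strand columns can then be supplied to IRanges() and RangedData() in the same way as the literal vectors in the example above.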

Please take a look at the examples in the paper just published in BMC
Bioinformatics:
http://www.biomedcentral.com/1471-2105/11/237. In case you cannot open
the link, I have also attached the PDF file.

Regarding your other question about distance calculation, I suggest
creating your AnnotationRangedData and PeakRangedData with start = midpoint to
get the distance between midpoints. The distance is calculated differently
for features on the plus strand and on the minus strand. For example, the
distance between a peak and the TSS is calculated as the distance
between the start of the binding site and the TSS, which is the gene start
for genes located on the forward strand and the gene end for genes located
on the reverse strand. Therefore, adding another parameter would mean
overriding how the distance is calculated based on strandedness.
If you try the approach suggested above and still prefer having a new
parameter, I will be happy to add it to the next release.
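To make the midpoint suggestion concrete, here is a minimal sketch in base R. It is not the package's implementation: the helper names (midpoint, tss_position, distance_to_tss), the CpG island coordinates and the sign convention (negative meaning upstream of the TSS) are all assumptions made for illustration. It shows that once start and end are both set to the midpoint and the strand is "+", the strand-aware rule reduces to peak midpoint minus feature midpoint.

```r
# Illustrative base-R sketch; every function name here is hypothetical
# and none of them is part of ChIPpeakAnno.

# Collapse an interval to its midpoint (rounded down to a whole base).
midpoint <- function(start, end) floor((start + end) / 2)

# TSS as described above: gene start on "+", gene end on "-".
tss_position <- function(start, end, strand) {
  strand <- rep_len(strand, length(start))
  ifelse(strand == "-", end, start)
}

# Signed distance from a peak position to the TSS.
# Assumption of this sketch: negative means upstream of the TSS.
distance_to_tss <- function(peak_pos, feat_start, feat_end, strand) {
  tss <- tss_position(feat_start, feat_end, strand)
  direction <- ifelse(strand == "-", -1, 1)
  direction * (peak_pos - tss)
}

# First two peaks from the peaksT table, and one CpG island whose
# coordinates are made up for this example.
peak_mid <- midpoint(c(83351701, 83351401), c(83352000, 83351700))
island_mid <- midpoint(83351000, 83351500)

# With start = end = midpoint and strand "+", the result is simply
# peak midpoint minus feature midpoint.
d <- distance_to_tss(peak_mid, island_mid, island_mid, "+")
print(d)
```

For a stranded gene the same helper picks the gene end as the TSS on the minus strand, which is the behaviour a new parameter would have to override.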

Best regards,

Julie


*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program in Gene Function and Expression
Program in Molecular Medicine
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm





On 5/13/10 9:00 PM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:

> Hello again,
> 
> Just one more question. When we are looking at DNA methylation, we don't have
> the strand of the peak (because the reverse complement of CG is CG). It seems
> that it might not be possible to do this with ChIPpeakAnno?
> 
> e.g.
> 
>> > head(peaksT)
>     chr     start       end
> 1 chr13  83351701  83352000
> 2 chr13  83351401  83351700
> 3 chr20  25011901  25012200
> 4 chr13  83352001  83352300
> 5  chr8 143402101 143402400
> 6  chr2 238246801 238247100
> 
>> > head(featTable)
>      name  chr strand  start    end
> 1 7896759 chr1      + 781253 783614
> 2 7896761 chr1      + 850983 869824
> 3 7896779 chr1      + 885829 890958
> 4 7896798 chr1      + 891739 900345
> 5 7896817 chr1      + 938709 939782
> 6 7896822 chr1      + 945365 981355
> 
> Also, sometimes our feature table is a table of CpG islands, which don't have
> a strand associated with them.
> 
> e.g.
> 
>> > head(featTable2)
>    chr  start    end CpG Island Name
> 1 chr1  18598  19673        CpG:_116
> 2 chr1 124987 125426         CpG:_30
> 3 chr1 317653 318092         CpG:_29
> 4 chr1 427014 428027         CpG:_84
> 5 chr1 439136 440407         CpG:_99
> 6 chr1 523082 523977         CpG:_94
> 
> Is it possible to do this annotation with ChIPpeakAnno? Currently, the
> annotatePeakInBatch function gives me an error when I don't give it strand
> information when I create my RangedData object.
> 
> Thanks,
>        Dario.
> 
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
> 
> 

On 5/13/10 8:10 PM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:

> Hello,
> 
> Firstly, thank you for making this package. It seems so useful! We were
> thinking of writing something like this ourselves, until I saw your package,
> because we do a lot of ChIP-Seq here.
> 
> I just have a small feature request. In your distance calculation, you do
> start of peak - start of feature. Would it be possible to allow the user to
> choose if they want the distance calculation to use the start or the middle of
> the feature (and also for the peak)? This is because we do a lot of
> methylation studies, and for CpG island features, we like to use the midpoint
> as the position of our feature. It would also be nice to be able to use the
> midpoint of the peak as the peak's position, since this is usually where the
> signal is strongest.
> 
> Thanks,
>       Dario.
> 
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia


