[BioC] ChiPseq input files?

Jonathan Cairns Jonathan.Cairns at cancer.org.uk
Mon Sep 17 18:57:44 CEST 2012


see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1 - each region should represent a single mapped read.

Format 2 is insufficient to determine the original read locations. In fact, so is format 1 as presented; I assume the 5 on the end of 94545 is a typo. Format 1 is also missing "strand", so if you have the original .bam files, I'd suggest starting from those and sticking to the bed format outlined above.
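
As a rough illustration of the read-level layout described above, the base R sketch below parses two made-up BED records (one per mapped read, with a strand column) into a data.frame. The record values, read names and 36 bp read length are assumptions for illustration only. BED starts are 0-based with exclusive ends, so they are shifted to 1-based coordinates here.

```r
# Toy read-level BED records (hypothetical values):
# chrom, start, end, name, score, strand
bed_text <- c(
  "chr1\t6556\t6592\tread1\t0\t+",
  "chr10\t9453\t9489\tread2\t0\t-"
)

reads <- read.table(
  text = bed_text, sep = "\t", header = FALSE,
  col.names  = c("chr", "start", "end", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "integer", "character")
)

# BED starts are 0-based and ends are exclusive; convert to 1-based inclusive.
reads$start <- reads$start + 1L
reads$width <- reads$end - reads$start + 1L
print(reads)
```

Each row is one read, so counts per position or per region can be derived later, whereas a table of pre-summed counts cannot be turned back into reads.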

How to perform bam -> bed file conversion in python is not a bioconductor-related question and is therefore outside of the scope of this mailing list.

J
________________________________________
From: John linux-user [johnlinuxuser at yahoo.com]
Sent: 17 September 2012 17:37
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?

Hi Jonathan,

Thanks for the clarification. How to create the bed file, and what format it should take, is exactly the question I wanted to ask, e.g. whether to count reads at each base position or in each region. If in each region, how do I decide the length of each region? Two specific examples are below, one for each format. It would be easy to count reads in format 1, but in format 2 it would be hard to determine the range. Thanks for any further suggestions. Best, John

format 1:
chr,  start, end,    reads
chr1, 6557,  6557,   233
ch10, 9454,  94545,  100

format 2:
chr,  start, end,    reads
chr1, 6557,  8567,   2333
ch10, 9454,  194595, 1000

________________________________
From: Jonathan Cairns <Jonathan.Cairns at cancer.org.uk>
To: John linux-user <johnlinuxuser at yahoo.com>; "bioconductor at r-project.org" <bioconductor at r-project.org>
Sent: Monday, September 17, 2012 12:14 PM
Subject: RE: [BioC] ChiPseq input files?

Hi,

In RNA-seq, one knows where the regions of interest (i.e. exons) are, so binning is straightforward. No such database of "regions of interest" exists for ChIP-seq, hence the need for peak-calling algorithms to find them.
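
To give a feel for what binning means when no regions are known in advance, here is a minimal base R sketch that counts read starts in fixed-width windows. The positions, the 100 bp bin width and the chromosome length are all invented for illustration, and it is not meant to reproduce what BayesPeak or any other peak caller does internally.

```r
# Hypothetical 5' read start positions on one chromosome (1-based).
read_starts <- c(120, 130, 145, 480, 495, 510, 515, 990)
bin_width   <- 100L   # assumed bin size, chosen only for illustration
chr_length  <- 1000L  # assumed chromosome length

# Assign each read to a bin: positions 1-100 -> bin 1, 101-200 -> bin 2, ...
bin_index <- (read_starts - 1L) %/% bin_width + 1L
n_bins    <- ceiling(chr_length / bin_width)
counts    <- tabulate(bin_index, nbins = n_bins)

bins <- data.frame(
  start = (seq_len(n_bins) - 1L) * bin_width + 1L,
  end   = pmin(seq_len(n_bins) * bin_width, chr_length),
  reads = counts
)
print(bins)
```

Bins with unusually high counts are candidate enriched regions; a real peak caller adds a statistical model on top of counts like these.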

IRanges/RangedData/GRanges etc are internal R objects, so you'll have a hard time constructing such a thing in python. If disk space is a major issue, you could try creating a .bed file from your .bam file, and then read that in with e.g. read.bed() in BayesPeak, or import() in rtracklayer.

J
________________________________________
From: John linux-user [johnlinuxuser at yahoo.com]
Sent: 17 September 2012 16:55
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?

Hi Jonathan,

Thanks for your response. I would just like to use python instead of R to generate these IRanges data.
I looked over the introduction to RNA-seq data, and it seems that it simply counts the reads that overlap the annotated gene regions, as in the code below,
and I am wondering whether something similar happens for ChIP-seq data. Thanks. John


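library(GenomicFeatures)  # exonsBy()
library(GenomicRanges)    # countOverlaps(); readGappedAlignments() was renamed
                          # readGAlignments() (GenomicAlignments) in later releases

gnModel <- exonsBy(txdb, "gene")  # txdb: a transcript database object created beforehand
counter <- function(fl, gnModel)
{
    aln <- readGappedAlignments(fl)
    strand(aln) <- "*" # for strand-blind sample prep protocol
    hits <- countOverlaps(aln, gnModel)             # number of genes each read overlaps
    counts <- countOverlaps(gnModel, aln[hits==1])  # count only reads hitting exactly one gene
    names(counts) <- names(gnModel)
    counts
}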
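
The counting logic in that function can be sketched with plain intervals in base R, which may help when reimplementing it outside Bioconductor. The gene and read coordinates below are hypothetical, everything sits on a single chromosome, and strand is ignored. Reads overlapping more than one gene are dropped, mirroring the `hits==1` filter.

```r
# Toy gene models and reads on a single chromosome (all coordinates hypothetical).
genes <- data.frame(name  = c("geneA", "geneB"),
                    start = c(100, 250),
                    end   = c(300, 400))
reads <- data.frame(start = c(110, 260, 280, 350, 500),
                    end   = c(145, 295, 315, 385, 535))

# ov[i, j] is TRUE when read i overlaps gene j (closed intervals).
ov <- outer(reads$start, genes$end, "<=") &
      outer(reads$end, genes$start, ">=")
colnames(ov) <- genes$name

hits   <- rowSums(ov)                              # genes overlapped per read
counts <- colSums(ov[hits == 1, , drop = FALSE])   # keep unambiguous reads only
print(counts)
```

This only works because the gene intervals are given up front, which is the piece that is missing for ChIP-seq.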

________________________________
From: Jonathan Cairns <Jonathan.Cairns at cancer.org.uk>
To: John linux-user <johnlinuxuser at yahoo.com>; "bioconductor at r-project.org" <bioconductor at r-project.org>
Sent: Monday, September 17, 2012 11:35 AM
Subject: RE: [BioC] ChiPseq input files?

Hi John,

I'm afraid I don't understand your question. It sounds like you are trying to bin the reads? This shouldn't be necessary, as both packages do this for you. Was that your intended query?

Jonathan
________________________________________
From: John linux-user [johnlinuxuser at yahoo.com]
Sent: 17 September 2012 15:59
To: Jonathan Cairns; bioconductor at r-project.org
Subject: Re: [BioC] ChiPseq input files?

Hi Jonathan,

Thanks for your response and code. That saves me a lot of time searching the web. Your answers are great! But if I try to create a table using python or other scripts and then read the table into R for the statistics, how can I decide the range (e.g. start and end) when I count the reads at each position across the chromosomes/genome? Can you give me more suggestions? Thanks.

Best,

John

________________________________
From: Jonathan Cairns <Jonathan.Cairns at cancer.org.uk>
To: John linux-user <johnlinuxuser at yahoo.com>; "bioconductor at r-project.org" <bioconductor at r-project.org>
Sent: Monday, September 17, 2012 10:38 AM
Subject: RE: [BioC] ChiPseq input files?

Hi John,

I would try the Rsamtools package. You'd need something like this (warning, untested code):

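library(Rsamtools)

bamFile <- "path/to/Bamfile.bam"
# Skip unmapped reads, whose pos is NA and would break IRanges() below
p <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=FALSE),
                  what=c("rname", "strand", "pos", "qwidth"))
# scanBam() returns a list with one element per requested range; take the first
bam <- scanBam(bamFile, param=p)[[1]]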

BayesPeak accepts data.frames or RangedDatas. I would suggest the easiest thing to do is construct a RangedData:

library(IRanges)
IR <- IRanges(start=bam[["pos"]], width=bam[["qwidth"]])
x <- RangedData(ranges=IR, strand=bam[["strand"]], space=bam[["rname"]])

chipseq accepts GRanges by preference:

library(GenomicRanges)
y <- GRanges(seqnames=bam[["rname"]], ranges=IR, strand=bam[["strand"]])

There may be a faster/cleverer way of doing it, but this should work.
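
If a .bed file on disk is wanted instead, the same four fields can be written out with base R. The sketch below uses a small hand-made list as a stand-in for the `scanBam()` result, so every value in it is hypothetical. It also assumes ungapped, unclipped alignments, since `qwidth` is the read length rather than the span on the reference.

```r
# Stand-in for the list returned by scanBam(); values are hypothetical.
bam <- list(rname  = factor(c("chr1", "chr10")),
            strand = factor(c("+", "-"), levels = c("+", "-", "*")),
            pos    = c(6557L, 9454L),
            qwidth = c(36L, 36L))

# BED uses 0-based starts and exclusive ends, so shift the 1-based BAM position.
bed <- data.frame(chrom  = as.character(bam[["rname"]]),
                  start  = bam[["pos"]] - 1L,
                  end    = bam[["pos"]] - 1L + bam[["qwidth"]],
                  name   = ".",
                  score  = 0L,
                  strand = as.character(bam[["strand"]]),
                  stringsAsFactors = FALSE)

# Write a tab-separated, header-less file to a temporary location.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(bed_file)
```

The name and score columns are placeholders here; only chrom, start, end and strand carry information from the BAM file.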

Jonathan


________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] On Behalf Of John linux-user [johnlinuxuser at yahoo.com]
Sent: 17 September 2012 15:04
To: bioconductor at r-project.org
Subject: [BioC] ChiPseq input files?

Hi,

I am wondering how to prepare the input files for the R packages BayesPeak and chipseq, assuming the BAM files have already been generated by BWA and samtools.  Thanks.

John


NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-named person(s). If you are not the intended recipient, notify the sender immediately, delete this email from your system and do not disclose or use for any purpose.

We may monitor all incoming and outgoing emails in line with current legislation. We have taken steps to ensure that this email and attachments are free from any virus, but it remains your responsibility to ensure that viruses do not adversely affect you.
Cancer Research UK
Registered charity in England and Wales (1089464), Scotland (SC041666) and the Isle of Man (1103)
A company limited by guarantee.  Registered company in England and Wales (4325234) and the Isle of Man (5713F).
Registered Office Address: Angel Building, 407 St John Street, London EC1V 4AD.


