[BioC] Distance from TSS and CPG

Tim Triche, Jr. tim.triche at gmail.com
Wed Dec 14 20:51:50 CET 2011


I wrote this up as an example in the IlluminaHumanMethylation450kprobe
package... which seemingly disappeared into thin air after I uploaded it!
Oh well. See IlluminaHumanMethylation450kprobe for this and several other
common use cases; otherwise, here are the man page and data.frame, which
hopefully make sense.  (There is a similar object in the .db package, but
without any sequences.)

For what you want, you could just do (even with the crufty old 1.4.6 .db
package)

> library(IlluminaHumanMethylation450k.db)
> sites <- toTable(IlluminaHumanMethylation450kCPG37) # or CPG36 if using hg18
> chrs <- toTable(IlluminaHumanMethylation450kCHR37) # or CHR36 if using hg18
> coords <- merge(sites, chrs, by='Probe_ID')
> names(coords) <- c('probe','site','chr')
> head(coords)
       probe      site chr
1 cg00000029  53468112  16
2 cg00000108  37459206   3
3 cg00000109 171916037   3
4 cg00000165  91194674   1
5 cg00000236  42263294   8
6 cg00000289  69341139  14
> library(GenomicFeatures)
> CpGs.unstranded <- with(coords,
                          GRanges(paste('chr',chr,sep=''),
                                  IRanges(start=site, width=1,
                                          names=probe)))
> refgene.TxDb = makeTranscriptDbFromUCSC('refGene', genome='hg19')
> TSS.forward = transcripts(refgene.TxDb,
                            vals=list(tx_strand='+'),
                            columns='gene_id')
> nearest.fwd = precede(CpGs.unstranded, TSS.forward)
> nearest.fwd.eg = nearest.fwd # to keep dimensions right
> found = !is.na(nearest.fwd) # CpGs that precede no '+' TSS stay NA
> nearest.fwd.eg[found] =
    as.character(elementMetadata(TSS.forward)$gene_id[nearest.fwd[found]])
> TSSs.fwd = start(TSS.forward[nearest.fwd[found]])
> distToTSS.fwd = nearest.fwd # to keep dimensions right
> # negative values: the CpG lies upstream of the TSS it precedes
> distToTSS.fwd[found] = start(CpGs.unstranded)[found] - TSSs.fwd

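And likewise with vals=list(tx_strand='-') for the reverse strand, keeping
in mind that there the TSS is the end() of each transcript rather than its
start(), so the sign of the difference flips as well.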
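
If you would rather handle both strands in one pass, here is a minimal
sketch of a different, strand-aware route (not the precede() approach
above): resize() each transcript to its first base, which is start() on '+'
and end() on '-', then use nearest() and flip the sign on the reverse
strand. The transcripts, gene IDs, probe names and coordinates below are
all made up for illustration; in practice tx would come from
transcripts(refgene.TxDb, columns='gene_id').

```r
library(GenomicRanges)

# Hypothetical transcripts, one per strand (coordinates are invented).
tx <- GRanges(c("chr1", "chr1"),
              IRanges(start = c(1000, 5000), end = c(2000, 6000)),
              strand = c("+", "-"),
              gene_id = c("geneA", "geneB"))

# Strand-aware TSS: the first transcribed base of each transcript.
tss <- resize(tx, width = 1, fix = "start")

# Hypothetical unstranded CpG sites, one base wide.
cpgs <- GRanges("chr1",
                IRanges(start = c(900, 6100, 5500), width = 1,
                        names = c("cpgA", "cpgB", "cpgC")))

# Index of the nearest TSS for each CpG (NA if none on that chromosome).
idx <- nearest(cpgs, tss, ignore.strand = TRUE)

# Signed distance: negative means upstream of the TSS, positive downstream,
# relative to the direction of transcription.
offset <- start(cpgs) - start(tss)[idx]
minus <- as.character(strand(tss))[idx] == "-"
distToTSS <- ifelse(minus, -offset, offset)
names(distToTSS) <- names(cpgs)

nearestGene <- mcols(tss)$gene_id[idx]
names(nearestGene) <- names(cpgs)
```

Unlike precede(), nearest() also picks up TSSs that lie upstream of the
CpG, so a probe inside a gene body gets a positive distance to that gene's
TSS instead of being assigned to the next gene along.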

For CpG island distance you will need to decide which CpG island definition
to use.  Personally I like Irizarry's.  Once you have constructed a GRanges
object with the start and end coordinates of the CpG islands, most of it
will be equally straightforward.
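
As a rough sketch of that last step, assuming you already have the islands
as a GRanges: distanceToNearest() from GenomicRanges returns, for each
CpG, the nearest island and the gap to it (0 when the CpG sits inside an
island). The island coordinates below are invented placeholders, not taken
from any real island definition; the probe positions are the first few
rows of head(coords) above.

```r
library(GenomicRanges)

# A few probes from head(coords) above.
cpgs <- GRanges(c("chr16", "chr3", "chr1"),
                IRanges(start = c(53468112, 37459206, 91194674), width = 1,
                        names = c("cg00000029", "cg00000108", "cg00000165")))

# Hypothetical CpG islands (made-up coordinates, for illustration only).
islands <- GRanges(c("chr16", "chr3"),
                   IRanges(start = c(53467000, 37459500),
                           end   = c(53469000, 37460000)))

# One hit per CpG that has an island on the same chromosome.
hits <- distanceToNearest(cpgs, islands)

# Keep dimensions right: CpGs on chromosomes without any island stay NA.
distToIsland <- rep(NA_integer_, length(cpgs))
distToIsland[queryHits(hits)] <- mcols(hits)$distance
names(distToIsland) <- names(cpgs)
```

Note that this distance is unsigned; if you need to know whether the island
is upstream or downstream, compare start(cpgs) with the start() of
islands[subjectHits(hits)] yourself.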




On Wed, Dec 7, 2011 at 2:25 AM, Khadeeja Ismail <hajjja at yahoo.com> wrote:

> Hi,
>
> I have a list of probes from IlluminaHumanMethylation450k array, and I
> need to find the distance from TSS and also the distance from CpG island
> for each. Is there a simple way to do this?
>
> Thanks in advance,
> Khadeeja



-- 
If people do not believe that mathematics is simple,
it is only because they do not realize how complicated life is.
John von Neumann <http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>

