[BioC] Excluding probes during methylation analysis

Sat Jul 6 11:22:10 CEST 2013

Hi Tim,

Thank you for the very helpful pointers. 

I have a few remaining questions:

I'm working with MethyLumiM or MethyLumiSet objects. Is there a way to convert them to a SummarizedExperiment object?

Should I only be looking at probes that target CpG sites?

How do I go about finding probes that map to repetitive regions, or to multiple locations in the genome? Do I use the resulting SummarizedExperiment object, get450k() from FDb.InfiniumMethylation.hg19, or something else?

Sorry for all the questions; hopefully this will be useful to other people in the future. I found your previous posts on using the GPL13534 annotation helpful, though it seems things have changed since then.

Thanks again,
Victoria

--
Victoria Svinti
Colon Cancer Genetics Group
MRC Human Genetics Unit, IGMM
University of Edinburgh, Western General Hospital,
Crewe Road, Edinburgh, EH4 2XU

On 5 Jul 2013, at 23:57, "Tim Triche, Jr." <tim.triche at gmail.com> wrote:

> http://www.bioconductor.org/packages/2.12/data/annotation/html/FDb.UCSC.snp137common.hg19.html ,
> http://www.bioconductor.org/packages/2.13/data/annotation/html/FDb.UCSC.snp135common.hg19.html ,
> or http://www.bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20120608.html may be handy. The first two can simply be overlapped with your probes, but they contain only 'common' (MAF > 0.01) SNPs. If you want all of the SNPs that have been submitted to dbSNP, you need the SNPlocs package.
> 
> The UCSC snp13[5|7]common packages are compiled from newer builds of dbSNP than the manifest, which had some bizarre inclusions (for example, SNPs more than 1 bp 3' of the targeted locus) when we looked. I personally screen out probes whose targeted or extension base overlaps a common SNP, using the most recent dbSNP build available to me, but that's just my preference. There are also arguments for screening SNPs anywhere from 1 to 49 bases 5' of the target, based on the melting temperature of the oligos, and for genotyping all of your subjects and screening for SNPs individually.
> 
> Anyway, it is straightforward to dump out the probes that get hit by common SNPs:
> 
> library(FDb.UCSC.snp137common.hg19)
> commonSNPs <- features(FDb.UCSC.snp137common.hg19)
> 
> ## load the data: a SummarizedExperiment is like an eSet, but with a GRanges describing the features
> my.SE <- readRDS('my.SummarizedExperiment.rds')
> dim(my.SE)
> ## [1] 485577     11
> 
> ## drop probes whose targeted CpG (or CpH, or SNP) site overlaps a common SNP
> my.SE.noCpgSNPs <- my.SE[ countOverlaps(my.SE, commonSNPs) < 1, ] 
> dim(my.SE.noCpgSNPs)
> ## [1] 468211     11
> 
> ## retain only CpG probes, and only those that do not overlap a common SNP
> my.SE.noCpgSnps.onlyCpGs <- my.SE.noCpgSNPs[which(substr(rownames(my.SE.noCpgSNPs),1,2)== 'cg') , ] 
> dim(my.SE.noCpgSnps.onlyCpGs)
> ## [1] 465130     11
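For readers without the annotation packages installed, the two filtering steps above can be illustrated with a toy sketch in base R. The probe names, coordinates and SNP positions are hypothetical, and treating each CpG target as a two-base interval is an assumption; in a real analysis countOverlaps() on the GRanges does this work.

```r
## Toy sketch (hypothetical data, not real 450k annotation).
## Step 1: drop probes whose target interval contains a SNP.
## Step 2: keep only probes whose names start with "cg".

probes <- data.frame(
  name  = c("cg_toy_1", "cg_toy_2", "ch_toy_3", "rs_toy_4", "cg_toy_5"),
  chr   = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(100L, 200L, 300L, 400L, 500L),
  stringsAsFactors = FALSE
)
## assumption: the target is the two-base CpG dinucleotide
probes$end <- probes$start + 1L

## hypothetical single-base SNP positions
snps <- data.frame(
  chr = c("chr1", "chr2", "chr2"),
  pos = c(201L, 400L, 100L),
  stringsAsFactors = FALSE
)

## count SNPs lying in each probe's [start, end] on the same chromosome
count_snp_overlaps <- function(probes, snps) {
  vapply(seq_len(nrow(probes)), function(i) {
    sum(snps$chr == probes$chr[i] &
        snps$pos >= probes$start[i] &
        snps$pos <= probes$end[i])
  }, integer(1))
}

no_snp  <- probes[count_snp_overlaps(probes, snps) < 1, ]
only_cg <- no_snp[substr(no_snp$name, 1, 2) == "cg", ]
print(only_cg$name)
```

The SNP at chr2:100 shares a position with cg_toy_1 but sits on a different chromosome, so that probe is kept; cg_toy_2 and rs_toy_4 are dropped in step 1, and ch_toy_3 in step 2.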
> 
> I prefer to work with SummarizedExperiments (hence the .SE), as they make life a bit easier; SummarizedExperiment also happens to be the parent class of GenomicMethylSet, GenomicRatioSet, etc. in minfi, so the same steps apply to those. Working with genomic coordinates is (almost?) universally preferable in this respect.
> 
> YMMV...
> 
> 
> 
> 
> On Fri, Jul 5, 2013 at 9:31 AM, Victoria Svinti <victoria.svinti at igmm.ed.ac.uk> wrote:
> Hi there,
> 
> I decided to post after searching the forums for a few days, in the hope that somebody can point me in the right direction.
> 
> I am analysing a 450k methylation array to look for differentially methylated sites, and have got as far as normalised data. Various resources suggest that I need to drop probes with known SNPs in their sequence, probes in microsatellites, those that anneal to multiple genomic locations, etc.
> 
> I have looked into the FDb.InfiniumMethylation.hg19 package (get450k), but I don't see any SNP annotation (this could be due to my unfamiliarity with GRanges). I have finally acquired a list of these from GEO (Illumina GPL13534), but wonder whether it is outdated and whether there is a better way of doing this.
> 
> Does someone know of a good/any tutorial for this workflow?
> 
> Many thanks,
> Victoria
> 
> --
> Victoria Svinti
> Colon Cancer Genetics Group
> MRC Human Genetics Unit, IGMM
> University of Edinburgh, Western General Hospital,
> Crewe Road, Edinburgh, EH4 2XU
> 
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> 
> 
> -- 
> A model is a lie that helps you see the truth.
> 
> Howard Skipper
