[BioC] minfi release 1.8

Fri May 23 21:16:36 CEST 2014

Thanks, Tim, for sharing this.

I think it is very useful, but I am stuck at generating a GenomicRatioSet;
I get an error message:

    gRatioSet <- mapToGenome(ratioSet, mergeManifest = TRUE)
    Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
      solving row 1: range cannot be determined from the supplied arguments (too many NAs)

I am not sure what's wrong. If you happen to know, could you please share
it with me?

Thanks a lot for the help.

Emma
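The error text says IRanges could not build a range for row 1 because too many of start, end and width were missing. As a rough base-R illustration, assuming the usual rule that any two of start, end and width determine the third (width = end - start + 1), the helper below flags rows where fewer than two of the three are known. `can_resolve_range()` is hypothetical, not part of minfi or IRanges, and the coordinates are toy values; rows it flags would be the ones to inspect (for example, probes whose annotated position came back as NA).

```r
# Hypothetical helper (not from minfi or IRanges): a range is treated as
# determinable when at least two of start, end and width are non-NA.
can_resolve_range <- function(start, end, width) {
  known <- (!is.na(start)) + (!is.na(end)) + (!is.na(width))
  known >= 2
}

# Toy coordinates; row 1 mimics a probe with no mapped position at all.
start <- c(NA, 15865, 534242)
end   <- c(NA, 15865, NA)
width <- c(NA, NA, 1)

ok <- can_resolve_range(start, end, width)
print(ok)
which(!ok)  # rows that would trigger a "too many NAs" style failure
```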

On Saturday, February 1, 2014 2:06:59 PM UTC-5, Tim Triche, Jr. wrote:
>
> There are at least 3 different ways you could go about this; Tiffany 
> Morris' probe variant annotation package accompanying ChAMP is probably the 
> most comprehensive, while (I claim) the UCSC common variant track is the 
> most trivial and the SNPlocs packages are the most intensive.  Let's assume 
> your data is in a GenomicRatioSet named grSet for all examples.
>
> Writing and testing out the code for this took a little while so I do hope 
> you'll work through each example.
>
> All things considered, I'd recommend option #1, but it would be even
> better if the maintainers tweaked a few settings so that
> bsseq::data.frame2GRanges(probe.450K.VCs.af) were more straightforward.
> (Also: consider merging $diVC.com1pop.F and $diVC.com1pop.R, making
> everything that can reasonably be boolean into a boolean, etc.  GRanges are
> so much easier to work with than data.frames when dealing with genomic
> coordinates.)
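A minimal sketch of the suggested merge, on a toy data.frame that only borrows the two column names from probe.450K.VCs.af (the values are invented). As in the filtering code for option #1, the assumption is that a non-NA entry in either strand column marks a common variant at the CpG.

```r
# Toy stand-in for probe.450K.VCs.af; only the NA pattern matters here.
vc <- data.frame(
  diVC.com1pop.F = c(NA, "A/G", NA, NA),
  diVC.com1pop.R = c(NA, NA, "C/T", NA),
  row.names = c("cg01", "cg02", "cg03", "cg04"),
  stringsAsFactors = FALSE
)

# Collapse the two strand-specific columns into a single logical flag.
vc$diVC.com1pop <- !is.na(vc$diVC.com1pop.F) | !is.na(vc$diVC.com1pop.R)
vc$diVC.com1pop.F <- NULL
vc$diVC.com1pop.R <- NULL

print(vc)
rownames(vc)[vc$diVC.com1pop]  # probes with a common variant on either strand
```

With a single logical column, later filters become plain logical subsetting instead of repeated is.na() bookkeeping.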
>
> You'll need to supply your own grSet, but if you didn't have one, I don't 
> imagine you'd have asked :-)
> I have provided the dimensions (rows x columns) of the filtered grSets for 
> comparison in each case.   You will have to decide what degree of 
> conservatism is most appropriate for your experiment.  Read the docs!
>
>
> 1) Use the ProbeVariants package alluded to above.  This is
> comprehensive, but a bit of a PITA.
>
> require(minfi)
> grSet
> ## class: GenomicRatioSet 
> ## dim: 485512 34 
>
> require(Illumina450ProbeVariants.db)
> ?probe.450K.VCs.af ## read the docs!
> data(probe.450K.VCs.af)
>
> commonSnpInProbe <- rownames(probe.450K.VCs.af)[
>     probe.450K.VCs.af$probe50VC.com1pop > 0 ]
> grSet.noCommonSNPsInProbes <- grSet[
>     setdiff(rownames(grSet), commonSnpInProbe), ]
> grSet.noCommonSNPsInProbes
> ## class: GenomicRatioSet 
> ## dim: 308901 34 
>
> commonVariantsEitherStrand <- !is.na(probe.450K.VCs.af$diVC.com1pop.F) |
>     !is.na(probe.450K.VCs.af$diVC.com1pop.R)
> CpGsWithCommonVariants <-
>     rownames(probe.450K.VCs.af)[ commonVariantsEitherStrand ]
> grSet.noCommonVariantsAtCpGs <- grSet[
>     setdiff(rownames(grSet), CpGsWithCommonVariants), ]
> grSet.noCommonVariantsAtCpGs
> ## class: GenomicRatioSet 
> ## dim: 445502 34 
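The subsetting pattern above can be tried without the annotation package. This is a sketch on invented data: a small matrix stands in for grSet and a one-column data.frame for probe.450K.VCs.af. Wrapping the comparison in which() is a tidier variant (my choice, not from the package docs) that keeps NA comparisons from turning into NA row names; probes with missing annotation are simply kept, which you may or may not want.

```r
set.seed(1)

# Toy stand-ins (all values invented): probes x samples, plus annotation.
betas <- matrix(runif(12), nrow = 4,
                dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                                paste0("s", 1:3)))
vc <- data.frame(probe50VC.com1pop = c(0, 2, NA, 1),
                 row.names = c("cg01", "cg02", "cg03", "cg04"))

# which() drops the NA comparison, so cg03 is neither flagged nor an NA name.
flagged <- rownames(vc)[which(vc$probe50VC.com1pop > 0)]

# Keep the unflagged probes, in their original order.
kept <- betas[setdiff(rownames(betas), flagged), , drop = FALSE]
dim(kept)
```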
>
> Consult the documentation for finer control over the process (e.g. 
> specific populations, etc.).
>
> ?probe.450K.VCs.af ## read the docs again
>
> It would not break my heart if the maintainers turned the data.frame here 
> into a GRanges; a few small changes followed by data.frame2GRanges() would 
> do the trick.  But still, it's a very handy compilation.
> See below for the reason I prefer GRanges (in short, because I'm impatient 
> and don't like debugging). 
>
>
> 2) Use the FDb package (> 1% MAF across all populations, dbSNP build 135). 
>  This is a one-liner.
>
> require(minfi)
> grSet
> ## class: GenomicRatioSet 
> ## dim: 485512 34 
>
> require(FDb.UCSC.snp135common.hg19) 
> commonSNPs <- features(FDb.UCSC.snp135common.hg19)
>
> ## With a GRanges object, the previous rigmarole becomes a one-liner:
> grSet.noCommonSNPsAtCpGs <- grSet[ grSet %outside% commonSNPs, ]
>
> grSet.noCommonSNPsAtCpGs
> ## class: GenomicRatioSet 
> ## dim: 478385 34 
>
> This was my workaround from 2-3 years ago to permit masking of TCGA Level 
> 3 methylation data.   It still works and it still works well, but it's been 
> superseded (IMHO) by more flexible approaches like #1.
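The one-liner keeps the probes for which `grSet %outside% commonSNPs` is TRUE, i.e. probes that overlap none of the SNP ranges. The helper below is a hypothetical base-R illustration of that overlap logic on toy data, not the implementation behind the operator; it treats each CpG as a single position, whereas real probe and SNP ranges can span several bases.

```r
# Hypothetical helper: TRUE for each CpG position that overlaps no SNP interval
# on the same chromosome. Intervals are closed and 1-based: [start, end].
outside_snps <- function(cpg_chr, cpg_pos, snp_chr, snp_start, snp_end) {
  vapply(seq_along(cpg_pos), function(i) {
    same_chr <- snp_chr == cpg_chr[i]
    !any(snp_start[same_chr] <= cpg_pos[i] & snp_end[same_chr] >= cpg_pos[i])
  }, logical(1))
}

# Toy data (invented positions).
cpg <- data.frame(chr = c("chr1", "chr1", "chr2"),
                  pos = c(100, 250, 100),
                  row.names = c("cg01", "cg02", "cg03"),
                  stringsAsFactors = FALSE)
snp <- data.frame(chr = c("chr1", "chr2"),
                  start = c(250, 500), end = c(250, 500),
                  stringsAsFactors = FALSE)

keep <- outside_snps(cpg$chr, cpg$pos, snp$chr, snp$start, snp$end)
rownames(cpg)[keep]  # cg02 sits on a SNP and is dropped
```

This loop is quadratic and only meant to show the semantics; on 485k probes the GRanges machinery is the sensible route.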
>
>
> 3) Use SNPlocs (all SNPs in dbSNP; mildly annoying complication with 'ch'
> vs. 'chr' seqlevels)
>
> require(minfi)
> grSet
> ## class: GenomicRatioSet 
> ## dim: 485512 34 
>
> ## work around the annoyance:
> chroms <- seqlevels(grSet)
> names(chroms) <- chroms
> chroms <- gsub('chr', 'ch', chroms)
> require(SNPlocs.Hsapiens.dbSNP.20120608)
> SNPs.byChr <- GRangesList(lapply(chroms, getSNPlocs, as.GRanges=TRUE))
> ## time passes...
> ## back to normal:
> seqlevels(SNPs.byChr) <- gsub('ch', 'chr', seqlevels(SNPs.byChr))
> ## GRCh37.p5 coordinates are identical to hg19 save for chrMT:
> genome(SNPs.byChr) <- 'hg19'
>
> ## once the above hoops have been jumped through, it's back to one-liners:
> grSet.noSNPsAtCpGs <- grSet[ grSet %outside% SNPs.byChr, ]
> grSet.noSNPsAtCpGs
> ## class: GenomicRatioSet
> ## dim: 444722 34 
>
> This used to be documented in the minfi code/manual somewhere, though I 
> don't know if it still is. 
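The 'ch'/'chr' renaming in option 3 can be checked on its own. A small sketch, with an illustrative subset of seqlevel names: anchoring the pattern with sub("^chr", ...) and sub("^ch", ...) only touches the prefix, whereas an unanchored gsub('ch', 'chr', ...) is not idempotent and would mangle names that already carry the 'chr' prefix. sub() keeps the names attribute, which the lapply() call in option 3 relies on to name the GRangesList elements.

```r
# Illustrative subset of seqlevels as grSet would report them.
chroms <- c("chr1", "chr2", "chrX", "chrY")
names(chroms) <- chroms

# Anchored patterns rewrite only the leading prefix; names are preserved.
snplocs_names <- sub("^chr", "ch", chroms)  # values: "ch1" "ch2" "chX" "chY"
back <- sub("^ch", "chr", snplocs_names)    # restores the original values

# The unanchored form inserts an extra 'r' if run on 'chr'-prefixed names.
gsub("ch", "chr", "chr1")                   # "chrr1"
```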
>
>
> Statistics is the grammar of science.
> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>
>
> On Fri, Jan 31, 2014 at 1:06 PM, C T <off... at gmail.com> wrote:
>
>> Any tutorial on how to remove probes that contain SNPs?
>>
>> On Tuesday, November 12, 2013 7:12:46 PM UTC-5, Kasper Hansen wrote:
>> >
>> > As part of Bioconductor 2.13, we have released minfi 1.8.x.  Due to a
>> > number of last-minute errors, the recommended version is 1.8.3 (or later).
>> >
>> > Users may find that their old objects cannot be linked to annotation.
>> >  Please run
>> >  OBJECT = updateObject(OBJECT)
>> > to fix this.
>> >
>> > Highlights include
>> > * preprocessQuantile(): an independent implementation of the same ideas
>> > as in Tost et al.
>> > * bumphunter() for finding DMRs
>> > * blockFinder() for finding large hypomethylated blocks on the 450k array.
>> > * estimateCellCounts() for estimating cell type composition for whole
>> > blood samples. The function can be extended to work on other types of
>> > cells, provided suitable flow sorted data is available.
>> > * the annotation now includes SNP annotation for dbSNP v132, 135 and 137,
>> > independently annotated at JHU.
>> > * getSex(): you can now get sex repeatedly, irrespective of relationship
>> > status.
>> > * minfiQC(): find and remove outlier samples based on a sample QC
>> > criterion we have found effective.
>> >
>> > Unfortunately, none of these handy changes are yet detailed in the
>> > vignette; we are working on this.
>> >
>> > A manuscript is in review detailing most of these functions.
>> >
>> > Full NEWS below
>> >
>> > Best,
>> > Kasper D Hansen
>> >
>> >     o   Added getMethSignal(), a convenience function for programming.
>> >
>> >     o   Changed the argument name of "type" to "what" for getMethSignal().
>> >
>> >     o   Added the class "RatioSet", like "GenomicRatioSet" but without the
>> >         genome information.
>> >
>> >     o   Bugfixes to the "GenomicRatioSet()" constructor.
>> >
>> >     o   Added the method ratioConvert(), for converting a "MethylSet" to a
>> >         "RatioSet" or a "GenomicMethylSet" to a "GenomicRatioSet".
>> >
>> >     o   Fixed an issue with GenomicMethylSet() and GenomicRatioSet() caused
>> >         by a recent change to a non-exported function in the GenomicRanges
>> >         package (Reported by Gustavo Fernandez Bayon <gba... at gmail.com>).
>> >
>> >     o   Added fixMethOutliers for thresholding extreme observations in the
>> >         [un]methylation channels.
>> >
>> >     o   Added getSex, addSex, plotSex for estimating sex of the samples.
>> >
>> >     o   Added getQC, addQC, plotQC for a very simple quality control
>> >         measure.
>> >
>> >     o   Added minfiQC for a one-stop function for quality control measures.
>> >
>> >     o   Changed some verbose=TRUE output in various functions.
>> >
>> >     o   Added preprocessQuantile.
>> >
>> >     o   Added bumphunter method for "GenomicRatioSet".
>> >
>> >     o   Handled signed zero in minfi:::.digestMatrix, which caused unit
>> >         tests to fail on Windows.
>> >
>> >     o   addSex and addQC led to sampleNames() being dropped because of a
>> >         likely bug in cbind(DataFrame, DataFrame).  A work-around has been
>> >         implemented.
>> >
>> >     o   Re-ran the test data generator.
>> >
>> >     o   Fixed some Depends and Imports issues revealed by new features of
>> >         R CMD check.
>> >
>> >     o   Added blockFinder and cpgCollapse.
>> >
>> >     o   (internal) added convenience functions for argument checking.
>> >
>> >     o   Exposed and re-wrote getAnnotation().
>> >
>> >     o   Changed getLocations() from being a method to a simple function.
>> >         Arguments have been removed (for example, now the function always
>> >         drops non-mapping loci).
>> >
>> >     o   Implemented getIslandStatus(), getProbeType(), getSnpInfo() and
>> >         addSnpInfo().  The latter two functions retrieve pre-computed SNP
>> >         overlaps, and the new annotation object includes SNPs based on
>> >         dbSNP 137, 135 and 132.
>> >
>> >     o   Changed the IlluminaMethylationAnnotation class to now include
>> >         genomeBuild information as well as defaults.
>> >
>> >     o   Added estimateCellCounts for deconvolution of cell types in whole
>> >         blood.  Thanks to Andrew Jaffe and Andres Houseman.
>> >
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Biocon... at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>