[BioC] Segfault in MEDIPS MEDIPS.methylProfiling() with ROI file

Stephen Turner vustephen at gmail.com
Thu Mar 14 16:29:14 CET 2013


Lukas,

Sorry for the late response to this one. I missed your response on the
list previously. I checked that the IDs are unique, and I'm still
getting the segfault. I've updated the ROI file here:

https://gist.github.com/stephenturner/4997682

Let me ask you a related question about the MEDIPS workflow. Would it
make sense in your opinion to use MEDIPS.methylProfiling() on my MeDIP
samples only to get the rms/ams scores for sliding windows or regions
of interest for each biological replicate, and export those to a table
for downstream analysis of differential methylation in a tool like
DESeq/edgeR/limma that can utilize biological replicates?

Thanks,

Stephen

On Thu, Feb 21, 2013 at 4:29 PM, Lukas Chavez
<lukas.chavez.mailings at googlemail.com> wrote:
>
> Hi Stephen,
>
> thank you for the additional details about the segfault error. You have
> convinced me that your genomic regions appear to be fine. The only thing
> that came to my attention is that your IDs are not unique. The
> MEDIPS.methylProfiling() function is calling the C function roiprofile.c
> which has problems with non-unique IDs for the ROIs. It would be great, if
> you can once more try your analysis with unique IDs (e.g. just adding a
> counter to each ID)?
>
> One more thing: the majority of your regions of interest are very large (up
> to 2,960,899 nt). In my opinion, a single averaged methylation value for
> such a long region is difficult to interpret. MeDIP signals typically vary
> in a much smaller resolution along the chromosomes and a single methylation
> as well as CpG density value for a long stretch will be probably just an
> average of many local enriched regions and many non enriched region.
> However, as I do not know what exactly you are analyzing your approach might
> be reasonable and I hope we get the segfault fixed.
>
> As I mentioned previously, the soon to be released update does not depend on
> C functions anymore but makes extensive use of the GenomicRanges and other
> packages so that the current segfault error will be avoided anyway.
>
> Thank you,
> Lukas
>
>
> On Wed, Feb 20, 2013 at 10:24 AM, Stephen Turner <vustephen at gmail.com>
> wrote:
>>
>> Lukas,
>>
>> I checked and there are no positions where start>end position. I'm
>> getting my regions of interest by using biomaRt to map Illumina WG6v2
>> identifiers to position and gene name, and using this as my ROI file.
>>
>> Here's how I'm getting the position information:
>>
>> #########################
>> library(biomaRt)
>> genes <- read.table("genes.txt", T) # genes$illumina contains the wg6v2 id
>> mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl")
>> listAttributes(mart)
>> attributes <- c("illumina_mousewg_6_v2", "chromosome_name",
>>                 "start_position","end_position", "ensembl_gene_id",
>>                 "external_gene_id", "description")
>> genes_annotated <- getBM(attributes=attributes,
>> filters="illumina_mousewg_6_v2", values=genes$illumina, mart=mart,
>> uniqueRows=T)
>> roi_file <- subset(genes_annotated, chromosome_name!="Y",
>> select=c(chromosome_name, start_position, end_position,
>> external_gene_id))
>> roi_file$chromosome_name <- paste("chr", roi_file$chromosome_name, sep="")
>> subset(roi_file, end_position<start_position) #returns nothing
>> write.table(roi_file, file="ROI_file.txt", row=F, col=F, quote=F,
>> sep="\t")
>> #########################
>>
>> You can get my ROI file here:
>>
>> https://gist.github.com/stephenturner/4997682
>>
>> The error it throws suggests region #379 is the problem. That
>> corresponds to chr1:135720061-135752232 (ENSMUSG00000026421), which
>> isn't outside the length of the chromosome. Not sure what the problem
>> is. Here's the error below:
>>
>> #########################
>> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET,
>> > ROI_file="ROI_file.txt", select=2)
>> Preprocessing...
>> Reading ROIs...
>> Extract data according to given ROI...
>> Differential methylation will be calculated on the ROI data set
>> Analysed 379 / 2893
>>  *** caught segfault ***
>> address 0x2ad79ec9ce98, cause 'memory not mapped'
>>
>> Traceback:
>>  1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2),
>>  as.integer(chr_binposition), data1, data2, environment(wilcox.test),
>>    wilcox.test, environment(var), var, environment(math), math,
>> t.test, environment(t.test), as.numeric(factor(chr_names(data1))))
>>  2: withCallingHandlers(expr, warning = function(w)
>> invokeRestart("muffleWarning"))
>>  3: suppressWarnings(.Call("roiprofile", input, as.numeric(select),
>>  as.matrix(ROI2), as.integer(chr_binposition), data1, data2,
>> environment(wilcox.test), wilcox.test, environment(var),     var,
>> environment(math), math, t.test, environment(t.test),
>> as.numeric(factor(chr_names(data1)))))
>>  4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET,
>> ROI_file = "ROI_file.txt", select = 2)
>> aborting ...
>> #########################
>>
>> Thanks for any help you can provide!
>>
>> Stephen
>>
>> On Tue, Feb 19, 2013 at 3:18 PM, Lukas Chavez
>> <lukas.chavez.mailings at googlemail.com> wrote:
>> >
>> > Hi Stephen,
>> >
>> > please excuse my late respond.
>> >
>> > Indeed, the segfault occurs whenever the currently processed genomic
>> > region
>> > is of negative length (i.e. the start coordinate is larger than the end
>> > coordinate), the coordinates are outside of the length of the
>> > chromosome, or
>> > the chromosome of the ROI is not represented by the regions used as
>> > input
>> > for creating the MEDIPS SETs. Typically, the error can be avoided by
>> > correcting the ROI file. Although you have already checked problematic
>> > genomic regions, I would be more than happy, if you once more check
>> > these
>> > regions (and also the immediate neighbors) with respect to the
>> > constraints I
>> > just mentioned.
>> >
>> > Please let me know, if your ROI file appears to be fine. In this case I
>> > will
>> > try to back-track the error message. However, please note that I have
>> > extensively revised the MEDIPS package (also avoiding segfault errors)
>> > which
>> > I intend to update as soon as possible, especially in advance of the
>> > next
>> > Bioconductor release. I strongly recommend to switch to the new version
>> > as
>> > soon as available.
>> >
>> > Thank you and all the best,
>> > Lukas
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen at gmail.com>
>> > wrote:
>> >>
>> >> Lukas, and others:
>> >>
>> >> I'm trying to use the MEDIPS package to look for differentially
>> >> methylated regions, supplying a regions of interest file (essentially
>> >> a bed file). I was able to successfully run MEDIPS.methylProfiling
>> >> supplying the frame_size=500 argument to look for DMRs in 500-bp
>> >> windows. Now I'd like to supply my own regions of interest to look for
>> >> DMR around genes that are differentially expressed from microarray.
>> >>
>> >> I get the following segfault:
>> >>
>> >> ###############
>> >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET,
>> >> > ROI_file="ROI_file.txt", select=2)
>> >> Preprocessing...
>> >> Reading ROIs...
>> >> Extract data according to given ROI...
>> >> Differential methylation will be calculated on the ROI data set
>> >> Analysed 379 / 2893
>> >>  *** caught segfault ***
>> >> address 0x2b61ee9d5e98, cause 'memory not mapped'
>> >>
>> >> Traceback:
>> >>  1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2),
>> >>  as.integer(chr_binposition), data1, data2, environment(wilcox.test),
>> >>    wilcox.test, environment(var), var, environment(math), math,
>> >> t.test, environment(t.test), as.numeric(factor(chr_names(data1))))
>> >>  2: withCallingHandlers(expr, warning = function(w)
>> >> invokeRestart("muffleWarning"))
>> >>  3: suppressWarnings(.Call("roiprofile", input, as.numeric(select),
>> >>  as.matrix(ROI2), as.integer(chr_binposition), data1, data2,
>> >> environment(wilcox.test), wilcox.test, environment(var),     var,
>> >> environment(math), math, t.test, environment(t.test),
>> >> as.numeric(factor(chr_names(data1)))))
>> >>  4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET,
>> >> ROI_file = "ROI_file.txt", select = 2)
>> >> aborting ...
>> >> ###############
>> >>
>> >> I ran this on a machine with 128GB RAM, so I know that wasn't the
>> >> problem. It looks like the segfault was happening with line 379 in the
>> >> sample above. I went into the regions of interest (ROI) file
>> >> containing gene coordinates. Nothing looked weird about this line, but
>> >> I deleted it anyway. When re-running, I still get segfaults, just at
>> >> different positions.
>> >>
>> >> Thanks for any insight you might have.
>> >> Stephen
>> >>
>> >>
>> >> > sessionInfo()
>> >> R version 2.15.2 (2012-10-26)
>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>> >>
>> >> locale:
>> >> [1] C
>> >>
>> >> attached base packages:
>> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>> >>
>> >> other attached packages:
>> >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0
>> >> [3] BSgenome_1.26.1                    Biostrings_2.26.3
>> >> [5] GenomicRanges_1.10.6               IRanges_1.16.4
>> >> [7] BiocGenerics_0.4.0
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] gtools_2.7.0    parallel_2.15.2 stats4_2.15.2
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at r-project.org
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>
>



More information about the Bioconductor mailing list