[BioC] MEDIPS error extending reads

Stephen Turner vustephen at gmail.com
Thu Apr 11 17:08:05 CEST 2013


Lukas,

Thanks for the quick response. I went back and mapped my IDs using
mm9, and MEDIPS.selectROIs() worked without a problem.

However, MEDIPS.createROIset() is what I was looking for - to conduct
differential methylation on gene regions of interest. I'm able to
create my MEDIPSroiSets, but when I run:

dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi,
CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10)

I get the following error:

Preprocessing MEDIPS SET 1 in MSet1...
Calculating calibration curve...
Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i -  :
  missing value where TRUE/FALSE needed

Not sure what to do here. Looking at my CSet.roi created from the
control MEDIPSroiSet above looks like this:

> CSet.roi
S4 Object of class COUPLINGset
=======================================
Sequence pattern:  CG
Genome:  BSgenome.Mmusculus.UCSC.mm9
Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
149517037 152524553 131738871 124076172 129993255 121843856 121257530
120284312 125194864 103494974 98319150 95272651 90772031 61342430
16299 166650296 15902555
Number of sequence pattern in genome:  21342779
Window size:  -1

My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if
it's the NA's in the seqlengths that are causing the problem:

> ctlset.roi
[[1]]
S4 Object of class MEDIPSset
=======================================
Regions file:  nounmapped.m54_ctl_fem_medip_r1.bam
File path:  /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/newmedips
Genome:  BSgenome.Mmusculus.UCSC.mm9
Number of regions:  18106486
Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
149517037 152524553 131738871 124076172 129993255 121843856 121257530
120284312 125194864 103494974 98319150 95272651 90772031 61342430
16299 166650296 15902555
Number of bins per ROI:  1
Reads extended to:  0
Reads shifted by:  0
Parameter uniq:  TRUE
ROIs:
GRanges with 2508 ranges and 0 metadata columns:
                seqnames                 ranges strand
                   <Rle>              <IRanges>  <Rle>
         Mrpl15     chr1   [ 4763287,  4775820]      *
          Sntg1     chr1   [ 8351556,  9289959]      *
  A830018L16Rik     chr1   [11404186, 11965982]      *
          Sulf1     chr1   [12708626, 12850452]      *
         Prdm14     chr1   [13103538, 13117244]      *
            ...      ...                    ...    ...
        Il13ra2     chrX [143818019, 143863735]      *
        Sh3kbp1     chrX [156065204, 156416001]      *
          Txlng     chrX [159216851, 159267386]      *
            Pir     chrX [160707303, 160810943]      *
         Frmpd4     chrX [163909241, 165015163]      *
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

[[2]]
S4 Object of class MEDIPSset
=======================================
Regions file:  nounmapped.m58_ctl_mal_medip_r1.bam
File path:  /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/newmedips
Genome:  BSgenome.Mmusculus.UCSC.mm9
Number of regions:  16441823
Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
149517037 152524553 131738871 124076172 129993255 121843856 121257530
120284312 125194864 103494974 98319150 95272651 90772031 61342430
16299 166650296 15902555
Number of bins per ROI:  1
Reads extended to:  0
Reads shifted by:  0
Parameter uniq:  TRUE
ROIs:
GRanges with 2508 ranges and 0 metadata columns:
                seqnames                 ranges strand
                   <Rle>              <IRanges>  <Rle>
         Mrpl15     chr1   [ 4763287,  4775820]      *
          Sntg1     chr1   [ 8351556,  9289959]      *
  A830018L16Rik     chr1   [11404186, 11965982]      *
          Sulf1     chr1   [12708626, 12850452]      *
         Prdm14     chr1   [13103538, 13117244]      *
            ...      ...                    ...    ...
        Il13ra2     chrX [143818019, 143863735]      *
        Sh3kbp1     chrX [156065204, 156416001]      *
          Txlng     chrX [159216851, 159267386]      *
            Pir     chrX [160707303, 160810943]      *
         Frmpd4     chrX [163909241, 165015163]      *
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

Thanks for your continuing help!

Stephen

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0
[3] DNAcopy_1.34.0                     BSgenome_1.28.0
[5] Biostrings_2.28.0                  GenomicRanges_1.12.1
[7] IRanges_1.18.0                     BiocGenerics_0.6.0
[9] BiocInstaller_1.10.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1   Rsamtools_1.12.0 XML_3.96-1.1     biomaRt_2.16.0
[5] bitops_1.0-5     gtools_2.7.1     stats4_3.0.0     tools_3.0.0
[9] zlibbioc_1.6.0

On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez
<lukas.chavez.mailings at googlemail.com> wrote:
>
> Hi Stephen,
>
> we have slightly modified the MEDIPS.selectROIs() function and I forgot to
> update the manual accordingly, please excuse any inconveniences.
>
> It is not possible to simply specify columns=c("counts") any more but
> concrete column names must be given. In case you still want to summarize all
> 'count' columns of a result table 'resultTable' you have to specify
> something like:
>
> columns = colnames(resultTable)[grep("counts", colnames(resultTable))]
>
> Another example is shown via ?MEDIPS.selectROIs:
> columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTable))]
>
> Please let me know, if this works for you.
>
> Two more things:
> 1) In addition to the MEDIPS.createSet() function, there is a
> MEDIPS.createROIset() function not described in the manual (see also
> ?MEDIPS.createROIset). In case you want to calculate coverage, rms, and/or
> differential coverage for selected regions only, you might want to try this
> function. Here, the coverage is only calculated at the given regions instead
> of genome wide windows. Please note, normalization, CpG calibration etc. are
> calculated based on the given regions only and I have not yet investigated a
> minimal number of genomic regions so that the applied models remain
> reasonable.
>
> 2) Your annotation file contains genomic regions for chrs 1 to 19 plus the X
> chromosome. Therefore, I assume that this refers to the mouse genome. I have
> processed some sequencing data (bam files) from mouse mapped against mm9 and
> specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the counts
> parameter of the MEDIPS.selectROIs() as described above (summarize = T), I
> receive only 2602 entries although there are 2611 genomic regions in your
> annotation file (as reported in your error message). See below the
> annotation IDs that are not returned by the MEDIPS.selectROIs() function. I
> have checked the first three of them and the genomic coordinates are outside
> of the length of the chromosomes. Your annotations might fit to mm10. I
> recommend to check, if your annotations and your bam files refer to the same
> genome version.
>
> All the best,
> Lukas
>
> ###
> chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9)
> chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9)
> chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9)
> Polr3k
> Vwa1
> Agrn
> Plekhn1
> Frmpd4
> CR536618.1
>
>
> On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen at gmail.com> wrote:
>>
>> Thanks Lukas. That solved that problem. I now have another problem.
>> I'm trying to look at regions of interest as before. After I've called
>> MEDIPS.meth(), I then try to run:
>>
>> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file,
>> columns=c("counts"), summarize=TRUE)
>>
>> where roi_file is created using read.csv() on this file of ROIs:
>> https://gist.github.com/stephenturner/5346095
>>
>> I'm now getting the error:
>>
>> Error in data.frame(..., check.names = FALSE) :
>>   arguments imply differing number of rows: 2602, 0
>>
>> I'm guessing this has something to do with my ROI file, but I'm having
>> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've
>> double checked that I have no duplicates - these both return 2611:
>>
>> length(unique(roi_file$ID))
>> length(unique(with(roi_file, paste(chr, start, stop))))
>>
>> There may be some overlapping regions, but I'm unsure how to best
>> handle this. Could this be causing the error?
>>
>> Thanks,
>>
>> Stephen
>>
>> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez
>> <lukas.chavez.mailings at googlemail.com> wrote:
>> >
>> > Hi Stephen,
>> >
>> > I was able to reproduce the error. It appears that you have unmapped
>> > reads
>> > in your bam file which we always filter out during conversion of sam to
>> > bam
>> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end
>> > sequening
>> > data). In case you filter your bam files accordingly, the error message
>> > should disappear.  However, I will add an additional flag
>> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function in
>> > dev.
>> > Therefore, unmapped reads in a given bam file should be ignored in the
>> > future (MEDIPS version >=1.11.1).
>> >
>> > Lukas
>
>



More information about the Bioconductor mailing list