[BioC] matchPDict with mismatches allowed appears to drop names

Ian Henry henry at mpi-cbg.de
Tue Aug 2 11:24:57 CEST 2011


Hi,

I have a question regarding the inheritance of the names attribute  
when using matchPDict.

If I use matchPDict as follows:

#Get transcript information
 > hg19txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")
 > hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb)

#Create DNAStringSet with names associated with each probe
 > probeset <- DNAStringSet(probelist$sequence)
 > names(probeset)<-probelist$probenames

#Create PDict object and match against human transcript 14
#(I know it should match)
 > ps_pdict <- PDict(probeset)
 > txmatches <- matchPDict(ps_pdict, hg19_tx[[14]])

This compares the probes in ps_pdict to transcript 14 in hg19 and gives:
 > unlist(txmatches)

     start end width           names
[1]   749 773    25  HW:6
[2]   569 593    25 HW:16
[3]   804 828    25 HW:26
[4]   757 781    25 HW:36

which works :)

However, if I search allowing for mismatches then the names appear to  
be lost:

 > ps_pdict1 <- PDict(probeset, max.mismatch=1)
 > txmatches1 <- matchPDict(ps_pdict1, hg19_tx[[14]], max.mismatch=1, min.mismatch=0)
 > unlist(txmatches1)

IRanges of length 4
     start end width
[1]   749 773    25
[2]   569 593    25
[3]   804 828    25
[4]   757 781    25

The result of matchPDict is an MIndex object: txmatches holds the exact
matches, and txmatches1 the matches with up to 1 mismatch.
 > names(txmatches)     #gives character vector containing probe names
 > names(txmatches1)    #returns NULL

So it appears the names are not inherited. I tried to add them
manually to my MIndex object:
 > names(txmatches1) <- names(probeset)

but I get the error:
attempt to modify the names of a ByPos_MIndex instance

Therefore I'm not sure how to keep my probe names associated with the
transcript matches, which is important for inexact matching.
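
One possible workaround, offered as a sketch rather than a confirmed fix:
match each probe on its own with matchPattern() and record the probe name
next to each hit, so that nothing depends on the names of the MIndex
object. The subject and probe sequences below are made-up toy data
standing in for hg19_tx[[14]] and probelist, and this loop is probably
much slower than matchPDict for a large probe set.

```r
library(Biostrings)

# Hypothetical toy data standing in for hg19_tx[[14]] and probelist
subject  <- DNAString("ACGTACGTTTGACCATGGCAAGTCCGATAGGCTA")
probeset <- DNAStringSet(c("TTGACCATGG", "GTCCGATCGG"))
names(probeset) <- c("HW:6", "HW:16")

# Match each probe separately, allowing at most one mismatch, and store
# the probe name in every row so it stays attached to its hits.
hits <- lapply(seq_along(probeset), function(i) {
    m <- matchPattern(probeset[[i]], subject, max.mismatch = 1)
    data.frame(names = rep(names(probeset)[i], length(m)),
               start = start(m),
               end   = end(m),
               width = width(m),
               stringsAsFactors = FALSE)
})

# One row per hit, with the probe name in the first column
hits <- do.call(rbind, hits)
print(hits)
```

In this toy example the first probe matches exactly and the second
matches with one mismatch, so both appear in hits with their names.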

Any help would be greatly appreciated,

Thanks,

Ian


 > sessionInfo()

R version 2.13.0 beta (2011-03-31 r55221)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] plyr_1.5.2                         BSgenome.Hsapiens.UCSC.hg19_1.3.17
[3] BSgenome_1.19.5                    Biostrings_2.19.17
[5] GenomicFeatures_1.3.15             GenomicRanges_1.3.31
[7] IRanges_1.9.28

loaded via a namespace (and not attached):
[1] Biobase_2.11.10     DBI_0.2-5           RCurl_1.5-0
[4] RSQLite_0.9-4       XML_3.2-0           biomaRt_2.7.1
[7] rtracklayer_1.11.12 tools_2.13.0


