[BioC] matchPDict with mismatches allowed appears to drop names

Harris A. Jaffee hj at jhu.edu
Tue Aug 2 20:48:32 CEST 2011


>> So it appears the names are not inherited.  I tried to added them  
>> manually to my MIndex object
	names(txmatches1)<-names(probeset)

>> but I get Error:
>> attempt to modify the names of a ByPos_MIndex instance

 From MIndex-class.R:

setMethod("names", "MIndex", function(x) x at NAMES)

setReplaceMethod("names", "MIndex",
     function(x, value)
         stop("attempt to modify the names of a ", class(x), "  
instance")
)


For now, a pretty safe workaround for you is

	txmatches1 at NAMES <- names(probeset)

	
On Aug 2, 2011, at 11:45 AM, Harris A. Jaffee wrote:

> The short answer for the fuzzy matching case appears to be that, even
> though your names are maintained through most of the function
>
> 	matchPDict.R:.match.MTB_PDict,
>
> and in fact they are still present in the list 'ans_compons', they get
> dropped in this call:
>
> 	ans <- ByPos_MIndex.combine(ans_compons)
>
> By contrast, for exact matching, the function
>
> 	matchPDict.R:.match.TB_PDict
>
> returns your names here:
>
> 	new("ByPos_MIndex", width0=width(pdict), NAMES=names(pdict),  
> ends=C_ans)
>
> So my naive guess is that
>
> 	MIndex-class.R:ByPos_MIndex.combine
>
> should do that also, something like
>
> 	ans_names <-  mi_list[[1]]@NAMES
> 	new("ByPos_MIndex", width0=ans_width0, ends=ans_ends,  
> NAMES=ans_names)
>
> On Aug 2, 2011, at 5:24 AM, Ian Henry wrote:
>
>> Hi,
>>
>> I have a question regarding the inheritance of the names attribute  
>> when using matchPDict.
>>
>> If I use matchPDict as follows:
>>
>> #Get transcript information
>> > hg19txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename  
>> = "refGene")
>> > hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb)
>>
>> #Create DNAStringSet with names associated with each probe
>> > probeset <- DNAStringSet(probelist$sequence)
>> > names(probeset)<-probelist$probenames
>>
>> #Create PDict object and match against human transcript 14 (I know  
>> it should match)
>> > ps_pdict<-PDict(probeset)
>> > txmatches <- matchPDict(ps_pdict, hg19_tx[[14]])
>>
>> this compares the probes in ps_pdict to transcript 14 in hg19 and  
>> gives:
>> >unlist(txmatches):
>>
>>     start end width           names
>> [1]   749 773    25  HW:6
>> [2]   569 593    25 HW:16
>> [3]   804 828    25 HW:26
>> [4]   757 781    25 HW:36
>>
>> which works :)
>>
>> However, if I search allowing for mismatches then the names appear  
>> to be lost:
>>
>> > ps_pdict1<-PDict(probeset, max.mismatch=1)
>> > txmatches1 <- matchPDict(ps_pdict1, hg19_tx[[14]],  
>> max.mismatch=1, min.mismatch=0)
>> > unlist(txmatches1)
>>
>> IRanges of length 4
>>     start end width
>> [1]   749 773    25
>> [2]   569 593    25
>> [3]   804 828    25
>> [4]   757 781    25
>>
>> The result of matchPDict is a MIndex object that I named txmatches  
>> with exact matches, and txmatches1 with 1 mismatch
>> > names(txmatches)                #gives character vector  
>> containing probe names
>> > names(txmatches1)              #returns NULL
>>
>> So it appears the names are not inherited.  I tried to added them  
>> manually to my MIndex object
>> >names(txmatches1)<-names(probeset)
>>
>> but I get Error:
>> attempt to modify the names of a ByPos_MIndex instance
>>
>> Therefore I'm not sure how to keep my probe names associated with  
>> the Transcript match, which is important for inexact matching.
>>
>> Any help would be greatly appreciated,
>>
>> Thanks,
>>
>> Ian
>>
>>
>> >sessionInfo()
>>
>> R version 2.13.0 beta (2011-03-31 r55221)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>> locale:
>> [1] C/UTF-8/C/C/C/C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] plyr_1.5.2                          
>> BSgenome.Hsapiens.UCSC.hg19_1.3.17
>> [3] BSgenome_1.19.5                    Biostrings_2.19.17
>> [5] GenomicFeatures_1.3.15             GenomicRanges_1.3.31
>> [7] IRanges_1.9.28
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.11.10     DBI_0.2-5           RCurl_1.5-0
>> [4] RSQLite_0.9-4       XML_3.2-0           biomaRt_2.7.1
>> [7] rtracklayer_1.11.12 tools_2.13.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/ 
>> gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/ 
> gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list