[BioC] Problem with makeDBPackage for arabidopsis with Refseq identifiers

Marc Carlson mcarlson at fhcrc.org
Sat Jan 8 00:26:18 CET 2011


Hi Asta,

I don't understand why you are trying to make a custom package.  The
TAIR IDs that you are mapping to RefSeq IDs in your map file are
already all you should need to get gene information.  That is, if your
initial IDs are already TAIR IDs (as they appear to be), then you should
be able to just use the org.At.tair.db package.  You should not need a
custom package that maps probe IDs to RefSeq IDs in order to connect the
TAIR IDs to the information of interest, since the org package already
contains that information.

The one thing you will apparently have to do before you can use your
IDs is remove the -TAIR-G suffix from them.  For example,
AT1G01010-TAIR-G will become AT1G01010.
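
One way to strip the suffix in base R, as a minimal sketch.  The variable
names `ids` and `tair_ids` are only illustrative, and this assumes every ID
ends in exactly "-TAIR-G" as in the map file you posted:

```r
# Example IDs copied from the posted map file (variable names are illustrative)
ids <- c("AT1G01010-TAIR-G", "AT1G01020-TAIR-G", "AT1G01030-TAIR-G")

# Remove a trailing "-TAIR-G"; the "$" anchors the match to the end of the
# string, so IDs without the suffix are left unchanged
tair_ids <- sub("-TAIR-G$", "", ids)
tair_ids
```

For the real file you would presumably read the first column in first, for
example with read.delim(..., header = FALSE), and apply the same sub() call
to it.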

So then you could load the package I mentioned:

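library(org.At.tair.db)
# TAIR gene IDs with the -TAIR-G suffix already removed
foo = c("AT1G01010","AT1G01020","AT1G01030")
# look up the RefSeq IDs for each gene; IDs with no match come back as NA
mget(foo, org.At.tairREFSEQ, ifnotfound=NA)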

$AT1G01010
[1] "NM_099983" "NP_171609"

$AT1G01020
[1] "NM_001035846" "NM_099984"    "NP_001030923" "NP_171610"

$AT1G01030
[1] "NM_099985" "NP_171611"

Etc.
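
If you would rather have that result as a two-column table, comparable to
your map file, the list can be flattened with base R.  This is only a
sketch: the list below is typed in by hand from the output above rather than
pulled from the package, and the names `refseq_list`, `tab` and
`transcripts` are made up for the example.

```r
# Hand-copied from the mget() output above; in a real session this would be
# the value returned by mget(foo, org.At.tairREFSEQ, ifnotfound=NA)
refseq_list <- list(
  AT1G01010 = c("NM_099983", "NP_171609"),
  AT1G01020 = c("NM_001035846", "NM_099984", "NP_001030923", "NP_171610"),
  AT1G01030 = c("NM_099985", "NP_171611")
)

# One row per (TAIR ID, RefSeq ID) pair
tab <- data.frame(
  tair   = rep(names(refseq_list), lengths(refseq_list)),
  refseq = unlist(refseq_list, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only transcript accessions (NM_ or NR_ prefix), the kind of RefSeq ID
# listed in the posted map file; protein accessions (NP_) are dropped
transcripts <- tab[grepl("^N[MR]_", tab$refseq), ]
transcripts
```

Note that a gene can have more than one transcript accession (AT1G01020
does here), so the table can have more rows than there are genes.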


  Marc




On 01/07/2011 05:24 AM, Asta Laiho wrote:
> Hi,
>
> I'm trying to create a custom annotation package for arabidopsis using RefSeq identifiers with AnnotationDbi's function makeDBPackage. Everything runs smoothly but the generated package is empty with 0 mappings for everything and I cannot figure out what I'm doing wrong.
>
> My map file looks like:
>
> AT1G01010-TAIR-G        NM_099983
> AT1G01020-TAIR-G        NM_001035846
> AT1G01030-TAIR-G        NM_099985
> AT1G01040-TAIR-G        NM_099986
> AT1G01046-TAIR-G        NR_022015
> AT1G01050-TAIR-G        NM_099987
> AT1G01060-TAIR-G        NM_001035847
> AT1G01070-TAIR-G        NM_099989
> AT1G01073-TAIR-G        NM_001160824
> AT1G01080-TAIR-G        NM_099990
> ...
>
> My function call  is the following:
>
> makeDBPackage(
>   schema="ARABIDOPSISCHIP_DB",
>   affy=FALSE, 
>   prefix="arabidopsis.anno", 
>   fileName="doc/map_reseq.txt",
>   #baseMapType=baseMapType, ## id type present in the fileName gb/ug/eg/refseq/gbNRef
>   outputDir="doc/arabidopsis.anno", 
>   version="1.0.0", 
>   chipName="arabidopsis"
> )
>
> One strange thing here is that if I try to use parameter baseMapType I get an error message complaining about an unnecessary parameter. So how can I be sure the function assumes the correct database type?
>
> # after installing and loading the package:
>
> qcdata <- capture.output(arabidopsis.anno())  
>   
>> qcdata
>>     
>  [1] "Quality control information for arabidopsis.anno:"             
>  [2] ""                                                              
>  [3] ""                                                              
>  [4] "This package has the following mappings:"                      
>  [5] ""                                                              
>  [6] "arabidopsis.annoACCNUM has 0 mapped keys (of 28130 keys)"      
>  [7] "arabidopsis.annoARACYC has 0 mapped keys (of 28130 keys)"      
>  [8] "arabidopsis.annoARACYCENZYME has 0 mapped keys (of 28130 keys)"
>  [9] "arabidopsis.annoCHR has 0 mapped keys (of 28130 keys)"         
> [10] "arabidopsis.annoCHRLENGTHS has 7 mapped keys (of 7 keys)"      
> [11] "arabidopsis.annoCHRLOC has 0 mapped keys (of 28130 keys)"      
> [12] "arabidopsis.annoCHRLOCEND has 0 mapped keys (of 28130 keys)"   
> [13] "arabidopsis.annoENZYME has 0 mapped keys (of 28130 keys)"      
> [14] "arabidopsis.annoENZYME2PROBE has 0 mapped keys (of 631 keys)"  
> [15] "arabidopsis.annoGENENAME has 0 mapped keys (of 28130 keys)"    
> [16] "arabidopsis.annoGO has 0 mapped keys (of 28130 keys)"          
> [17] "arabidopsis.annoGO2ALLPROBES has 0 mapped keys (of 5861 keys)" 
> [18] "arabidopsis.annoGO2PROBE has 0 mapped keys (of 3962 keys)"     
> [19] "arabidopsis.annoPATH has 0 mapped keys (of 28130 keys)"        
> [20] "arabidopsis.annoPATH2PROBE has 0 mapped keys (of 120 keys)"    
> [21] "arabidopsis.annoPMID has 0 mapped keys (of 28130 keys)"        
> [22] "arabidopsis.annoPMID2PROBE has 0 mapped keys (of 13671 keys)"  
> [23] "arabidopsis.annoSYMBOL has 0 mapped keys (of 28130 keys)"      
> [24] ""                                                              
> [25] ""                                                              
> [26] "Additional Information about this package:"                    
> [27] ""                                                              
> [28] "DB schema: ARABIDOPSISCHIP_DB"                                 
> [29] "DB schema version: 2.1"                                        
> [30] "Organism: Arabidopsis thaliana"                                
> [31] "Date for NCBI data: 2010-Mar1"                                 
> [32] "Date for GO data: 20100320"                                    
> [33] "Date for KEGG data: 2010-Feb28"                                
> [34] "Data for TAIR data: 2010-Mar12"       
>
> and the session info:
>   
>> sessionInfo()
>>     
> R version 2.11.1 (2010-05-31) 
> i386-apple-darwin9.8.0 
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
> [1] org.At.tair.db_2.4.3  RSQLite_0.9-2         DBI_0.2-5             arabidopsis.db0_2.4.3 AnnotationDbi_1.10.2 
> [6] Biobase_2.8.0  
>
> I'd be very grateful for any advice!


