[BioC] Genbank to Unigene IDs

John Zhang jzhang at jimmy.harvard.edu
Fri Apr 16 15:33:11 CEST 2004


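Sorry, the example code should be as follows; in loc2acc the LocusLink ID is in the first column and the GenBank accession in the second: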

> ids <- c("AC010642", "AF414429", "X56654", "Y08432")
> ids2ll <- as.matrix(read.table("ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc",
+                               header = FALSE, sep = "\t"))
> # We only need the first and second columns
> ids2ll <- ids2ll[, c(1, 2)]
> colnames(ids2ll) <- c("LL", "GB")
> # Drop the version number
> ids2ll[, 2] <- gsub("\\..*", "", ids2ll[, 2])
> mapped <- ids2ll[is.element(ids2ll[, 2], ids), , drop = FALSE]
> mapped
      LL       GB        
1     "     1" "AC010642"
4     "     1" "AF414429"
10671 "  1828" "X56654"  
10677 "  1830" "Y08432"
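The same mapping can be tried without network access (the LocusLink FTP files may no longer be served, since LocusLink was later superseded by Entrez Gene). This is a minimal sketch, assuming the loc2acc layout described above (LocusLink ID, then versioned accession, then gi); the sample rows, including their version suffixes and the unmatched 9999/Z99999 row, are made up for illustration. Reading every column as character keeps the IDs from being padded with spaces, and `drop = FALSE` keeps the result a matrix even when only one accession matches.

```r
# Made-up sample rows in the assumed loc2acc layout:
# LocusLink ID <TAB> accession.version <TAB> gi
loc2acc_sample <- paste(
  "1\tAC010642.5\t-",
  "1\tAF414429.1\t15778556",
  "1828\tX56654.1\t30506",
  "1830\tY08432.1\t-",
  "9999\tZ99999.1\t-",
  sep = "\n"
)

ids <- c("AC010642", "AF414429", "X56654", "Y08432")

# colClasses = "character" keeps IDs as text, so as.matrix() does not pad them
tab <- read.table(text = loc2acc_sample, header = FALSE, sep = "\t",
                  colClasses = "character", strip.white = TRUE)
ids2ll <- as.matrix(tab[, c(1, 2)])
colnames(ids2ll) <- c("LL", "GB")

# Drop the version suffix (".5", ".1", ...) from each accession
ids2ll[, "GB"] <- sub("\\..*$", "", ids2ll[, "GB"])

# drop = FALSE keeps a matrix even when only one accession matches
mapped <- ids2ll[ids2ll[, "GB"] %in% ids, , drop = FALSE]
print(mapped)
```

To run it against the real table, replace `text = loc2acc_sample` with the file path or URL.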


>I think the most direct way to map the ids is to use the files available
>at LocusLink (ftp://ftp.ncbi.nih.gov/refseq/LocusLink). If your target file
>contains GenBank accession numbers (e.g. "AC010642", "AF414429", ...), read
>ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc using read.table (sep = "\t")
>and then do a matching. If your target file contains RefSeq ids (e.g.
>"NM_130786", "NM_000014", ...), read
>ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2ref instead. An example:
>
>> ids <- c("AC010642", "AF414429", "X56654", "Y08432")
>> ids2ll <- as.matrix(read.table("ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc",
>+                               header = FALSE, sep = "\t", strip.white = TRUE))
># We only need the second and third column
>> ids2ll <- ids2ll[, c(2, 3)]
>> colnames(ids2ll) <- c("GB", "LL")
># Drop the version number
>> ids2ll[,1] <- gsub("\\..*", "", ids2ll[,1])
>> mapped <- ids2ll[is.element(ids2ll[,1], ids),]
>> mapped 
>      GB         LL        
>1     "AC010642" "-"       
>4     "AF414429" "15778556"
>10671 "X56654"   "30506"   
>10677 "Y08432"   "-"
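The quoted advice (loc2acc for GenBank accessions, loc2ref for RefSeq ids) can be wrapped in one helper. This is a sketch only: `map_to_locuslink` is a hypothetical name, and it assumes both files put the LocusLink ID in column 1 and the versioned accession in column 2, as the corrected example at the top of this message does for loc2acc; the loc2ref layout is an assumption to check against the file itself. The RefSeq test (two capital letters plus an underscore, as in "NM_130786") is a heuristic, and the sample files written to `tempfile()` contain made-up rows so the sketch runs offline.

```r
# Hypothetical helper: pick the LocusLink table by id type, then match.
# Assumes column 1 = LocusLink ID and column 2 = accession.version in BOTH files.
map_to_locuslink <- function(ids, loc2acc_path, loc2ref_path) {
  # Heuristic: RefSeq accessions look like "NM_130786"
  is_refseq <- grepl("^[A-Z]{2}_", ids)
  if (all(is_refseq)) {
    path <- loc2ref_path
  } else if (!any(is_refseq)) {
    path <- loc2acc_path
  } else {
    stop("mixed GenBank and RefSeq ids; map them separately")
  }
  tab <- read.table(path, header = FALSE, sep = "\t",
                    colClasses = "character", strip.white = TRUE)
  acc <- sub("\\..*$", "", tab[[2]])  # drop the version suffix
  keep <- acc %in% ids
  data.frame(LL = tab[[1]][keep], ACC = acc[keep], stringsAsFactors = FALSE)
}

# Made-up sample files standing in for loc2acc and loc2ref
acc_file <- tempfile(fileext = ".txt")
ref_file <- tempfile(fileext = ".txt")
writeLines(c("1828\tX56654.1\t30506", "1830\tY08432.1\t-"), acc_file)
writeLines(c("1\tNM_130786.2\t-", "2\tNM_000014.4\t-"), ref_file)

res <- map_to_locuslink(c("NM_130786", "NM_000014"), acc_file, ref_file)
res2 <- map_to_locuslink("X56654", acc_file, ref_file)
print(res)
print(res2)
```

Rows come back in file order, not in the order of `ids`; use `match()` on the result if the original order matters.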
>
>
>
>>
>>Thanks a lot
>>Gordon
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
>Jianhua Zhang
>Department of Biostatistics
>Dana-Farber Cancer Institute
>44 Binney Street
>Boston, MA 02115-6084
>

Jianhua Zhang
Department of Biostatistics
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084


