[BioC] changing gpr files?

Furge, Kyle Kyle.Furge at vai.org
Wed Apr 8 20:35:41 CEST 2009


Yes. A similar strategy has worked for me in the past...attached is some old code that may be helpful.

The ann.file referenced in the following code is a simple tab delimited file that holds the old identifier in the first column and the new identifier in the second.

something like

VAI.ID    SEQ.ID    Refseq.ID    Entrez.Gene    Symbol    Band    Description
300008    702770914    "XM_536923.1"    479795    "LOC479795"    ""    "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa"
300009    702770914    "XM_536923.1"    479795    "LOC479795"    ""    "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa"
300010    702770334    "XM_537565.1"    480447    "LOC480447"    ""    "similar to zinc finger, FYVE domain containing 21"
300011    702770334    "XM_537565.1"    480447    "LOC480447"    ""    "similar to zinc finger, FYVE domain containing 21"
300012    702771982    "XM_535904.1"    478737    "LOC478737"    ""    "similar to TRAF and TNF receptor-associated protein"
300013    702771982    "XM_535904.1"    478737    "LOC478737"    ""    "similar to TRAF and TNF receptor-associated protein"

-------

readGPRHeader <- function (file)  {
  con <- file(file, "r")
  if (substring(readLines(con, n = 1), 1, 3) != "ATF")
    stop("File is not in Axon Text File (ATF) format")
  nfields <- as.numeric(strsplit(readLines(con, n = 1), split = "\t")[[1]])
  close(con)
  nfields[1]
}

ann.file <- "./VAI.ann.rel17.txt"

if(length(ann.file) == 0)
  stop("cannot find annotation file")
ann <- read.delim(ann.file,as.is=T)

path <- "."
files <- dir(path,pattern="_u.gpr")
for(f in files) {
  cat("Reading ",f,"\n" );
  skip <- readGPRHeader(f)+3
  con <- file(f,"r")
  header <- readLines(con,n=skip)
  gpr <- read.delim(f,skip=skip,as.is=T,header=F,colClasses="character")
  close(con)

  cat("Updating ",f,"\n")

  vids <- gpr[,5]
  ix <- match(vids,ann$VAI.ID)

  acc <- ann$SEQ.ID[ix]
  refseq <- ann$Refseq.ID[ix]
  gene <- ann$Entrez.Gene[ix]

  new.name <- paste(ann$Description[ix],paste('[',acc,",",refseq,",",gene,']',sep=''))

  gpr[,4] <- new.name
  new.filename <- file.path(path,gsub('\.gpr','\.rel17.gpr',f))
  out <- file(new.filename,"at")
  write.table(header,file=out,row.names=F,col.names=F,quote=F)
  write.table(gpr,file=out,row.names=F,col.names=F,quote=c(4,5),sep="\t")
  close(out)
}




On 4/8/09 2:00 PM, "Jenny Drnevich" <drnevich at illinois.edu> wrote:

Hi everyone,

I was hoping someone had some quick suggestions for me: a client has
hundreds of gpr files of the same microarray from many experiments
over many years. Which clone is in which spot hasn't changed, only
what they call the clone in the ID and Name slot. I've got the
final(?) naming scheme, and they want me to update all the .gpr files
to have the same ID and Name columns. I don't have much time to
figure this out on my own, so I was hoping one of you could help me!
This is what I was thinking needed to be done, but I'm not sure how
to implement it off the top of my head.

1. Read in / open a connection to a .gpr file
2. Replace the Name and ID column, keeping all the header info the
same  (could replace entire Block,Row,Column,Name,ID if it's easier)
3. Save with a new file name.

THANKS!!
Jenny

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


This email message, including any attachments, is for th...{{dropped:6}}



More information about the Bioconductor mailing list