[BioC] Format problems

Claire Wilson ClaireWilson at PICR.man.ac.uk
Thu Aug 14 12:03:55 MEST 2003

Dear all,
This is possibly more of an R question, but because it involves handling Affy probeset identifiers properly, I'm asking here.

Can anyone explain the rules R uses when it replaces '_' characters with '.'? I am finding that the rownames in my data frames are sometimes changed from '1007_s_at' to 'X1007.s.at' (for example, lines 3-6 in the excerpt below). I am also seeing rownames that are repeated (the last two rownames printed in lines 3-6 of the excerpt below), even though they should be unique. This seems to happen with data frames but not with matrices. I think it's probably an internal representation I should never get to see, but I'm not sure.
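For reference, the renaming described above matches what base R's make.names() does to strings that are not syntactically valid R names: a name starting with a digit gets an "X" prefix and, under the rules of R versions before 1.9.0 (reproducible today with allow_ = FALSE), '_' is turned into '.'. data.frame() applies make.names() to column names when check.names = TRUE, which is the default; row names are not passed through it. So one thing to look for is a step where the probeset identifiers become column names, for instance after t(). The sketch below illustrates the rules; ids and the matrix m are made-up examples, not data from this post.

```r
# Affy probeset identifiers taken from the excerpt
ids <- c("1007_s_at", "1053_at", "117_at")

# make.names() turns arbitrary strings into syntactically valid names.
# A name may not start with a digit, so an "X" is prepended.
# Since R 1.9.0 '_' is a valid name character and is kept:
make.names(ids)
# gives "X1007_s_at" "X1053_at" "X117_at"

# allow_ = FALSE reproduces the pre-1.9.0 rule, where '_' became '.':
make.names(ids, allow_ = FALSE)
# gives "X1007.s.at" "X1053.at" "X117.at"

# unique = TRUE de-duplicates clashing results by appending ".1", ".2", ...
make.names(c("1053_at", "1053_at"), unique = TRUE)
# gives "X1053_at" "X1053_at.1"

# Hypothetical matrix with probesets as *columns* (e.g. after t()).
m <- matrix(1:6, nrow = 2, dimnames = list(c("fc1", "p1"), ids))

# data.frame() runs column names through make.names() by default...
colnames(data.frame(m))
# gives "X1007_s_at" "X1053_at" "X117_at"

# ...while check.names = FALSE keeps the identifiers exactly as they were.
colnames(data.frame(m, check.names = FALSE))
# gives "1007_s_at" "1053_at" "117_at"
```

If an identifier has to survive as a column name, passing check.names = FALSE is the simplest way to stop the rewriting.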

For example, I have two data frames that contain fold changes and p-scores for a number of different experiments. Each data frame has six columns (fold change 1, p-score 1, fold change 2, p-score 2, fold change 3, p-score 3), and the rownames are probeset identifiers. I now have a function that takes a pair of columns from each table, finds the probesets that pass a given p-score and fold-change cutoff, and reports which of these probesets are shared by the two tables. My problem is this: for the first two pairs of columns (fold change 1 / p-score 1 and fold change 2 / p-score 2) everything works fine, but when I try to compare columns 5 and 6 from each table, the rownames for certain probesets are changed from the standard format into one where they are prefixed by an X and the '_' is replaced by a dot. Adding print statements shows this:
[1] "1007_s_at" "1053_at"   "117_at"    "121_at"    "1255_g_at" "1294_at" - rownames[1:6] table 1
[2] "1007_s_at" "1053_at"   "117_at"    "121_at"    "1255_g_at" "1294_at" - rownames[1:6] table 2
[3] "1007_s_at 1053_at X1007.s.at X1053.at X1053.at" - rownames[1:6] table 1 that pass a p-score cutoff
[4] "1053_at 121_at X1053.at X117.at X1255.g.at" - rownames[1:6] table 2 that pass a p-score cutoff
[5] "121_at 1320_at X121.at X1294.at X1294.at" - rownames[1:6] table 1 that pass a fold change cutoff
[6] "1729_at 1729_at X1294.at X1316.at X1316.at" - rownames[1:6] table 2 that pass a fold change cutoff
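The comparison described above (filter each table on one fold-change/p-score column pair, then keep the probesets that pass in both tables) can be sketched in a few lines of R. This is a minimal sketch, not the function from the post: the names passing_ids() and shared_probesets(), the default cutoffs, and the cutoff directions (|fold change| at or above the cutoff, p-score at or below it) are all assumptions. Because it only works on the character vector returned by rownames() and never rebuilds a data frame, the identifiers are not renamed along the way.

```r
# Hypothetical helper: probesets in one table that pass both cutoffs.
# ASSUMPTION: larger |fold change| and smaller p-score are "better".
# fc_col and p_col may be column names or column positions.
passing_ids <- function(tab, fc_col, p_col, fc_cut, p_cut) {
  ok <- abs(tab[[fc_col]]) >= fc_cut & tab[[p_col]] <= p_cut
  # which() drops NA comparisons; rownames() returns the ids untouched
  rownames(tab)[which(ok)]
}

# Hypothetical wrapper: probesets passing in both tables for one column pair.
shared_probesets <- function(tab1, tab2, fc_col, p_col,
                             fc_cut = 2, p_cut = 0.05) {
  intersect(passing_ids(tab1, fc_col, p_col, fc_cut, p_cut),
            passing_ids(tab2, fc_col, p_col, fc_cut, p_cut))
}

# Made-up example tables (one column pair only)
ids <- c("1007_s_at", "1053_at", "117_at")
t1 <- data.frame(fc3 = c(2.5, 1.1, -3.0), p3 = c(0.01, 0.02, 0.04),
                 row.names = ids)
t2 <- data.frame(fc3 = c(2.1, 3.0, -2.2), p3 = c(0.03, 0.01, 0.20),
                 row.names = ids)

shared_probesets(t1, t2, "fc3", "p3")   # "1007_s_at"
```

Here t1 passes "1007_s_at" and "117_at", t2 passes "1007_s_at" and "1053_at", so only "1007_s_at" is shared.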

Can anyone tell me where I am going wrong, or has anyone come across similar issues? (I am running the latest versions of R and the Bioconductor packages.) The data frames passed to the function are built by using cbind to join together columns from several different data frames.
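One thing worth checking in the cbind step: when cbind() combines data frames it pairs rows by position, not by rowname, and the result takes its rownames from the first data frame. If the source data frames list the probesets in different orders, the columns end up silently misaligned. A minimal sketch of reordering by rowname before combining, using made-up tables a and b (merge(a, b, by = "row.names") is another option):

```r
# Made-up tables: same probesets, different row order
a <- data.frame(fc1 = c(2.5, 1.1), row.names = c("1007_s_at", "1053_at"))
b <- data.frame(p1 = c(0.50, 0.01), row.names = c("1053_at", "1007_s_at"))

# Guard: every id in a must exist in b, otherwise the reorder gives NA rows
stopifnot(all(rownames(a) %in% rownames(b)))

# cbind() pairs rows by position and keeps a's rownames, so
# "1007_s_at" ends up with b's value for "1053_at"
naive <- cbind(a, b)
naive["1007_s_at", "p1"]     # 0.5

# Reorder b by a's rownames first; drop = FALSE keeps a data frame
# even though b has a single column
aligned <- cbind(a, b[rownames(a), , drop = FALSE])
aligned["1007_s_at", "p1"]   # 0.01
```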

Many thanks

Claire Wilson, PhD
Bioinformatics group  
Paterson Institute for Cancer Research  
Christies Hospital NHS Trust  
Wilmslow Road,  
M20 4BX  
tel: +44 (0)161 446 8218  
url: http://bioinf.picr.man.ac.uk/

