[BioC] Format problems

Claire Wilson ClaireWilson at PICR.man.ac.uk
Thu Aug 14 12:03:55 MEST 2003


Dear all,
This is possibly more of an R question, but because it involves dealing properly with Affy probeset identifiers I'm asking here...

Can anyone explain the rules R uses to replace '_' characters with '.'s? I am finding that data frames sometimes have their rownames changed from '1007_s_at' to 'X1007.s.at' (for example, lines 3-6 in the excerpt below). I am also seeing rownames that are repeated (the last 2 rownames printed in lines 3-6 of the excerpt below), even though they should be unique. This seems to happen in data frames, but not matrices. I think that it's probably an internal representation I should never get to see, but I'm not sure.
For example:

I have 2 data.frames that contain fold changes and p-scores for a number of different experiments.  Each data frame has 6 columns (fold change 1, p-score 1, fold change 2, p-score 2, fold change 3, p-score 3), and the rownames are probeset identifiers.  I have a function that takes a pair of columns from each table, finds which probesets pass a given p-score and fold change cutoff, and reports which of these probesets are shared by the 2 tables.  My problem is this: for the first 2 pairs of columns (fold change 1, p-score 1, fold change 2, p-score 2) everything works fine, but when I try to compare columns 5 and 6 from each table, the rownames for certain probesets are changed from the standard format into one where they are prefixed by an X and the '_' is replaced by a dot.  Putting in print statements shows this:
[1] "1007_s_at" "1053_at"   "117_at"    "121_at"    "1255_g_at" "1294_at" - rownames[1:6] table 1
[2] "1007_s_at" "1053_at"   "117_at"    "121_at"    "1255_g_at" "1294_at" - rownames[1:6] table 2
[3] "1007_s_at 1053_at X1007.s.at X1053.at X1053.at" - rownames[1:6] table 1 that pass a p-score cutoff
[4] "1053_at 121_at X1053.at X117.at X1255.g.at" - rownames[1:6] table 2 that pass a p-score cutoff
[5] "121_at 1320_at X121.at X1294.at X1294.at" - rownames[1:6] table 1 that pass a fold change cutoff
[6] "1729_at 1729_at X1294.at X1316.at X1316.at" - rownames[1:6] table 2 that pass a fold change cutoff
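
For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the actual function: the names passing_ids and shared_probesets, the column names fc3 and p3, the toy values, and the cutoff directions (p-score at or below the cutoff, absolute fold change at or above it) are all assumptions.

```r
# Hypothetical helper, not the actual function from the analysis.
# Assumed cutoff directions: keep p-score <= p_cut and |fold change| >= fc_cut.
passing_ids <- function(tab, fc_col, p_col, fc_cut, p_cut) {
  fc <- tab[[fc_col]]
  p  <- tab[[p_col]]
  keep <- !is.na(fc) & !is.na(p) & abs(fc) >= fc_cut & p <= p_cut
  # Index the row names with a logical vector so the IDs stay a plain
  # character vector and are never turned into column names.
  rownames(tab)[keep]
}

# Probesets that pass both cutoffs in both tables.
shared_probesets <- function(tab1, tab2, fc_col, p_col, fc_cut, p_cut) {
  intersect(passing_ids(tab1, fc_col, p_col, fc_cut, p_cut),
            passing_ids(tab2, fc_col, p_col, fc_cut, p_cut))
}

# Toy data with made-up values.
ids  <- c("1007_s_at", "1053_at", "117_at", "121_at")
tab1 <- data.frame(fc3 = c(2.5, 0.1, -3.0, 1.2),
                   p3  = c(0.01, 0.20, 0.03, 0.04),
                   row.names = ids)
tab2 <- data.frame(fc3 = c(2.1, 2.2, -2.4, 0.3),
                   p3  = c(0.02, 0.01, 0.50, 0.01),
                   row.names = ids)

shared <- shared_probesets(tab1, tab2, "fc3", "p3", fc_cut = 2, p_cut = 0.05)
print(shared)
```

Written this way, the identifiers are only ever handled as a character vector of rownames, which makes it easier to see at which step a changed name first appears.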

...can anyone help me out with where I am going wrong or has anyone come across similar issues (I am running the latest version of R and the Bioconductor packages).  The data frames passed to the function are made by using cbind to join together different columns from different data frames.
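
One mechanism that would produce names of this shape is the check.names argument of data.frame(), which is TRUE by default and passes column names through make.names(). The sketch below is a minimal illustration under the assumption that the probeset identifiers end up as column names at some point; whether that is what happens in this case is not confirmed. The toy matrix and its values are made up. Names that start with a digit get an "X" prefix; I believe older versions of R also turned '_' into '.', while newer ones leave '_' alone, so the exact result depends on the R version.

```r
# Toy matrix with probeset IDs as column names (values are made up).
ids <- c("1007_s_at", "1053_at", "117_at")
m <- matrix(c(1.5, 0.8, 2.1,
              0.02, 0.30, 0.01),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("fc", "p"), ids))

# Default: column names are made syntactically valid via make.names().
checked <- data.frame(m)

# check.names = FALSE keeps the identifiers exactly as given.
unchecked <- data.frame(m, check.names = FALSE)

print(names(checked))
print(names(unchecked))

# anyDuplicated() returns 0 when all identifiers are unique.
print(anyDuplicated(ids))
```

If the names only change when check.names is left at its default, passing check.names = FALSE wherever data.frame() is called on these tables would be one thing to try.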

Many thanks

Claire
--
Claire Wilson, PhD
Bioinformatics group  
Paterson Institute for Cancer Research  
Christies Hospital NHS Trust  
Wilmslow Road,  
Withington  
Manchester  
M20 4BX  
tel: +44 (0)161 446 8218  
url: http://bioinf.picr.man.ac.uk/
 