[R] Problem reading from a data frame

Marc Schwartz marc_schwartz at comcast.net
Wed Jul 2 18:52:25 CEST 2008


Not likely the factor issue:

x <- factor(c("MT2342",    "MT0982",    "MT2874"))

 > x
[1] MT2342 MT0982 MT2874
Levels: MT0982 MT2342 MT2874

 > gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"


gsub() and friends coerce to character internally already:

 > gsub
function (pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
     perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
     if (!is.character(x))
         x <- as.character(x)
     .Internal(gsub(as.character(pattern), as.character(replacement),
         x, ignore.case, extended, perl, fixed, useBytes))
}
<environment: namespace:base>



More than likely what is happening is that 'PthwyGenes' is a single-row 
data frame whose columns are factors:

x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874")

 > x
        A      B      C
1 MT2342 MT0982 MT2874

 > str(x)
'data.frame':	1 obs. of  3 variables:
  $ A: Factor w/ 1 level "MT2342": 1
  $ B: Factor w/ 1 level "MT0982": 1
  $ C: Factor w/ 1 level "MT2874": 1


Thus, when the code for gsub() attempts to coerce 'x' to character, as 
per documented behavior, each factor column contributes its internal 
integer level code, not its label, and those codes are what get coerced 
to character:

 > as.character(x[1, ])
[1] "1" "1" "1"


and then:


 > gsub("[^0-9]", "", x[1, ])
[1] "1" "1" "1"
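
As for why the original post shows "6" "6" "8" rather than "1" "1" "1": in a 
data frame with many rows, each factor column has many levels, and the code 
stored for a given row is the position of its value among that column's 
sorted levels. A small sketch with made-up data (the extra gene IDs are 
hypothetical, and stringsAsFactors = TRUE is spelled out so that the columns 
are factors whatever the default happens to be):

```r
# Hypothetical multi-row data frame; only the first row matches the thread.
df <- data.frame(A = c("MT2342", "MT0001", "MT9999"),
                 B = c("MT0982", "MT0002", "MT0003"),
                 stringsAsFactors = TRUE)

row1 <- df[1, ]

# Integer codes stored in each factor column for row 1:
# "MT2342" is the 2nd sorted level of A, "MT0982" the 3rd sorted level of B.
codes <- sapply(row1, as.integer)

# The labels that those codes stand for.
labels <- sapply(row1, as.character)

print(codes)
print(labels)
```

The codes depend entirely on what else is in each column, which is why they 
appear to have no relation to the digits in the strings.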


Thus, instead use:

 > sapply(x[1, ], function(x) gsub("[^0-9]", "", x))
      A      B      C
"2342" "0982" "2874"


or, if you just need a plain vector returned, without the column names:


 > gsub("[^0-9]", "", unlist(x[1, ]))
[1] "2342" "0982" "2874"
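
Another possibility, offered only as a sketch, is to keep the columns as 
character in the first place so that there are no level codes to trip over. 
The name 'x_chr' is made up for this example, and this assumes you control 
how the data frame is created:

```r
# Same single-row data frame, but with character columns instead of factors.
x_chr <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                    stringsAsFactors = FALSE)

# Each column is now a character vector of length 1.
col_classes <- sapply(x_chr, class)

# unlist() flattens the row into a named character vector before gsub().
digits <- gsub("[^0-9]", "", unlist(x_chr[1, ]))

print(col_classes)
print(digits)
```

read.table() and friends accept the same stringsAsFactors argument if the 
data come from a file.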


The key thing to remember is that a single row extracted from a data 
frame is itself a data frame (a list of columns), not a vector.
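
A quick way to check what you are holding before passing it to gsub() is 
sketched below. The variable names 'row', 'vec' and 'digits' are just for 
illustration, and stringsAsFactors = TRUE is given explicitly so that the 
columns are factors as in the example above:

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

row <- x[1, ]

# The extracted row is still a data frame, not an atomic vector.
print(is.data.frame(row))
print(is.vector(row))

# Convert column by column, so each factor yields its label, not its code.
vec <- vapply(row, as.character, character(1))

# Now gsub() sees an ordinary character vector.
digits <- gsub("[^0-9]", "", vec)
print(digits)
```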

HTH,

Marc Schwartz


on 07/02/2008 10:51 AM jim holtman wrote:
> Seems to work fine for me:
> 
>> x <- c("MT2342",    "MT0982",    "MT2874")
>> gsub("[^0-9]", "", x)
> [1] "2342" "0982" "2874"
> 
> You might have 'factors' so you should use as.character to convert to
> character strings:
> 
> gsub('[^0-9]','',as.character(PthwyGenes))
> 
> On Wed, Jul 2, 2008 at 10:24 AM,  <naw3 at duke.edu> wrote:
>> Hi,
>>
>> I have a data frame with strings that have two letters and four numbers. When I
>> store a whole row as a new vector and try to remove the preceding letters using
>> the gsub command, it returns characters of single numbers that have no relation
>> to the numbers in each string. I also noticed that when I view the new vector
>> before using gsub, it includes the original headers from the data frame. For
>> example,
>>
>> The original row will contain (i'm not showing the headers):
>>
>> MT2342    MT0982    MT2874
>>
>> and after I use the command, 'gsub('[^0-9]','',PthwyGenes),' I get:
>>
>> "6"    "6"    "8"
>>
>> and this result no longer has any headers.
>>
>> Does anyone know why this happens and how I can fix it?
>>
>> Thanks,
>> -Nina


