[R] Problem with number characters

Spencer Graves spencer.graves at pdf.com
Thu Oct 14 22:41:24 CEST 2004


It looks like your string contains several non-printing characters.

"nchar" will give you the total number of characters in each character 
string.
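As a quick check, `nchar` counts invisible characters as well, and `charToRaw` exposes the underlying bytes. Any byte outside printable ASCII (0x20 to 0x7E) then stands out. This is an illustrative sketch: the sample string and the helper name `show_bytes` are hypothetical, not part of the original exchange.

```r
# nchar() counts every character, including invisible ones such as tab (0x09)
s <- "Draszt 03\t"   # hypothetical string with a trailing tab
nchar(s)             # 10, although only 9 characters are visible

# Hypothetical helper: list each byte in hex and flag bytes outside
# printable ASCII (0x20-0x7E). It works byte by byte, so a multibyte
# character shows up as several non-printable bytes.
show_bytes <- function(s) {
  b <- as.integer(charToRaw(s))
  data.frame(hex = sprintf("%02X", b),
             printable = b >= 32 & b <= 126,
             stringsAsFactors = FALSE)
}
show_bytes(s)
```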

"strsplit" can break character strings into single characters, and 
"%in%" can be used to classify them.

Consider the following:
 > x <- "Draszt 0%/1ÂÂ?iso8859-15³"
 > nx <- nchar(x)
 > x. <- strsplit(x, "")
 > length(x.[[1]])
[1] 29
 >
 > namechars <- c(letters, LETTERS,
+ as.character(0:9), ".")
 > punctuation <- c(",", "!", "+", "*", "&", "|")
 > legalchars <- c(namechars, punctuation)
 >
 > legalx <- lapply(x., function(y)(y %in% legalchars))
 > x.[[1]][!legalx[[1]]]
[1] " " "" "%" "/" "Â" "" "Â" "?" "-" "" "Â" "³"
 >
 > sapply(legalx, sum)
[1] 17
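Once the legal characters are identified, one possible next step is to paste back only those. Here is a minimal sketch that mirrors the `namechars` and `punctuation` sets above. It assumes a space should also be kept, and `keep_legal` is a hypothetical name. Because it works character by character, the result depends on how R decoded the string in the current locale.

```r
# Allowed characters, mirroring namechars/punctuation above,
# plus a space (an assumption; adjust to taste).
legalchars <- c(letters, LETTERS, as.character(0:9), ".",
                ",", "!", "+", "*", "&", "|", " ")

# Hypothetical helper: split each string into single characters and
# paste back only those found in `legal`. Vectorized over `s`.
keep_legal <- function(s, legal = legalchars) {
  vapply(strsplit(s, ""),
         function(ch) paste(ch[ch %in% legal], collapse = ""),
         character(1))
}

keep_legal("Draszt 03\t?")   # "Draszt 03"
```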

Does this give you ideas about how to do what you want?
Hope this helps. Spencer Graves

Gabor Grothendieck wrote:

>Assuming that the problem is that your input file has
>additional embedded characters added by the database
>program, you could try extracting just the text using
>the UNIX strings program:
>
>   strings myfile.csv > myfile.txt
>
>and see if myfile.txt works with R; if not, check
>what differs between it and the .csv file.
>
>Date:   Thu, 14 Oct 2004 11:31:33 -0700 
>From:   Scott Waichler <scott.waichler at pnl.gov>
>To:   <r-help at stat.math.ethz.ch> 
>Subject:   [R] Problem with number characters 
>
> 
>I am trying to process text fields scanned in from a csv file that is
>output from the Windows database program FileMakerPro. The characters
>onscreen look like regular text, but R does not like their underlying binary form.
>For example, one of the text fields contains a name and a number, but
>R recognizes the number as something other than what it appears
>to be in plain text. The character string "Draszt 03" after being
>read into R using scan and ="" becomes "Draszt 03" where the 3 is 
>displayed in my R session as a superscript. Here is the result pasted
>into this email I'm composing in emacs: "Draszt 0%/1ÂÂ?iso8859-15³"
>Another clue for the knowledgeable: when I try to display the vector element
>causing trouble, I get
><CHARSXP: "Draszt 0%/1ÂÂ?iso8859-15³">
>where again the superscript part is just "3" in my R session. I'm working in
>Linux, R version 1.9.1, 2004-06-21. Your help will be much appreciated.
>
>Scott Waichler
>Pacific Northwest National Laboratory
>scott.waichler at pnl.gov
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>  
>
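Gabor's idea of stripping the file with the UNIX `strings` program can also be approximated from within R by reading the file as raw bytes. The sketch below is not equivalent to `strings`: it drops every byte that is not printable ASCII, tab, LF or CR and keeps the original line structure, whereas `strings` prints each run of printable characters on its own line. The function name `ascii_only_file` and the file names in the usage comment are hypothetical. Note that this also discards accented Latin-1 letters, which may be real data.

```r
# Hypothetical helper: copy `infile` to `outfile`, keeping only printable
# ASCII bytes (0x20-0x7E) plus tab (9), LF (10) and CR (13).
# Returns, invisibly, the number of bytes dropped.
ascii_only_file <- function(infile, outfile) {
  n <- file.info(infile)$size
  b <- readBin(infile, what = "raw", n = n)
  v <- as.integer(b)
  keep <- (v >= 32 & v <= 126) | v %in% c(9, 10, 13)
  writeBin(b[keep], outfile)
  invisible(sum(!keep))
}

# Usage (file names are placeholders):
#   ascii_only_file("myfile.csv", "myfile_clean.csv")
#   dat <- read.csv("myfile_clean.csv")
```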

-- 
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567



