[R] Problem with number characters

Mike Prager Mike.Prager at noaa.gov
Fri Oct 15 17:46:59 CEST 2004


Scott,

Has anyone suggested yet that some options might need adjusting in the 
Windows program that wrote the file?  In my experience, CSV files from 
Windows applications are typically pure ASCII (though yours clearly isn't).

Another possibility is that the program can export plain ASCII with the 
fields in fixed positions (fixed-width format).  Sometimes that can be 
done by capturing output intended for basic printers.  Either of those 
approaches might be simpler than trying to fiddle with the quirky output 
provided.  Reverse engineering is error-prone.
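
If the program can write the fields in fixed positions, R's read.fwf() from the utils package can read the result directly. Below is a minimal sketch under stated assumptions. The file contents, the field widths (10 and 4 characters) and the column names are all hypothetical. They would have to match whatever layout the exporting program actually produces.

```r
# Hypothetical fixed-width export: a 10-character name field followed by
# a 4-character numeric field. Real widths depend on the exporting program.
lines <- c(sprintf("%-10s%4d", "Draszt", 3L),
           sprintf("%-10s%4d", "Smith", 12L))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# read.fwf() cuts each line at the given widths. strip.white removes the
# padding, and colClasses keeps the name as text and the value as integer.
df <- read.fwf(tmp, widths = c(10, 4),
               col.names = c("name", "value"),
               colClasses = c("character", "integer"),
               strip.white = TRUE)
print(df)
```

Because the positions are fixed, no delimiter or quote character has to be guessed. That is the main advantage over parsing an unfamiliar CSV dialect.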

MHP


At 10/15/2004 10:47 AM Friday, you wrote:

>Note that there are also regexp character classes that match certain
>character sets, most notably [:graph:] (printable characters other than
>space), which can make it easy to create appropriate regexps.  More is
>in ?regex .
>
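
Here is a small R sketch of the [:graph:] idea. It deletes every character that is neither [:graph:] nor a plain space. The test string, with an embedded control character, is hypothetical. Note one caveat: in a UTF-8 locale, characters such as ³ or Â usually count as [:graph:], so this removes control characters but not every non-ASCII character.

```r
# Hypothetical string with an embedded control character (\001) that is
# typically invisible when printed.
x <- paste0("Draszt 0", "\001", "3")

# Keep only [:graph:] characters (printable, non-space) plus the space
# itself; everything else is deleted.
clean <- gsub("[^[:graph:] ]", "", x)
print(clean)
```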
>Martin Maechler <maechler <at> stat.math.ethz.ch> writes:
>
>:
>: >>>>> "Spencer" == Spencer Graves <spencer.graves <at> pdf.com>
>: >>>>>     on Thu, 14 Oct 2004 13:41:24 -0700 writes:
>:
>:     Spencer>   It looks like you have several non-printing
>:     Spencer> characters.  "nchar" will give you the total number
>:     Spencer> of characters in each character string.
>:
>:     Spencer> "strsplit" can break character strings into single
>:     Spencer> characters, and "%in%" can be used to classify
>:     Spencer> them.
>:
>: and you give nice coding examples:
>:
>:     Spencer> Consider the following:
>:     >> x <- "Draszt 0%/1ÂÂÂÂ?iso8859-15³"
>:     >> nx <- nchar(x)
>:     >> x. <- strsplit(x, "")
>:     >> length(x.[[1]])
>:     Spencer> [1] 29
>:     >>
>:     >> namechars <- c(letters, LETTERS, as.character(0:9), ".")
>:
>: just to be precise:  If 'namechars' is supposed to mean
>: ``characters valid in R object names'', then you should have
>: added "_" as well:
>:
>: namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")
>:
>:     >> punctuation <- c(",", "!", "+", "*", "&", "|")
>:     >> legalchars <- c(namechars, punctuation)
>:
>: and 'legalchars' would have to contain quite a bit more I
>: presume, e.g. "$", "@", ....
>: (but that wouldn't have been a reason to write this e-mail..)
>:
>:     >> legalx <- lapply(x., function(y)(y %in% legalchars))
>:     >> x.[[1]][!legalx[[1]]]
>:     Spencer> [1] " " "" "%" "/" "Â" "Â" "Â" "Â?" "-" "" "Â" "³"
>:     >>
>:     >> sapply(legalx, sum)
>:     Spencer> [1] 17
>:
>:     Spencer> Will this give you ideas about how to do what you want?
>:     Spencer> hope this helps. spencer graves
>:
>: (and this too)
>:
>: Martin Maechler, ETH Zurich
>:
>:
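
Here is a self-contained version of Spencer's strsplit()/%in% approach, with "_" added as Martin suggests. It also reports the code point of each rejected character, which makes invisible characters identifiable. Treating " ", "$" and "@" as legal is an assumption for this sketch, and the input string is hypothetical.

```r
# Character classes as discussed in the thread; "_" is included following
# Martin's note. Adding " ", "$" and "@" to the legal set is an assumption.
namechars   <- c(letters, LETTERS, as.character(0:9), ".", "_")
punctuation <- c(",", "!", "+", "*", "&", "|", "$", "@", " ")
legalchars  <- c(namechars, punctuation)

# Hypothetical input with two invisible control characters.
x <- paste0("Draszt 0", "\001", "3", "\002")

chars   <- strsplit(x, "")[[1]]   # one element per character
legal   <- chars %in% legalchars  # TRUE where the character is allowed
illegal <- chars[!legal]

# Code points of the rejected characters (here control codes 1 and 2).
codes <- vapply(illegal, utf8ToInt, integer(1), USE.NAMES = FALSE)
print(codes)
```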
>:     Spencer> Gabor Grothendieck wrote:
>:
>:     >> Assuming that the problem is that your input file has
>:     >> additional embedded characters added by the database
>:     >> program, you could try extracting just the text using
>:     >> the UNIX strings program:
>:     >>
>:     >> strings myfile.csv > myfile.txt
>:     >>
>:     >> and see if myfile.txt works with R and if not check out
>:     >> what the differences are between it and the .csv file.
>:     >>
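
If the UNIX strings program is not at hand, a rough R stand-in is to read the file as raw bytes. It then flags anything outside printable ASCII plus tab, CR and LF. This is a substitute technique, not what strings does exactly, because strings also drops short runs of printable characters. The file below is hypothetical. It contains byte 0xB3, which is the superscript three (³) in both ISO 8859-1 and ISO 8859-15.

```r
# Hypothetical one-line file: "Draszt 0", then byte 0xB3, then a newline.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(charToRaw("Draszt 0"), 0xb3, 0x0a)), tmp)

# Read every byte of the file.
bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)

# Allowed: tab, LF, CR and printable ASCII (0x20 to 0x7E).
allowed <- as.raw(c(0x09, 0x0a, 0x0d, 0x20:0x7e))
ok <- bytes %in% allowed

which(!ok)   # positions of suspicious bytes
bytes[!ok]   # their values
cleaned <- rawToChar(bytes[ok])
```

Comparing the flagged byte values against a character-set table is often enough to tell which encoding the exporting program used.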
>:     >> Date:   Thu, 14 Oct 2004 11:31:33 -0700
>:     >> From:   Scott Waichler <scott.waichler <at> pnl.gov>
>:     >> To:   <r-help <at> stat.math.ethz.ch>
>:     >> Subject:   [R] Problem with number characters
>:     >>
>:     >>
>:     >> I am trying to process text fields scanned in from a csv file that is
>:     >> output from the Windows database program FileMakerPro. The characters
>:     >> onscreen look like regular text, but R does not like their underlying
>:     >> binary form.
>:     >> For example, one of the text fields contains a name and a number, but
>:     >> R recognizes the number as something other than what it appears
>:     >> to be in plain text. The character string "Draszt 03" after being
>:     >> read into R using scan and ="" becomes "Draszt 03" where the 3 is
>:     >> displayed in my R session as a superscript. Here is the result pasted
>:     >> into this email I'm composing in emacs: "Draszt 0%/1ÂÂÂÂ?iso8859-15³"
>:     >> Another clue for the knowledgeable: when I try to display the vector
>:     >> element causing trouble, I get
>:     >> <CHARSXP: "Draszt 0%/1ÂÂÂÂ?iso8859-15³">
>:     >> where again the superscript part is just "3" in my R session. I'm
>:     >> working in Linux, R version 1.9.1, 2004-06-21. Your help will be much
>:     >> appreciated.
>:     >>
>:     >> Scott Waichler
>:     >> Pacific Northwest National Laboratory
>:     >> scott.waichler <at> pnl.gov
>:
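
Once a suspicious byte has been identified, iconv() in a current version of R can either show it or drop it. This is a minimal sketch assuming the text is Latin-1. The ³ and the "iso8859-15" in Scott's output suggest an ISO 8859 encoding, but that is an assumption. The input string is hypothetical and is built from raw bytes.

```r
# Hypothetical string: "Draszt 0" followed by byte 0xB3 (superscript three
# in ISO 8859-1), assumed here to be Latin-1 text.
x <- rawToChar(as.raw(c(charToRaw("Draszt 0"), 0xb3)))

# sub = "byte" marks each character with no ASCII equivalent as <xx>;
# sub = "" simply drops it.
shown   <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")
dropped <- iconv(x, from = "latin1", to = "ASCII", sub = "")
print(shown)
print(dropped)
```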
>: ______________________________________________
>: R-help <at> stat.math.ethz.ch mailing list
>: https://stat.ethz.ch/mailman/listinfo/r-help
>: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>:
>:


-- 
Michael Prager
NOAA Center for Coastal Fisheries and Habitat Research
Beaufort, North Carolina  28516  USA
http://shrimp.ccfhrb.noaa.gov/~mprager/

NOTE: Opinions expressed are personal, not official. No government
endorsement of any product is made or implied.



