[R] Importing data coming from Splus into R.

William Dunlap wdunlap at tibco.com
Fri Feb 5 20:32:36 CET 2010


> -----Original Message-----
> From: gerald.jean at dgag.ca [mailto:gerald.jean at dgag.ca] 
> Sent: Friday, February 05, 2010 10:58 AM
> To: William Dunlap
> Cc: Uwe Ligges; r-help at r-project.org
> Subject: RE: [R] Importing data coming from Splus into R.
> 
> Hello Bill,
> 
> here is what I tried with the Splus built-in data set "claims".
> 
> In Splus:
> 
> apply(claims, 2, class)
>        age   car.age     type      cost    number
>  "ordered" "ordered" "factor" "numeric" "numeric"
> dump(list = "claims",
>      fileout = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>      oldStyle = T)  ## I tried both oldStyle = T and oldStyle = F; same results.
> 
> In R:
> 
> claims <- source("/home/jeg002/splus/R/Exemples/R/myclaims.txt")
> apply(claims$value, 2, class)  ## oldStyle = T this time.
>         age     car.age        type        cost      number
> "character" "character" "character" "character" "character"
> 

Use lapply(claims$value, class) instead of
apply(claims$value, 2, class).  In R apply
converts its first argument into a matrix,
which will be a character matrix if any
columns are factors.  In recent versions of
S+ apply(data.frame, MARGIN=2,...) avoids
the convert-to-matrix step and works on the
columns of the data.frame.
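
A minimal sketch of the difference in R, using a made-up three-row
data frame (the name toy and its values are hypothetical, not the
S+ claims data):

```r
# Hypothetical stand-in for a data frame with ordered, factor and numeric columns.
toy <- data.frame(
  age  = factor(c("17-20", "21-24", "25-29"), ordered = TRUE),
  type = factor(c("A", "B", "A")),
  cost = c(289, 372, 189)
)

# apply() coerces the data frame with as.matrix() first; because some
# columns are not numeric, the result is a character matrix.
m <- as.matrix(toy)
mode(m)

# So every column appears to be "character" when inspected via apply().
via_apply <- apply(toy, 2, class)
via_apply

# lapply() iterates over the columns of the data frame itself,
# so the real class of each column is reported.
via_lapply <- lapply(toy, class)
via_lapply
```

The classes reported by apply() here describe the intermediate matrix,
not the columns as they are stored in the data frame.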

In this example it looks like the Splus dump -> R source
route works.

R> lapply(claims$value, class)
$age
[1] "ordered" "factor"

$car.age
[1] "ordered" "factor"

$type
[1] "factor"

$cost
[1] "numeric"

$number
[1] "numeric"
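
The same kind of check can be rehearsed entirely within R, as a rough
sketch: dump() a small data frame, source() it back, and look at the
value component.  This does not exercise S+ at all; the data frame toy,
the scratch environment and the temporary file are hypothetical
stand-ins, and it only illustrates that source() returns a list whose
value component holds the restored object.

```r
# Hypothetical stand-in data; the values are made up.
toy <- data.frame(
  age  = factor(c("17-20", "21-24", "25-29"), ordered = TRUE),
  type = factor(c("A", "B", "A")),
  cost = c(289, 372, 189)
)

tmp <- tempfile(fileext = ".txt")
dump("toy", file = tmp)          # writes R code that recreates 'toy'

env <- new.env()                 # scratch environment, so nothing is overwritten
res <- source(tmp, local = env)  # evaluates the dumped code

restored <- res$value            # the recreated object is in the value component
restored_classes <- lapply(restored, class)
restored_classes

unlink(tmp)                      # remove the temporary file
```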

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> I must admit I had not tried using "write.table" from Splus.  I have now,
> again with the "claims" data set.  On the first attempt R complained that
> there was no method to convert the character variables to the "ordered"
> class.  I made a copy of the data set in Splus, changed the class of two
> variables from "ordered" to "factor" and gave it another try.  Here are the
> results:
> 
> In Splus:
> 
> new.claims <- claims
> class(new.claims$age) <- "factor"
> class(new.claims$car.age) <- "factor"
> apply(new.claims, 2, class)
>       age  car.age     type      cost    number
>  "factor" "factor" "factor" "numeric" "numeric"
> write.table(data = new.claims,
>             file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>             sep = "@", append = F, quote.strings = T,
>             dimnames.write = T, na = NA, end.of.row = "\n",
>             justify.format = "decimal")
> 
> In R:
> 
> claims.classes <- c("character", "factor", "factor", "factor", "numeric",
>                     "numeric")  ## The first "character" is for the row.names
> claims <-
>     read.table(file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>                header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
>                strip.white = FALSE, comment.char = "", na.strings = "NA",
>                nrows = 200, colClasses = claims.classes)
> apply(claims, 2, class)
>   row.names         age     car.age        type        cost      number
> "character" "character" "character" "character" "character" "character"
> 
> 
> I'd be more than happy to supply you with a small sample of my data set if
> the built-in "claims" doesn't do the job.
> 
> Thanks for your support,
> 
> Gérald Jean
> Senior Statistical Advisor,
> VP Planning and Market Development,
> Desjardins Groupe d'Assurances Générales
> telephone: (418) 835-4900 ext. (7639)
> fax      : (418) 835-6657
> e-mail   : gerald.jean at dgag.ca
> 
> "In God we trust, all others must bring data"  W. Edwards Deming
> 
> 
> "William Dunlap" <wdunlap at tibco.com> wrote on 2010/02/05 12:37:25:
> 
> > For a data.frame with only numeric and factor
> > columns using dump() on the S+ end and source()
> > on the R end ought to work.  If you have timeDate
> > columns you will need to convert them to character
> > data before exporting and convert them to your
> > favorite R time/date class after importing them.
> >
> > If you could send me a fairly small sample of your
> > data that shows the incompatibility between S+'s
> > write.table and R's read.table I could try to fix
> > things up so they were more compatible.
> >
> > Code that reads the S+ native binary format must
> > be 32/64 bit aware, since S+ integers are 32 bits
> > on 32-bit versions of S+ and 64 bits on 64-bit
> > versions.
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org
> > > [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges
> > > Sent: Friday, February 05, 2010 8:05 AM
> > > To: Gerald Jean
> > > Cc: r-help at r-project.org
> > > Subject: Re: [R] Importing data coming from Splus into R.
> > >
> > > > 1. I am stuck with a copy of S-PLUS 4.x. At that time I used dump() in
> > > > S-PLUS and source() to get things into R afterwards ...
> > > >
> > > > 2. Why do you think that 32-bit vs. 64-bit issues matter? The file
> > > > format does not change (well, this is a guess, since I do not have any
> > > > 64-bit S-PLUS version available).
> > >
> > > Best,
> > > Uwe Ligges
> > >
> > >
> > > On 05.02.2010 16:35, gerald.jean at dgag.ca wrote:
> > > >
> > > > Hello there,
> > > >
> > > > > I spent all day yesterday trying to get a small data set from Splus into R,
> > > > > no luck!  Both Splus and R run on a 64-bit RedHat Linux machine; both
> > > > > programs are 64-bit builds, with the following versions:
> > > >
> > > > Splus:
> > > > TIBCO Software Inc. Confidential Information
> > > > Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
> > > > > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> > > >
> > > > R:
> > > > R version 2.8.0 (2008-10-20)
> > > > Copyright (C) 2008 The R Foundation for Statistical Computing
> > > > ISBN 3-900051-07-0
> > > >
> > > > > I know that the "foreign" package has a function to directly import Splus
> > > > > data sets into R, but I also know that it works only for 32-bit
> > > > > versions of the software, hence I didn't try that route.  Here is what I
> > > > > have done:
> > > >
> > > > In Splus:
> > > >
> > > > > ttt <- exportData(data = FMD.CR.test,
> > > > >                   file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > > >                   type = "ASCII", delimiter = "@", quote = T,
> > > > >                   na.string = "NA")
> > > > > ttt.class <- unlist(lapply(FMD.CR.test, class))
> > > > >
> > > > > ### I am using "@" as delimiter since some factor levels contain both the
> > > > > ### "," and the ";".
> > > >
> > > > In R:
> > > >
> > > > > FMD.CR.test.fields <-
> > > > >     count.fields(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > > >                  sep = "@", quote = "\"", comment.char = "")
> > > > > all(FMD.CR.test.fields == 327)
> > > > > [1] TRUE  ## Hence all observations have the same number of fields; so far, so good!
> > > >
> > > > > FMD.CR.test.classes <- c("factor", "character", "factor", "factor",
> > > > >                          "factor", "factor", "factor", "factor",
> > > > >                          "factor", "factor", "factor", "numeric",
> > > > >                          "character", and so on)
> > > > > names(FMD.CR.test.classes) <- c("RTA", "police", "mnt.rent.bnct",
> > > > >                          "mnt.rent.boni", "mnt.rent.cred.bnct",
> > > > >                          "mnt.rent.epar.bnct", "mnt.rent.snbn",
> > > > >                          "mnt.rent.trxl", "solde.eop", "solde.nenr.es",
> > > > >                          "solde.enr.es", "num.enreg", "trouve", and so on)
> > > > > FMD.CR.test <-
> > > > >     read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > > >                header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
> > > > >                strip.white = FALSE, comment.char = "", na.strings = "NA",
> > > > >                nrows = 65000, colClasses = FMD.CR.test.classes)
> > > > dim(FMD.CR.test)
> > > > [1] 64093   327  ## OK
> > > >
> > > > ### Testing if classes are the same as the Splus classes.
> > > >
> > > > FMD.CR.test.R.classes<- apply(FMD.CR.test, 2, FUN = class)
> > > > sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
> > > > [1] 79  ## Not exactly what I was expecting!
> > > > all(FMD.CR.test.R.classes == "character")
> > > > [1] TRUE
> > > >
> > > > > Hence all variables were imported as character, which I find very
> > > > > inconvenient.  The data set has a few hundred factor variables, so
> > > > > recoding them is a lot of work, and that work has already been done in
> > > > > Splus; furthermore, the numeric variables would need conversion as well.
> > > > >
> > > > > I tried all combinations of the arguments "as.is", "stringsAsFactors" and
> > > > > "colClasses" to no avail.  I also tried to export the data set in SAS
> > > > > transport format from Splus and read it through the foreign package's
> > > > > read.xport function: always the same result, everything is imported as
> > > > > character.  I searched the r-help archives and found several messages
> > > > > relating to this problem but no satisfactory solution!
> > > >
> > > > > I am a long-time user of Splus and I am planning to use R more often,
> > > > > mainly due to its wealth of packages and the convenience of installing
> > > > > them.  I hope to find a reliable and user-friendly way of transferring data
> > > > > between the two cousin pieces of software.
> > > >
> > > > Thanks for any insights,
> > > >
> > > > Gérald Jean
> > > > > Senior Statistical Advisor,
> > > > > VP Planning and Market Development,
> > > > > Desjardins Groupe d'Assurances Générales
> > > > > telephone: (418) 835-4900 ext. (7639)
> > > > > fax      : (418) 835-6657
> > > > > e-mail   : gerald.jean at dgag.ca
> > > >
> > > > "In God we trust, all others must bring data"  W. Edwards Deming
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Le message ci-dessus, ainsi que les documents
> > > l'accompagnant, sont destinés
> > > > uniquement aux personnes identifiées et peuvent contenir
> > > des informations
> > > > privilégiées, confidentielles ou ne pouvant être
> > > divulguées. Si vous avez
> > > > reçu ce message par erreur, veuillez le détruire.
> > > >
> > > > This communication ( and/or the attachments ) is 
> intended for named
> > > > recipients only and may contain privileged or confidential
> > > information
> > > > which is not to be disclosed. If you received this
> > > communication by mistake
> > > > please destroy all copies.
> > > >
> > > >
> > > >
> > > >
> > > > Faites bonne impression et imprimez seulement au besoin !
> > > > Think green before you print !
> > > >
> > > > Le message ci-dessus, ainsi que les documents
> > > l'accompagnant, sont destinés uniquement aux personnes
> > > identifiées et peuvent contenir des informations
> > > privilégiées, confidentielles ou ne pouvant être divulguées.
> > > Si vous avez reçu ce message par erreur, veuillez le détruire.
> > > >
> > > > This communication (and/or the attachments) is intended for
> > > named recipients only and may contain privileged or
> > > confidential information which is not to be disclosed. If you
> > > received this communication by mistake please destroy all copies.
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> 
> 
> 
> Le message ci-dessus, ainsi que les documents l'accompagnant, 
> sont destinés
> uniquement aux personnes identifiées et peuvent contenir des 
> informations
> privilégiées, confidentielles ou ne pouvant être divulguées. 
> Si vous avez
> reçu ce message par erreur, veuillez le détruire.
> 
> This communication ( and/or the attachments ) is intended for named
> recipients only and may contain privileged or confidential information
> which is not to be disclosed. If you received this 
> communication by mistake
> please destroy all copies.
> 
> 
> 
> 
> Faites bonne impression et imprimez seulement au besoin !
> Think green before you print !
> 
> Le message ci-dessus, ainsi que les documents l'accompagnant, 
> sont destinés uniquement aux personnes identifiées et peuvent 
> contenir des informations privilégiées, confidentielles ou ne 
> pouvant être divulguées. Si vous avez reçu ce message par 
> erreur, veuillez le détruire.
> 
> This communication (and/or the attachments) is intended for 
> named recipients only and may contain privileged or 
> confidential information which is not to be disclosed. If you 
> received this communication by mistake please destroy all copies.
> 


