[Rd] Objects created by more than one data call?

Spencer Graves spencer.graves at prodsyse.com
Wed May 22 23:40:50 CEST 2013


Dear Bill:


On 5/22/2013 2:12 PM, William Dunlap wrote:
> I used svn to copy the current version of Ecdat from Rforge to my PC
>    C:\temp\packages>svn checkout svn://r-forge.r-project.org/svnroot/ecdat/
> then fired up R to look at the rda files in it.
>
>> setwd("c:/temp/packages/Ecdat/ecdat/pkg/data")
>> read.dcf("../DESCRIPTION")[, c("Package","Version")]
> Package Version
> "Ecdat" "0.2-3"
>> dir.rda <- function(rdaFile) { e <- new.env() ; load(rdaFile, envir=e) ; objects(e, all=TRUE)}
>> dir.rda("VietNamH.rda")
> [1] "MedExp"
>> rdas <- dir(pattern="\\.rda$")
>> names(rdas) <- rdas
>> z <- lapply(rdas, dir.rda)
>> tab <- table(unlist(z))
>> tab[tab>1]
> Hstarts  MedExp
>        3       2
>> z[sapply(z, function(zi)"Hstarts" %in% zi)]
> $Hstarts.rda
> [1] "Hstarts"
>
> $Intratesm.rda
> [1] "Hstarts"
>
> $Intratesq.rda
> [1] "Hstarts"
>
>> z[sapply(z, function(zi)"MedExp" %in% zi)]
> $MedExp.rda
> [1] "MedExp"
>
> $VietNamH.rda
> [1] "MedExp"
>
> It looks some files don't contain what their names suggest:
>> dir.rda("VietNamH.rda")
> [1] "MedExp"
>
> The two versions of MedExp are quite different:
>> load("VietNamH.rda", envViet <- new.env(parent=emptyenv()))
>> load("MedExp.rda", envMed <- new.env(parent=emptyenv()))
>> objects(envViet)
> [1] "MedExp"
>> objects(envMed)
> [1] "MedExp"
>> all.equal(envViet$MedExp, envMed$MedExp)
>   [1] "Names: 11 string mismatches"
>   [2] "Length mismatch: comparison on first 11 components"
>   [3] "Component 1: 'current' is not a factor"
>   ...
> [18] "Component 10: Numeric: lengths (5999, 5574) differ"
> [19] "Component 11: 'current' is not a factor"


       Thanks very much.  Now I understand the problem.  For a solution, 
I need to consult with the package author (Yves Croissant;  I'm only the 
maintainer).


       However, your work at least makes it easy for me to describe the 
problem to him.


       Thanks again.
       Spencer

> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: Spencer Graves [mailto:spencer.graves at prodsyse.com]
>> Sent: Wednesday, May 22, 2013 1:27 PM
>> To: William Dunlap
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] Objects created by more than one data call?
>>
>> On 5/21/2013 3:03 PM, William Dunlap wrote:
>>> If you look at
>>>      data(package="Ecat")$results[,"Item"]
>>> you will see the items "Hstarts", "Hstarts (Intratesm)", and "Hstarts (Intratesq)"
>>> which I think means that the dataset Hstarts is found in 3 .rda files, "Hstarts.rda",
>>> "Intratesq.rda", and "Intratesm.rda".  There are duplicate, modulo (filename),
>>> items for "MedExp" as well.
>>
>>         Thanks for this.  I may get me closer, but I still don't see it:
>> (data(Intratesm)) imports only the object Intratesm, etc.  For more
>> details, see below.
>>
>>
>>         Any other suggestions?
>>
>>
>>         Thanks again,
>>         Spencer
>>
>>
>>   > Ecdat.data <- data(package="Ecdat")$results
>>   > (Hstarts2 <- grep('Hstarts', Ecdat.data[, 'Item']))
>> [1] 47 48 49
>>   > (MedExp2 <- grep('MedExp', Ecdat.data[, 'Item']))
>> [1] 67 68
>>   > Ecdat.data[Hstarts2, ]
>>        Package LibPath Item
>> [1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts"
>> [2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesm)"
>> [3,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesq)"
>>        Title
>> [1,] "Housing Starts"
>> [2,] "Housing Starts"
>> [3,] "Housing Starts"
>>   > Ecdat.data[MedExp2,]
>>        Package LibPath Item
>> [1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp"
>> [2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp (VietNamH)"
>>        Title
>> [1,] "Structure of Demand for Medical Care"
>> [2,] "Structure of Demand for Medical Care"
>>   > library(Ecdat)
>>   > (data(Intratesm))
>> [1] "Intratesm"
>>   > (data(Intratesq))
>> [1] "Intratesq"
>>
>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On
>> Behalf
>>>> Of Spencer Graves
>>>> Sent: Tuesday, May 21, 2013 12:21 PM
>>>> To: Prof Brian Ripley
>>>> Cc: r-devel at r-project.org
>>>> Subject: Re: [Rd] Objects created by more than one data call?
>>>>
>>>> On 5/21/2013 9:03 AM, Prof Brian Ripley wrote:
>>>>> On 21/05/2013 16:51, Spencer Graves wrote:
>>>>>> On 5/21/2013 7:47 AM, Prof Brian Ripley wrote:
>>>>>>> On 21/05/2013 15:28, Spencer Graves wrote:
>>>>>>>> On 5/20/2013 10:10 PM, Prof Brian Ripley wrote:
>>>>>>>>> On 21/05/2013 00:12, Spencer Graves wrote:
>>>>>>>>>> Hello, All:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>          If I use LazyData with the Ecdat package on R-Forge, "R CMD
>>>>>>>>>> check" reports "no visible binding for global variable
>>>>>>>>>> 'nonEnglishNames'", where 'nonEnglishNames' is a dataset in Ecdat
>>>>>>>>>> used
>>>>>>>>>> as the default argument for a function.  With LazyData, that NOTE
>>>>>>>>>> disappears.  However, then I get, "Warning: objects 'Hstarts',
>>>>>>>>>> 'Hstarts', 'MedExp' are created by more than one data call".
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>          What do you suggest I do to fix this problem?
>>>>>>>>> Not create the objects in more than one data() call.
>>>>>>>>>
>>>>>>>>> Check what each of your data() calls produces.
>>>>>>>>          Thanks.  How do I do that?
>>>>>>> Call data() on each in turn, and see what files get added to an empty
>>>>>>> workspace.
>>>>>> Like the following?
>>>>> You missed the 'empty'.  Look at tools:::data2LazyLoadDB to see how
>>>>> this is checked.
>>>>          Thanks for the suggestion.  Unfortunately, I tried that function,
>>>> including stepping through it line by line, fixing references to other
>>>> functions not exported from tools, without enlightenment;  see below.
>>>>
>>>>
>>>>          Thanks again,
>>>>          Spencer
>>>>
>>>>
>>>>    > lib.loc = NULL
>>>>    > package='Ecdat'
>>>>    > pkgpath <- find.package(package, lib.loc, quiet = TRUE)
>>>>    > pkgpath
>>>> [1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat"
>>>>    > dataDir <- file.path(pkgpath, "data")
>>>>    > dataDir
>>>> [1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data"
>>>>    > enc <- tools:::.read_description(file.path(pkgpath,
>>>> "DESCRIPTION"))["Encoding"]
>>>>    > enc
>>>> <NA>
>>>>      NA
>>>>    > if (!is.na(enc)) {
>>>> +         op <- options(encoding = enc)
>>>> +         on.exit(options(encoding = op[[1L]]))
>>>> +     }
>>>>    > file_test("-d", dataDir)
>>>> [1] TRUE
>>>>    > file.path(dataDir, "Rdata.rds")
>>>> [1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data/Rdata.rds"
>>>>    > (file.exists(file.path(dataDir, "Rdata.rds")) &&
>>>> file.exists(file.path(dataDir,
>>>> +             paste(package, "rdx", sep = "."))) &&
>>>> file.exists(file.path(dataDir,
>>>> +             paste(package, "rdb", sep = "."))))
>>>> [1] FALSE
>>>>    > file.exists(file.path(dataDir,
>>>> +             paste(package, "rdx", sep = ".")))
>>>> [1] FALSE
>>>>    > file.path(dataDir,
>>>> +             paste(package, "rdx", sep = "."))
>>>> [1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data/Ecdat.rdx"
>>>>    > dataEnv <- new.env(hash = TRUE)
>>>>    > tmpEnv <- new.env()
>>>>    > f0 <- files <- list_files_with_type(dataDir, "data")
>>>> Error: could not find function "list_files_with_type"
>>>>    > f0 <- files <- tools:::list_files_with_type(dataDir, "data")
>>>>    > files <- unique(basename(file_path_sans_ext(files,
>>>> +                 TRUE)))
>>>> Error in basename(file_path_sans_ext(files, TRUE)) :
>>>>      could not find function "file_path_sans_ext"
>>>>    > files <- unique(basename(tools:::file_path_sans_ext(files,
>>>> +                 TRUE)))
>>>>    >             dlist <- vector("list", length(files))
>>>>    > files
>>>> character(0)
>>>>    >  names(dlist) <- files
>>>>    >             loaded <- character(0L)
>>>>    > loaded
>>>> character(0)
>>>>    > for (f in files) {
>>>> +                 utils::data(list = f, package = package, lib.loc =
>>>> lib.loc,
>>>> +                   envir = dataEnv)
>>>> +                 utils::data(list = f, package = package, lib.loc =
>>>> lib.loc,
>>>> +                   envir = tmpEnv)
>>>> +                 tmp <- ls(envir = tmpEnv, all.names = TRUE)
>>>> +                 rm(list = tmp, envir = tmpEnv)
>>>> +                 dlist[[f]] <- tmp
>>>> +                 loaded <- c(loaded, tmp)
>>>> +             }
>>>>    >             dup <- duplicated(loaded)
>>>>    > dup
>>>> logical(0)
>>>>    > if (any(dup))
>>>> +                 warning(sprintf(ngettext(sum(dup), "object %s is
>>>> created by more than one data call",
>>>> +                   "objects %s are created by more than one data call"),
>>>> +                   paste(sQuote(loaded[dup]), collapse = ", ")),
>>>> +                   call. = FALSE, domain = NA)
>>>>    >             if (length(loaded)) {
>>>> +                 dbbase <- file.path(dataDir, "Rdata")
>>>> +                 makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>>>> +                 saveRDS(dlist, file.path(dataDir, "Rdata.rds"),
>>>> +                   compress = compress)
>>>> +                 unlink(f0)
>>>> +                 if (file.exists(file.path(dataDir, "filelist")))
>>>> +                   unlink(file.path(dataDir, c("filelist", "Rdata.zip")))
>>>> +             }
>>>>    >
>>>>
>>>>>>    > library(Ecdat)
>>>>>>    > objects()
>>>>>> character(0)
>>>>>>    > (data(Hstarts))
>>>>>> [1] "Hstarts"
>>>>>>    > (data(MedExp))
>>>>>> [1] "MedExp"
>>>>>>    > objects()
>>>>>> [1] "Hstarts" "MedExp"
>>>>>>    > sessionInfo()
>>>>>> R version 3.0.0 (2013-04-03)
>>>>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=English_United States.1252
>>>>>> [2] LC_CTYPE=English_United States.1252
>>>>>> [3] LC_MONETARY=English_United States.1252
>>>>>> [4] LC_NUMERIC=C
>>>>>> [5] LC_TIME=English_United States.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] Ecdat_0.2-3
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_3.0.0
>>>>>>
>>>>>>
>>>>>>          Thanks,
>>>>>>          Spencer
>>>>>>
>>>>>>>>          In the "man" directory, I just did "grep 'data(MedExp' *.Rd",
>>>>>>>> which identified only "MedExp.Rd:\usage{data(MedExp)}"; "grep
>>>>>>>> 'data(Hstarts *.Rd" similarly returned only
>>>>>>>> "Hstarts.Rd:\usage(data(Hstarts)}".
>>>>>>>>
>>>>>>>>
>>>>>>>>          Thanks again for the reply.
>>>>>>>>          Spencer
>>>>>>>>>>          Thanks,
>>>>>>>>>>          Spencer Graves
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    > sessionInfo()
>>>>>>>>>> R version 3.0.0 (2013-04-03)
>>>>>>>>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>>>>>>>>
>>>>>>>>>> locale:
>>>>>>>>>> [1] LC_COLLATE=English_United States.1252
>>>>>>>>>> [2] LC_CTYPE=English_United States.1252
>>>>>>>>>> [3] LC_MONETARY=English_United States.1252
>>>>>>>>>> [4] LC_NUMERIC=C
>>>>>>>>>> [5] LC_TIME=English_United States.1252
>>>>>>>>>>
>>>>>>>>>> attached base packages:
>>>>>>>>>> [1] stats     graphics  grDevices utils     datasets methods base
>>>>>>>>>>
>>>>>>>>>> other attached packages:
>>>>>>>>>> [1] Ecdat_0.2-3
>>>>>>>>>>
>>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>>> [1] tools_3.0.0
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list