[R] Performance issue with attributes

MacQueen, Don macqueen1 at llnl.gov
Wed Mar 12 18:17:24 CET 2014


I know you already have a solution, but your original problem might have
been that you needed

   attributes(Dataset1[[1]]) <-

i.e., double brackets, rather than

   attributes(Dataset1[1]) <-

Example:

> foo <- data.frame(a=1:4, b=factor(letters[4]))

> attributes(foo[2])
$names
[1] "b"

$row.names
[1] 1 2 3 4

$class
[1] "data.frame"

 
> attributes(foo[[2]])
$class
[1] "factor"

$levels
[1] "d"
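
To make the consequence for assignment concrete, here is a small sketch
using the same toy data frame. The attribute names "usermissing" and
"split" are borrowed from the original post; everything else is only
illustrative.

```r
foo <- data.frame(a = 1:4, b = factor(letters[4]))

# Single brackets keep the data frame wrapper; double brackets return
# the column itself.
class(foo[2])    # "data.frame"
class(foo[[2]])  # "factor"

# Assigning through double brackets attaches the attribute to the column.
attr(foo[[2]], "usermissing") <- FALSE
attr(foo[[2]], "usermissing")

# Assigning through single brackets targets the one-column data frame
# instead, so the column itself does not pick up the attribute.
attr(foo[1], "split") <- FALSE
is.null(attr(foo[[1]], "split"))
```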


 
-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 3/11/14 2:18 AM, "Smart Guy" <smartguy3k at gmail.com> wrote:

>Apologies for the late reply. I was out on vacation.
>I tried setattr() from the data.table package and it worked like magic.
>
>Thanks a lot for the help. setattr() is much faster than attributes().
>
>Regards,
>SG
>
>
>On 22 February 2014 12:29, Philippe Grosjean <phgrosjean at sciviews.org> wrote:
>
>> You can use setattr() in the data.table package. It also works on
>> data.frames and other objects.
>> Best,
>>
>> Philippe Grosjean
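
For readers following the thread, the setattr() suggestion above might
look roughly like the sketch below. It assumes the data.table package is
installed; "dat" is a hypothetical stand-in for Dataset1, and the
attribute names are taken from the original post.

```r
# Hypothetical stand-in for the large Dataset1.
dat <- data.frame(a = 1:4, b = factor(letters[4]))

# Only run if data.table is available (it is not part of base R).
if (requireNamespace("data.table", quietly = TRUE)) {
  for (col in names(dat)) {
    # setattr() sets the attribute by reference on the column vector,
    # so the data frame is not copied on every iteration.
    data.table::setattr(dat[[col]], "usermissing", FALSE)
    data.table::setattr(dat[[col]], "split", FALSE)
  }
  print(attr(dat$a, "split"))
}
```

Note the double brackets: setattr() is given the column itself, not a
one-column data frame.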
>>
>>
>> On 22 Feb 2014, at 03:13, Smart Guy <smartguy3k at gmail.com> wrote:
>>
>> > Hi All
>> >
>> > I am having a problem running the 'attributes' command to set an
>> > attribute on each column of a large dataset. The dataset has 80 columns
>> > and 312407 rows. It is taking more than 60 seconds to set simple
>> > attributes like split=TRUE, usermissing=FALSE.
>> >
>> > Here is the source code, assuming Dataset1 is the large one:
>> >
>> > myfunction <- function()
>> > {
>> >   cat("Before for loop:")
>> >   print(Sys.time())
>> >   for (colIndex in 1:80)
>> >   {
>> >     cat("Before Attr", colIndex)
>> >     print(Sys.time())
>> >
>> >     attributes(Dataset1[1]) <- c(attributes(Dataset1[, colIndex]),
>> >       list(coldesc = c(), usermissing = c(FALSE), missingvalues = NULL,
>> >            split = c(FALSE), levelLabels = c("")))
>> >
>> >     cat("After Attr:")
>> >     print(Sys.time())
>> >   }
>> >   cat("After for loop:")
>> >   print(Sys.time())
>> > }
>> >
>> > My feeling is that R is passing all 312407 rows to set 'attributes' on a
>> > column.
>> >
>> > Is there a more efficient way to do this?
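
One base-R possibility, offered only as a sketch and not timed on data of
the size described: rebuild every column in a single lapply() pass and
assign the result back with dat[], which keeps the object a data frame.
Here "dat" and "extra" are hypothetical names, and the attribute list is
a shortened version of the one in the post.

```r
# Hypothetical stand-in for Dataset1; the real one has 80 columns.
dat <- data.frame(a = 1:4, b = factor(letters[4]))

# Extra attributes to add to every column.
extra <- list(usermissing = FALSE, split = FALSE, levelLabels = "")

# Give each column its old attributes plus the new ones, then assign
# the list of columns back in one step.
dat[] <- lapply(dat, function(col) {
  attributes(col) <- c(attributes(col), extra)
  col
})

attr(dat$a, "split")
levels(dat$b)
```

This avoids calling a data frame replacement method once per column.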
>> >
>> >
>> > Thanks,
>> > SG
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>
>
>-- 
>SG
>



