[R] merging corpora and metadata

R. Michael Weylandt michael.weylandt at gmail.com
Fri Nov 18 03:34:19 CET 2011


Hi Josh,

You're absolutely right. I suppose one could set up some sort of S3
thing for Henri's problem:

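c <- function(..., recursive = FALSE) UseMethod("c")
c.default <- base::c
c.Corpus <- function(..., recursive = FALSE) {  # class is "Corpus", not "corpus"
    ans <- do.call(c.default, lapply(list(...), unclass))
    attributes(ans) <- attributes(..1)  # untested: start from the first corpus
    attr(ans, "DMetaData") <- do.call(rbind, lapply(list(...), attr, "DMetaData"))
    ans
}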

But agreed, it seems deeply risky.
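
A less risky route, as a rough sketch only: leave c() alone and write an
explicitly named helper that concatenates the documents and row-binds the
per-document metadata. The names merge_with_meta() and toy() below are made
up for illustration and are not part of tm. The sketch assumes the metadata
lives in a "DMetaData" attribute, as in the str() output Henri posted, and
that both data frames have the same columns. The toy objects are plain lists
with invented values, so nothing here needs tm loaded.

```r
## Hypothetical helper (not part of tm): concatenate two corpus-like
## lists and row-bind their "DMetaData" data frames.
merge_with_meta <- function(a, b) {
  ## unclass() first so plain list concatenation is used, with no dispatch
  out <- c(unclass(a), unclass(b))
  ## c() keeps only names, so rebuild the attributes we care about;
  ## rbind() assumes both data frames share the same columns
  attr(out, "DMetaData") <- rbind(attr(a, "DMetaData"), attr(b, "DMetaData"))
  class(out) <- class(a)
  out
}

## Toy stand-ins for corpus.1 and corpus.2 (invented values)
toy <- function(docs, cid) {
  structure(as.list(docs),
            DMetaData = data.frame(MetaID = 0, cid = cid,
                                   fid = seq_along(docs)),
            class = c("VCorpus", "Corpus", "list"))
}
corpus.a <- toy(c("doc one", "doc two"), cid = 1L)
corpus.b <- toy(c("doc three"), cid = 2L)

merged <- merge_with_meta(corpus.a, corpus.b)
attr(merged, "DMetaData")  # one metadata row per document
```

Any other attributes (the corpus-level MetaData node, for instance) are not
carried over by this sketch and would need the same treatment.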

Cheers,

Michael

On Thu, Nov 17, 2011 at 9:01 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Hi Michael,
>
> require(sos)
> findFn("{meta}", sortby = "Function")
> ## see that only two functions have the exact name, 'meta'
> ## one is titled, "Meta Data Management" in the package 'tm'
> ## seems a pretty likely choice
>
> Also, the fact that it is a truly terrible idea does not mean it is not easy:
>
> mvir <- new.env()
> mvir$c <- function(x, ...) {cat("sure you can!\n"); mean(x, ...)}
> attach(mvir)
>
> c(x = 1:10)
> detach(mvir)
>
> rm(mvir)
>
> Cheers,
>
> Josh
>
>
> On Thu, Nov 17, 2011 at 5:25 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>> What package is all this from?
>>
>> You might check if there is a special rbind/cbind method provided. I don't think you can easily change the behavior of c().
>>
>> Michael
>>
>> On Nov 17, 2011, at 4:43 PM, Henri-Paul Indiogine <hindiogine at gmail.com> wrote:
>>
>>> Greetings!
>>>
>>> I lose all my metadata after concatenating corpora. This is an
>>> example of what happens:
>>>
>>>> meta(corpus.1)
>>>   MetaID cid fid selfirst selend                         fname
>>> 1       0   1  11     2169   2518    WCPD-2001-01-29-Pg217.scrb
>>> 2       0   1  14     9189   9702     WCPD-2003-01-13-Pg39.scrb
>>> 3       0   1  14     2109   2577     WCPD-2003-01-13-Pg39.scrb
>>>
>>> ....
>>> ....
>>>
>>> 17      0   1 114    17863  18256    WCPD-2007-04-30-Pg515.scrb
>>>
>>>
>>>> meta(corpus.2)
>>>   MetaID cid fid selfirst selend                         fname
>>> 1       0   2   2    11016  11600           DCPD-200900595.scrb
>>> 2       0   2   6    19510  20098           DCPD-201000636.scrb
>>> 3       0   2   6    23935  24573           DCPD-201000636.scrb
>>>
>>> ....
>>> ....
>>>
>>> 94      0   2 127    16225  17128   WCPD-2009-01-12-Pg22-3.scrb
>>>
>>>
>>>> tot.corpus <- c(corpus.1, corpus.2)
>>>> meta(tot.corpus)
>>>
>>>    MetaID
>>> 1        0
>>> 2        0
>>> 3        0
>>>
>>> ....
>>> ....
>>>
>>> 111      0
>>>>
>>>
>>> This is from the structure of corpus.1
>>>
>>> ..$ MetaData:List of 2
>>>  .. ..$ create_date: POSIXlt[1:1], format: "2011-11-17 21:09:57"
>>>  .. ..$ creator    : chr "henk"
>>>  ..$ Children: NULL
>>>  ..- attr(*, "class")= chr "MetaDataNode"
>>> - attr(*, "DMetaData")='data.frame':    17 obs. of  6 variables:
>>>  ..$ MetaID  : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
>>>  ..$ cid     : int [1:17] 1 1 1 1 1 1 1 1 1 1 ...
>>>  ..$ fid     : int [1:17] 11 14 14 17 46 80 80 80 91 91 ...
>>>  ..$ selfirst: num [1:17] 2169 9189 2109 8315 9439 ...
>>>  ..$ selend  : num [1:17] 2518 9702 2577 8881 10102 ...
>>>  ..$ fname   : chr [1:17] "WCPD-2001-01-29-Pg217.scrb"
>>> "WCPD-2003-01-13-Pg39.scrb" "WCPD-2003-01-13-Pg39.scrb"
>>> "WCPD-2004-05-17-Pg856.scrb" ...
>>> - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
>>>
>>>
>>> Any idea on what I could do to keep the metadata in the merged corpus?
>>>
>>> Thanks,
>>> Henri-Paul
>>>
>>>
>>> --
>>> Henri-Paul Indiogine
>>>
>>> Curriculum & Instruction
>>> Texas A&M University
>>> TutorFind Learning Centre
>>>
>>> Email: hindiogine at gmail.com
>>> Skype: hindiogine
>>> Website: http://people.cehd.tamu.edu/~sindiogine
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>


