[R] RegExp question

David Winsemius dwinsemius at comcast.net
Wed Jun 16 20:03:37 CEST 2010


On Jun 16, 2010, at 1:05 PM, Andrej wrote:

> Sorry, I apologize. Below is the minimal example.
>
> library(RWeka)
> model <- J48(as.factor(Species)~., data = iris)
> > model
> J48 pruned tree
> ------------------
>
> Petal.Width <= 0.6: setosa (50.0)
> Petal.Width > 0.6
> |   Petal.Width <= 1.7
> |   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
> |   |   Petal.Length > 4.9
> |   |   |   Petal.Width <= 1.5: virginica (3.0)
> |   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
> |   Petal.Width > 1.7: virginica (46.0/1.0)
>
> Number of Leaves  : 	5
>
> Size of the tree : 	9
>
> So, the task is to extract the number of leaves.

methods(print)  # print.Weka_classifier is marked with an asterisk (not exported)
getAnywhere(print.Weka_classifier)

The text printed by print.Weka_classifier is produced outside of R, in
a Java call that is not visible from the R side.
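
Since the summary only exists as printed text, one option is to capture
that text and pick the number out of the "Number of Leaves" line. The
following is a minimal sketch, not a tested RWeka recipe: extract_leaves
and printed are hypothetical names, the lines in printed are abridged
from the output quoted above, and it assumes that
capture.output(print(model)) returns the same lines that appear on the
console.

```r
# Hypothetical helper: find the "Number of Leaves" line in a character
# vector of printed output and return the count as an integer.
extract_leaves <- function(lines) {
  hit <- grep("^Number of Leaves", lines, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)  # no such line found
  as.integer(sub("^Number of Leaves\\s*:\\s*(\\d+)\\s*$", "\\1", hit[1]))
}

# Printed summary, abridged from the example above (one element per line).
printed <- c("J48 pruned tree",
             "------------------",
             "",
             "Petal.Width <= 0.6: setosa (50.0)",
             "",
             "Number of Leaves  : \t5",
             "",
             "Size of the tree : \t9")

extract_leaves(printed)  # 5

# With a fitted model (assumption: the print method writes to R's stdout):
# extract_leaves(capture.output(print(model)))
```

Anchoring the pattern to the "Number of Leaves" label keeps it from
picking up the tree size, which is printed in the same tab-then-digits
form.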

str(model)

As a mostly uninformed guess after looking at the output of
str(model), you might get satisfaction with:

 > nrow(attr(model$terms, "factors"))
[1] 5

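But I am not really sure, because the documentation is so sketchy about
the structure of the returned object. Note that the rows of that matrix
are the variables in the model formula (the response plus four
predictors here), so the 5 may simply be a coincidence rather than the
leaf count. It appears the authors expect a lot more knowledge about
Weka trees than I have.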
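
One way to check the guess above without RWeka is to look at what the
"factors" attribute of a terms object holds. This is only a sketch in
base R: full_terms and small_terms are hypothetical names, and the
formulas are spelled out rather than using "." on the assumption that
the fitted model's terms expand to the same variables.

```r
# Build terms objects directly from formulas; no tree is fitted here.
full_terms  <- terms(Species ~ Sepal.Length + Sepal.Width +
                       Petal.Length + Petal.Width)
small_terms <- terms(Species ~ Petal.Width)

# Rows of the "factors" matrix are the variables in the formula,
# including the response.
rownames(attr(full_terms, "factors"))

nrow(attr(full_terms, "factors"))   # 5: response + 4 predictors
nrow(attr(small_terms, "factors"))  # 2: response + 1 predictor
```

The row count follows the formula, not the tree, so it should not be
relied on as a leaf count.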

-- 
David

>
> Andrej
>
> On Jun 16, 6:58 pm, David Winsemius <dwinsem... at comcast.net> wrote:
>> Publicly produce something we can work with. I have no idea how to
>> create an example that will match such an object.
>>
>> ?dput
>> ?dump
>>
>> Read Posting Guide.
>> --
>> David.
>>
>> On Jun 16, 2010, at 12:54 PM, Andrej wrote:
>>
>>
>>
>>> Thanks David for your fast reply, but now I realized that "string" is
>>> of type:
>>
>>>> class(string)
>>> [1] "jobjRef"
>>> attr(,"package")
>>> [1] "rJava"
>>
>>> so I get an error when I try with gsub or sub:
>>
>>>> sub("^.+\\t(\\d+)\\n.+$", "\\1", string)
>>> Error in as.character.default(x) :
>>>  no method for coercing this S4 class to a vector
>>
>>> I think that there should be a trivial solution, but... Any further
>>> ideas?
>>
>>> Regards, Andrej
>>
>>> On Jun 16, 6:47 pm, David Winsemius <dwinsem... at comcast.net> wrote:
>>>> On Jun 16, 2010, at 12:04 PM, Andrej wrote:
>>
>>>>> Dear all,
>>
>>>>> I'm trying to filter out the "number of leaves" (it should be 1 in
>>>>> the example below) from the following string:
>>
>>>>>> string
>>>>> [1] "Java-Object{J48 pruned tree\n------------------\n: 0 (15.0/3.0)\n \nNumber of Leaves  : \t1\n\nSize of the tree : \t1\n}"
>>
>>>>> Any idea how to do that as simply as possible? Thanks in advance for
>>>>> any advice.
>>
>>>> ?sub   # or ?gsub if you need more than one pattern matched (they are
>>>> on the same page).
>>
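>>>> This should find digits that follow a tab and are terminated by a
>>>> line feed, and then return only the digits. Because the leading .+
>>>> is greedy, it picks the last such occurrence in the string (here
>>>> both counts are 1):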
>>
>>>> string <- "Java-Object{J48 pruned tree\n------------------\n: 0 (15.0/3.0)\n \nNumber of Leaves  : \t1\n\nSize of the tree : \t1\n}"
>>>> sub("^.+\\t(\\d+)\\n.+$", "\\1", string)
>>>> [1] "1"
>>
>>>> The parentheses within the search pattern are matched to "\\1".
>>>> Backslashes need to be doubled within patterns.
>>
>>>>> Regards, Andrej
>>
>>>> --
>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>
>>>> ______________________________________________
>>>> R-h... at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>

David Winsemius, MD
West Hartford, CT


