[R] Problem with data distribution

Neha gupta neh@@bo|ogn@90 @end|ng |rom gm@||@com
Thu Feb 17 20:42:01 CET 2022


Ebert and Rui, thank you for providing the tips (in fact, for providing the
answer I needed).

Yes, you are right that a boxplot of all-zero values does not make sense.
Maybe a histogram will work.

Here are a few details about my data and the context of my question.

My data is about bugs/defects in different classes of a large software
system. I have to predict which classes will contain bugs and which will be
free of bugs (bug = 0). I have trained ML models and made predictions, but my
advisor asked me to first describe the distribution of bugs, e.g. how many
classes have bugs (bug > 0) and how many are free of bugs (bug = 0).

That is why I need to show the distribution of both types of values
(i.e. bug = 0 and bug > 0).
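A minimal sketch of the summary described above, assuming a data frame `data` with a single column `bug` as in this thread. The 222 values are copied from the `dput()` output in Rui's reply so the block runs on its own. It counts the classes with `bug == 0` and `bug > 0`, gives their proportions, and plots both the two-group split and the full distribution of bug counts. The names `status`, `counts`, and `bug_freq` are illustrative only.

```r
# Bug counts per class: the 222 values from the dput() output in this thread
bug <- c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
         0, 0, 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0,
         3, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
         0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0,
         1, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0,
         1, 1, 0, 2, 0, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
         0, 1, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0,
         0, 0, 0, 0, 1, 0, 4, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)
data <- data.frame(bug)

# Label each class as bug-free (bug == 0) or buggy (bug > 0)
status <- factor(ifelse(data$bug > 0, "bug > 0", "bug = 0"),
                 levels = c("bug = 0", "bug > 0"))

counts <- table(status)
print(counts)                        # number of classes in each group
print(round(prop.table(counts), 3))  # share of classes in each group

# Full distribution: number of classes for each observed bug count
bug_freq <- table(data$bug)
print(bug_freq)

# One bar per group, then one bar per bug count
barplot(counts, ylab = "Number of classes")
barplot(bug_freq, xlab = "Bugs per class", ylab = "Number of classes")

# Histogram alternative: breaks at half-integers give one bin per count value
hist(data$bug, breaks = seq(-0.5, max(data$bug) + 0.5, by = 1),
     xlab = "Bugs per class", main = "Distribution of bug")
```

`hist()` is given explicit breaks here because its default bins are right-closed with the lowest value included in the first bin; for these data the default breaks fall on whole numbers, so the 0s and 1s would share the first bar.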

Thank you again.

On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> In your original post you read the same file "synapse.arff" twice,
> apparently to filter each copy by its own criterion. You don't need
> to do that: read it once and filter that one data frame by each criterion.
>
> As for the data as posted, I have read it in with the following code:
>
>
> x <- "
> 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0
> 4 1 0
> 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0
> 0 0 0
> 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7
> 0 0 1
> 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0
> 1 0 0
> 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0
> 0 0 1
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> "
> bug <- scan(text = x)
> data <- data.frame(bug)
>
>
> This is not the right way to post data; the posting guide asks you to post
> the output of
>
>
> dput(data)
> structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
> 0, 0, 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0,
> 3, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
> 0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0,
> 1, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0,
> 1, 1, 0, 2, 0, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
> 0, 1, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0,
> 0, 0, 0, 0, 1, 0, 4, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)),
> class = "data.frame", row.names = c(NA, -222L))
>
>
>
> This can be copied into an R session and the data set recreated with
>
> data <- structure(etc)
>
>
> Now the boxplots.
>
> (Why would you want to plot a vector of all zeros, btw?)
>
>
>
> library(dplyr)
>
> boxplot(filter(data, bug == 0))    # nonsense
> boxplot(filter(data, bug > 0), range = 0)
>
> # Another way
> data %>%
>    filter(bug > 0) %>%
>    boxplot(range = 0)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 19:03 on 17/02/2022, Neha gupta wrote:
> > That is all the code I have. How can I provide reproducible code?
> >
> > How can I save this result?
> >
> > On Thu, Feb 17, 2022 at 8:00 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> >
> >> You pipe the filter but do not save the result. A reproducible example
> >> might help.
> >> Tim
> >>
> >> -----Original Message-----
> >> From: R-help <r-help-bounces using r-project.org> On Behalf Of Neha gupta
> >> Sent: Thursday, February 17, 2022 1:55 PM
> >> To: r-help mailing list <r-help using r-project.org>
> >> Subject: [R] Problem with data distribution
> >>
> >> Hello everyone
> >>
> >> I have a dataset with an output variable "bug" that takes the values listed
> >> at the bottom of this email. My advisor asked me to provide the data
> >> distribution of bugs with 0 values and bugs with values greater than 0.
> >>
> >> data = readARFF("synapse.arff")
> >> data2 = readARFF("synapse.arff")
> >> data$bug
> >> library(tidyverse)
> >> data %>%
> >>    filter(bug == 0)
> >> data2 %>%
> >>    filter(bug >= 1)
> >> boxplot(data2$bug, data$bug, range=0)
> >>
> >> But both graphs are exactly the same. How is that possible? What am I
> >> doing wrong?
> >>
> >>
> >> data$bug
> >>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> >>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> >>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> >> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> >> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> >> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
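Tim's and Rui's points can be combined into a minimal sketch of a corrected version of the original code. In the original, `data %>% filter(bug == 0)` printed its result and then discarded it, so `data` and `data2` were still two identical copies of the whole file, which is why both boxplots looked the same. The sketch reads the data once and assigns each filtered subset to its own name. Since `synapse.arff` is not available here, it assumes the first 39 values posted in this thread as a stand-in for `readARFF()`, and it uses base R's `subset()` so it runs without dplyr; `no_bugs` and `with_bugs` are hypothetical names.

```r
# Stand-in for readARFF("synapse.arff"): the first 39 bug values posted above
data <- data.frame(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                           0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0,
                           0, 0, 0, 0, 4, 1, 0))

# Filter the one data frame twice and SAVE each result with <-.
# A bare `data %>% filter(bug == 0)` only prints; `data` itself is unchanged.
# With dplyr loaded, the equivalent is: no_bugs <- data %>% filter(bug == 0)
no_bugs   <- subset(data, bug == 0)
with_bugs <- subset(data, bug > 0)

# The group sizes are what distinguish the two subsets
print(c(bug_0 = nrow(no_bugs), bug_gt_0 = nrow(with_bugs)))

# Boxplot only the classes with bugs (an all-zero boxplot is a flat line);
# range = 0 extends the whiskers to the minimum and maximum
boxplot(with_bugs$bug, range = 0, ylab = "Bugs per class")
```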
