[R] Problem with data distribution

John Fox jfox at mcmaster.ca
Thu Feb 17 20:21:39 CET 2022


Dear Neha Gupta,

On 2022-02-17 1:54 p.m., Neha gupta wrote:
> Hello everyone
> 
> I have a dataset with output variable "bug" having the following values (at
> the bottom of this email). My advisor asked me to provide data distribution
> of bugs with 0 values and bugs with more than 0 values.
> 
> data = readARFF("synapse.arff")
> data2 = readARFF("synapse.arff")
> data$bug
> library(tidyverse)
> data %>%
>    filter(bug == 0)
> data2 %>%
>    filter(bug >= 1)
> boxplot(data2$bug, data$bug, range=0)
> 
> But both graphs are exactly the same. How is that possible? What am I
> doing wrong?

As it turns out, you're doing several things wrong.

First, you're not using pipes and filter() correctly. filter() returns a 
new data frame and leaves its input unchanged, and you never assign the 
filtered results, so data and data2 still hold all of the rows when you 
call boxplot(). That's why the two boxplots are identical.
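
A minimal sketch of this point, using a small made-up data frame rather 
than the posted data (the names toy, zeros, and nonzeros are hypothetical 
and chosen only for illustration). It assumes the dplyr package is 
installed:

```r
library(dplyr)

# Small made-up data set for illustration only (not the synapse data)
toy <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 3))

# The filtered result is printed and then discarded; toy is unchanged
toy %>% filter(bug == 0)
nrow(toy)

# Assign the filtered results if you want to use them later
zeros    <- toy %>% filter(bug == 0)
nonzeros <- toy %>% filter(bug >= 1)
nrow(zeros)
nrow(nonzeros)
```

Only the assigned objects, zeros and nonzeros, contain the subsets; toy 
still has every row.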

Second, you're greatly complicating a simple problem. You don't need to 
read the data twice and keep two versions of the data set. As well, 
processing the data with pipes and filter() is entirely unnecessary. The 
following code works:

    with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))

Third, and most fundamentally, the parallel boxplots you're apparently 
trying to construct don't really make sense. The first "boxplot" is just 
a horizontal line at 0 and so conveys no information. Why not just plot 
the nonzero values if that's what you're interested in?
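
One possible way to summarize the zero and nonzero cases, sketched in 
base R with a small made-up vector rather than the posted data (the 
object names zero_vs_nonzero and nonzero_counts are hypothetical). This 
is only one option, not necessarily what your advisor has in mind:

```r
# Small made-up vector for illustration only (not the synapse data)
bug <- c(0, 1, 0, 0, 2, 0, 1, 4, 0, 0)

# How many cases have no bugs versus at least one bug?
zero_vs_nonzero <- table(ifelse(bug == 0, "zero", "nonzero"))
zero_vs_nonzero

# Frequency of each nonzero bug count
nonzero_counts <- table(bug[bug > 0])
nonzero_counts

# Bar plot of the nonzero values only
barplot(nonzero_counts, xlab = "Number of bugs", ylab = "Frequency")
```

A table gives the zero/nonzero split directly, and a bar plot suits a 
small set of integer counts better than a boxplot does.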

Fourth, you didn't share your data in a convenient form. I was able to 
reconstruct them via

   bug <- scan()
   0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
   0 4 1 0
   0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
   0 0 0 0
   1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
   7 0 0 1
   0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
   0 1 0 0
   0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
   0 0 0 1
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0

   data <- data.frame(bug)
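
A more convenient way to share a variable is dput(), which prints R code 
that recreates the object. A minimal sketch with a small made-up data 
frame (not the posted data):

```r
# Small made-up data frame for illustration only (not the synapse data)
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 1, 4, 0, 0))

# dput() prints an R expression that recreates the object
dput(data$bug)

# A reader can paste that expression into R to rebuild the variable
bug <- c(0, 1, 0, 0, 2, 0, 1, 4, 0, 0)
identical(bug, data$bug)
```

Pasting the output of dput() into the body of the message lets readers 
reproduce the data without retyping printed output.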

Finally, it's better to post to the list in plain-text email rather 
than HTML (as the posting guide suggests).

I hope this helps,
  John

> 
> 
> data$bug
>    [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
> 0 4 1 0
>   [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
> 0 0 0 0
>   [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
> 7 0 0 1
> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
> 0 1 0 0
> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
> 0 0 0 1
> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


