[R] Problem with data distribution

John Fox j|ox @end|ng |rom mcm@@ter@c@
Thu Feb 17 20:27:25 CET 2022


Dear Neha gupta,

In the last point, I meant to say, "Finally, it's better to post to the 
list in plain-text email, rather than html (as the posting guide 
suggests)." (I accidentally inserted a "not" in this sentence.)

Sorry,
  John

On 2022-02-17 2:21 p.m., John Fox wrote:
> Dear Neha gupta,
> 
> On 2022-02-17 1:54 p.m., Neha gupta wrote:
>> Hello everyone
>>
>> I have a dataset with an output variable "bug" that has the values
>> shown at the bottom of this email. My advisor asked me to provide the
>> data distribution of bugs with 0 values and bugs with more than 0
>> values.
>>
>> data = readARFF("synapse.arff")
>> data2 = readARFF("synapse.arff")
>> data$bug
>> library(tidyverse)
>> data %>%
>>    filter(bug == 0)
>> data2 %>%
>>    filter(bug >= 1)
>> boxplot(data2$bug, data$bug, range=0)
>>
>> But both graphs are exactly the same. How is that possible? What am I
>> doing wrong?
> 
> As it turns out, you're doing several things wrong.
> 
> First, you're not using pipes and filter() correctly. That is, you don't 
> do anything with the filtered versions of the data sets. You're 
> apparently under the incorrect impression that filtering modifies the 
> original data set.
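
To illustrate the first point, here is a minimal sketch that assumes the dplyr package is installed. The small data frame and the names zero_bugs and nonzero_bugs are hypothetical, chosen only for the example; they are not from the original post.

```r
library(dplyr)  # provides filter() and the %>% pipe

# Small illustrative data frame (hypothetical values, not the synapse data)
data <- data.frame(bug = c(0, 1, 0, 2, 0, 3))

# filter() returns a new data frame; `data` itself is left unchanged
data %>% filter(bug == 0)
nrow(data)  # still 6 rows

# To work with the subsets, assign the results to new objects
zero_bugs    <- data %>% filter(bug == 0)
nonzero_bugs <- data %>% filter(bug >= 1)

nrow(zero_bugs)     # 3
nrow(nonzero_bugs)  # 3
```

Because the filtered results in the original code were never assigned, both arguments to boxplot() were the full, unfiltered bug column, which would explain why the two plots looked identical.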
> 
> Second, you're greatly complicating a simple problem. You don't need to 
> read the data twice and keep two versions of the data set. As well, 
> processing the data with pipes and filter() is entirely unnecessary. The 
> following code works:
> 
>     with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
> 
> Third, and most fundamentally, the parallel boxplots you're apparently 
> trying to construct don't really make sense. The first "boxplot" is just 
> a horizontal line at 0 and so conveys no information. Why not just plot 
> the nonzero values if that's what you're interested in?
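
One possible way to act on the third point, offered as a sketch and not necessarily what was intended: tabulate zeros against nonzero values, then plot only the nonzero counts. The vector below is hypothetical, and the names zero_vs_nonzero and nonzero_counts are assumptions made for the example.

```r
# Illustrative counts (hypothetical values, not the synapse data)
bug <- c(0, 1, 0, 0, 2, 1, 0, 4, 0, 1)

# How many observations have no bugs versus at least one?
zero_vs_nonzero <- table(ifelse(bug == 0, "0 bugs", ">= 1 bug"))
print(zero_vs_nonzero)

# Frequency of each nonzero bug count
nonzero_counts <- table(bug[bug > 0])
print(nonzero_counts)

# Bar plot of the nonzero values only
barplot(nonzero_counts, xlab = "Number of bugs", ylab = "Frequency")
```

A frequency table plus a bar plot may suit a small-count variable like this better than a boxplot, since most of the mass sits at a single value.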
> 
> Fourth, you didn't share your data in a convenient form. I was able to 
> reconstruct them via
> 
>    bug <- scan()
>    0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
>    0 4 1 0
>    0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
>    0 0 0 0
>    1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
>    7 0 0 1
>    0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
>    0 1 0 0
>    0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
>    0 0 0 1
>    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> 
>    data <- data.frame(bug)
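
A common way to share data in a form others can paste straight into R is dput(). The following is a minimal sketch using a short hypothetical vector; bug_copy is a name introduced only for the example.

```r
# Illustrative vector (hypothetical values, not the full synapse data)
bug <- c(0, 1, 0, 0, 2)

# dput() prints an R expression that recreates the object
dput(bug)

# A reader can paste that expression back into R to rebuild the data
bug_copy <- c(0, 1, 0, 0, 2)
data <- data.frame(bug = bug_copy)

# Check that the rebuilt vector matches the original
identical(bug, bug_copy)
```

Including the output of dput(data$bug) in the message would have avoided the need to reconstruct the values from printed console output.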
> 
> Finally, it's better not to post to the list in plain-text email, rather 
> than html (as the posting guide suggests).
> 
> I hope this helps,
>   John
> 
>>
>>
>> data$bug
>>   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
>>  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
>>  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
>> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
>> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
>> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


