[R] Problem with data distribution

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Thu Feb 17 20:26:28 CET 2022


data <- data %>% filter(bug==0) is one option. filter() returns a new data frame rather than modifying data in place, so you need to save the output somewhere.
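
As a rough sketch of what saving the output looks like, the code below assigns each filtered subset to its own object and then plots them. The fake data, the seed value, and the names zero_bugs and nonzero_bugs are my own placeholders, not anything from the real synapse.arff file.

```r
library(dplyr)

# Fake stand-in data (assumption): 50 bug counts weighted toward zero.
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
data <- data.frame(bug = sample(c(0,0,0,0,1,1,1,2,3,4,5), 50, replace = TRUE))

# filter() returns a new data frame, so assign each result to a name.
zero_bugs    <- data %>% filter(bug == 0)
nonzero_bugs <- data %>% filter(bug >= 1)

# The original data frame still holds all 50 rows.
nrow(data)

# Plot the two saved subsets, not the unfiltered data frames.
boxplot(zero_bugs$bug, nonzero_bugs$bug,
        names = c("bug == 0", "bug >= 1"), range = 0)
```

Note that the first box will collapse to a single line at zero, since every value in that subset is 0 by construction.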

Can you tell us more about the expected distribution of bug==0? Do you want more than a count of the zeros, for example the number of zeros between non-zero values, or something else?
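
To illustrate the two readings of the question above, here is a small sketch using only the first ten values from the printed output. It uses table() for a plain count and rle() for the lengths of consecutive runs of zeros; whether either summary is what your advisor wants is an assumption on my part.

```r
# First ten values of data$bug as printed in the original post.
bug <- c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0)

# Reading 1: how many observations are zero versus non-zero.
table(bug == 0)

# Reading 2: lengths of the runs of consecutive zeros between non-zeros.
runs <- rle(bug == 0)
zero_runs <- runs$lengths[runs$values]
zero_runs
```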

You could provide the data but rename variables and treatments. Alternatively you could make fake data. It doesn't have to have the same distribution as the real data.

If bug is the only variable you have, I could recover the data from what you printed, though it would take some effort to strip the [#] index markers. For our purposes any of these would also work:
sample(0:5, 50, replace=TRUE)   # draws 50 values with replacement from 0 through 5 inclusive
sample(c(0,0,0,0,1,1,1,2,3,4,5), 50, replace=TRUE)   # draws 50 values with replacement from a vector weighted toward zero

abs(round(rnorm(50, mean=2, sd=2), 0))   # generates 50 normal random values, rounds each to an integer, and takes the absolute value

To make this approach fully reproducible, set the random seed with set.seed() before generating the data.
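
A minimal sketch of the seeding step, assuming an arbitrary seed value of 42 (any integer works): resetting the same seed before each draw gives the same fake data every time.

```r
set.seed(42)  # arbitrary seed value chosen for illustration
a <- sample(0:5, 50, replace = TRUE)

set.seed(42)  # same seed again
b <- sample(0:5, 50, replace = TRUE)

identical(a, b)  # TRUE: the two draws match exactly
```

Anyone who runs the same code with the same seed (and the same R version and RNG settings) should get the same values, which makes the example easy for list members to reproduce.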



-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Neha gupta
Sent: Thursday, February 17, 2022 1:55 PM
To: r-help mailing list <r-help using r-project.org>
Subject: [R] Problem with data distribution

Hello everyone

I have a dataset with output variable "bug" having the following values (at the bottom of this email). My advisor asked me to provide data distribution of bugs with 0 values and bugs with more than 0 values.

data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
  filter(bug == 0)
data2 %>%
  filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)

But both graphs are exactly the same. How is that possible? What am I doing wrong?


data$bug
  [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
 [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
 [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
