[R] generate distribution based on summary data and add random noise

PIKAL Petr petr.pikal using precheza.cz
Thu Feb 3 17:44:51 CET 2022


Hello Bert

Probably not, sorry. Did you try my examples?

To put it more simply, maybe:
1. Sample a vector of values with given proportions to generate new data.
2. Add random noise to each generated value, with the sd determined by the value itself.

Let's say:

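x <- c(10, 100)    # bin values
y <- c(.6, .4)     # sampling proportions
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)
ind <- order(z)    # sort so that equal values form runs
bins <- rle(z[ind])
# noise for each bin, with sd equal to one fifth of the bin value
bin1 <- rnorm(bins$lengths[1], mean = 0, sd = bins$values[1] / 5)
bin2 <- rnorm(bins$lengths[2], mean = 0, sd = bins$values[2] / 5)
z[ind] + c(bin1, bin2)    # sorted values plus bin-specific noise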
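
The sorting and per-bin rnorm() calls above can probably be avoided, because rnorm() is vectorised over sd. A minimal sketch, assuming the same sd of one fifth of the sampled value (noise and z_noisy are names made up for illustration; the numbers will differ from the bin-wise version because the random draws are assigned in a different order, and the result stays in sampling order rather than sorted order):

```r
x <- c(10, 100)    # bin values
y <- c(.6, .4)     # sampling proportions
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)

# sd is a vector here, so draw i uses sd = z[i] / 5
noise <- rnorm(length(z), mean = 0, sd = z / 5)
z_noisy <- z + noise
z_noisy
```

This should scale to any number of bins without naming each one.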

Sorry that I did not explain myself more clearly; I hoped that the example showed what I have in mind.

Basically, it is a cumulative particle size distribution, but the sizes are expressed as size bins. Normally I have an exact size measurement for each particle.

S pozdravem | Best Regards
RNDr. Petr PIKAL
Vedoucí Výzkumu a vývoje | Research Manager
PRECHEZA a.s.
nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
Tel: +420 581 252 256 | GSM: +420 724 008 364
mailto:petr.pikal using precheza.cz | https://www.precheza.cz/

Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních partnerů PRECHEZA a.s. jsou zveřejněny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner’s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/
Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a podléhají tomuto právně závaznému prohlášení o vyloučení odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/

From: Bert Gunter <bgunter.4567 using gmail.com> 
Sent: Thursday, February 3, 2022 5:10 PM
To: PIKAL Petr <petr.pikal using precheza.cz>
Cc: R-help <r-help using r-project.org>
Subject: Re: [R] generate distribution based on summary data and add random noise

If I understand correctly:
To generate a sample of total size N, generate a uniform sample of size p*N for a bin with proportion p?
?runif
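
A rough sketch of how this suggestion might look with the data from the original post. The names N, lower, upper, counts and sim are assumptions made for illustration, and taking the bin edges as consecutive values of size is my reading of the data:

```r
dat <- data.frame(
  size = c(seq(10, 200, by = 10), 250, 300, 400, 500),
  percent = c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
              57, 65, 72, 76, 83, 95, 98, 100, 100)
)

N <- 1000
lower <- head(dat$size, -1)        # lower edge of each bin
upper <- tail(dat$size, -1)        # upper edge of each bin
p <- diff(dat$percent) / 100       # proportion falling in each bin
counts <- round(N * p)             # p * N values per bin

set.seed(1)
# runif() is vectorised over min and max, so each value is drawn
# uniformly between the edges of its own bin
sim <- runif(sum(counts), min = rep(lower, counts), max = rep(upper, counts))
```

With this approach the spread grows with the bin width automatically, so no separate noise step is needed.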


Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Feb 3, 2022 at 7:52 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
Hello all

I have summary data with size bins and the cumulative percentage below each size.

dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame", row.names = c(NA,
-24L))

# I want to regenerate the original distribution (I know it is better not to
# do it, but I have no other choice), so I calculated the midpoints of those bins.

xd <- dat$size - c(5, diff(dat$size) / 2)
xd <- xd[-1]

# I can sample the bin midpoints with probabilities given by the percent column.
Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100)
plot(ecdf(Result))

# And I can add some noise to it, which is satisfactory for the lower size bins
# but not enough for the higher size bins.

Result <- sample(xd, 1000, replace = TRUE, prob = diff(dat$percent) / 100) + rnorm(1000, mean = 0, sd = 5)
plot(ecdf(Result))

I can increase sd to suit the bigger bin sizes, but then the noise is too large for the lower bin sizes.

I would like to add smaller random noise to the lower size bins and bigger random noise to the higher size bins. This seems an easy task, but I am stuck on how to do it. The noise should be somehow proportional to the size value.
The only way forward I see is to sort the generated result and to use something like

+ rnorm(1000, mean=xd, sd=xd/10)

But that is not correct.
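
One reason it does not work is that rnorm() simply recycles the 23 values of xd along the 1000 draws, regardless of which bin each sampled value came from. A minimal sketch of one possible alternative, assuming an sd of one tenth of the sampled midpoint (mids, idx, n and noisy are illustrative names, not anything from the original code): sample the bin index first, then use the same index for both the value and its sd.

```r
dat <- data.frame(
  size = c(seq(10, 200, by = 10), 250, 300, 400, 500),
  percent = c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50,
              57, 65, 72, 76, 83, 95, 98, 100, 100)
)

mids <- (head(dat$size, -1) + tail(dat$size, -1)) / 2   # bin midpoints
p <- diff(dat$percent) / 100                            # bin proportions

n <- 1000
set.seed(1)
# sample bin indices rather than values, so the index can be reused
idx <- sample(seq_along(mids), n, replace = TRUE, prob = p)

# zero-mean noise whose sd is tied to the midpoint of the sampled bin
noisy <- mids[idx] + rnorm(n, mean = 0, sd = mids[idx] / 10)
```

plot(ecdf(noisy)) can then be compared with the original percent column; the factor 1/10 is arbitrary and would need tuning.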

I'd appreciate any hint on how to add random noise to the values in such an ordered manner.

Best regards.
Petr

