[R] How to use Subpopulation data?

Peter Ehlers ehlers at ucalgary.ca
Fri Oct 2 22:16:34 CEST 2009


Kabeli,

You seem to be doing your best to avoid working
your way through some introductory documentation
like An Introduction to R, which is sitting
right there on your computer.

So let's try to take it slowly, one step at a time.

#1. generate a vector of values to work with.
(forget the matrix() bit and the data.frame() bit
for now and forget about the sampling package
for now)

  x <- rnorm(100, mean=50000, sd=5000)

#2. in order to select a random sample of values
from x, we'll randomly select 20 numbers from
the set {1, 2, 3, ..., 99, 100}. Those 20 numbers
will tell us which x-values get into our sample.

  idx <- sample(100, 20, replace = FALSE)

Here's what I got (you'll get different numbers):

 > idx
  [1] 91 51 97 50 67 31 29 75 89 78 63 77
[13] 38 85 16 74 53 40 54  2

#3. now select x[91], x[51], etc., as the elements
of x that make up your sample:

  xsample <- x[idx]

#4. now do whatever you want with xsample:

  sum(xsample)
  mean(xsample)
  sd(xsample)
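
Steps 1 to 4 can be pulled together into one short script. This is only a sketch: the call to set.seed() and the seed value are additions of mine (they just make the run repeatable), and the last two lines are there to show that the sample sum and the population sum are different things.

```r
# Steps 1-4 in one place; set.seed() is an addition for reproducibility.
set.seed(2009)                              # arbitrary seed, an assumption

x <- rnorm(100, mean = 50000, sd = 5000)    # step 1: the "population"
idx <- sample(100, 20, replace = FALSE)     # step 2: 20 distinct positions
xsample <- x[idx]                           # step 3: the sampled values

length(xsample)   # 20, the sample size
sum(xsample)      # step 4: sum over the sample only
sum(x)            # sum over all 100 values, for comparison
```

Note that length(x) stays at 100 no matter how often you sample; it is xsample that has length 20.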

#5. in order to use something like the function
strata() in pkg:sampling, you will first have
to understand how to read its help page and
understand what's in the object returned by strata().
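
Until then, the same indexing idea carries over to a data frame like the one in the quoted message. The following is a base-R sketch of simple random sampling without replacement within each stratum. It is not what strata() or getdata() do internally, and the names rows_by_type, picked and sample_df are made up for illustration.

```r
# Base-R sketch only; this does not call sampling::strata() or getdata().
set.seed(1)                                  # arbitrary seed, an assumption

sampleframe <- data.frame(type  = rep("H", 100),
                          value = rnorm(100, mean = 50000, sd = 5000))

# Row numbers grouped by stratum (here there is just one stratum, "H").
rows_by_type <- split(seq_len(nrow(sampleframe)), sampleframe$type)

# Within each stratum, draw 20 row numbers without replacement.
picked <- unlist(lapply(rows_by_type,
                        function(r) r[sample.int(length(r), 20)]),
                 use.names = FALSE)

sample_df <- sampleframe[picked, ]           # the sampled rows

str(sample_df)          # inspect what the object actually contains
nrow(sample_df)         # 20
sum(sample_df$value)    # sum over the sample, not over all 100 rows
```

The point is that the statistics are computed on the sampled object (sample_df$value), not on the original H or sampleframe.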

I do hope that this helps at least somewhat,
Peter Ehlers


KABELI MEFANE wrote:
> Dear Mr Winsemius
>  
> I am sorry to have offended any of you with the mistakes I made. The package I loaded is sampling, and there was an unwanted comma between the 20 and the closing bracket in size=c(20,). What I wanted was to calculate the sum of H in the sample, not in the original data frame. If I do
> sum(H) I get the sum of the H values in the original data frame.
>  
>  
> library(sampling)
>  
> H <- matrix(rnorm(100, mean=50000, sd=5000))
> sampleframe=data.frame(type=c(rep("H",100)),value=c(H))
> sampleframe
>  sum(H)
> 
> str=strata(sampleframe,c("type"),size=c(20), method="srswor")
> sample.strat<-getdata(sampleframe,str)
> sample.strat
>  
> Thanks for the input. Once again sorry for wasting your time.
>  
> Best Regards 
>  
>  
> 
> 
> --- On Fri, 2/10/09, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> 
> From: David Winsemius <dwinsemius at comcast.net>
> Subject: Re: [R] How to use Subpopulation data?
> To: "KABELI MEFANE" <kabelimefane at yahoo.co.uk>
> Cc: R-help at r-project.org
> Date: Friday, 2 October, 2009, 3:38 PM
> 
> 
> 
> On Oct 1, 2009, at 6:06 AM, KABELI MEFANE wrote:
> 
>> Dear Helpers
>>
>> I have a sample frame and I have sampled from it using three methods. Now I want to calculate the sample statistics, but I only get the population parameters.
>>
>> H <- matrix(rnorm(100, mean=50000, sd=5000))
>> sampleframe=data.frame(type=c(rep("H",100)),value=c(H))
>> sampleframe
>>
>> str=strata(sampleframe,c("type"),size=c(20,), method="srswor")
>> sample.strat<-getdata(sampleframe,str)
>> sample.strat
> 
> If you want the number of rows in sample.strat then length(H) is the wrong approach since that is the original (unsampled) object.
> 
>> length(H)
>> i get:
>>
>> length(H)
>> [1] 100
>>
>> Desire to get:
>> length(H)
>> [1] 20
> 
> I cannot tell what packages you have loaded and strata is not in the sampling package which I guessed (wrongly) was where you were getting "getdata". When you post code you should precede that code with calls that load any non-base packages.
> 
> In a later posting you ask for ways to calculate "the sum" but you do not say what it is that you want the sum of. Our ability to read minds is extremely limited.
> 
> --David Winsemius
> 
> 
> 
>       
> 
> 
> 
> ------------------------------------------------------------------------
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



