[R] Probably a good use for apply

Fri Jun 1 01:25:05 CEST 2012

Yes, you are correct.  I need to change my sample size specification to the number of elements in the vector.

So the sampleWorker function should be:

sampleWorker <- function(x) return(sample(c(TRUE,FALSE),length(x), replace = TRUE, prob = c(x, 1-x)))

So this is where I get a little confused with using the apply functions.  Isn't x each element of each vector?  In the sample data I provide there are 4 x's, and I expected each one to be passed to the sampleWorker function by lapply.
#sample data 
test_<- list(a=c(.85,.10),b=c(.99,.05))
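
A quick check, offered only as a sketch and not part of the original code: lapply() appears to hand the function each component of the list, so with test_ the function is called twice, each time with a whole length-2 vector, rather than four times with single probabilities. The name arg_lengths is made up for illustration.

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# Record the length of whatever lapply() hands to the function
arg_lengths <- lapply(test_, function(x) length(x))
str(arg_lengths)        # each component reports length 2, not 1

# Flattening the list first gives the four individual probabilities
length(unlist(test_))
```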

To show what I want using just a single vector instead of a list of vectors, see below:

IsWorker.Hh_ <- lapply(c(.9,.1) , sampleWorker)
#Returns:
[[1]]
[1] TRUE

[[2]]
[1] FALSE

Now I just need to run through each vector of the list I specify, in this case test_, and then sum the TRUEs for each vector.  So again, if we assume the test_ data would result in a single TRUE for each vector (because of the .85 and .99 probabilities), the result would be:

> IsWorker_
 $a
 [1] 1
 $b
 [1] 1
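
One possible way to get there, as a sketch only: keep lapply() for the list and use an inner sapply() so that sampleWorker sees one probability at a time, then sum the logical results. The size is fixed at 1 here, and the helper name countWorkers is hypothetical.

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# One draw per call: x is assumed to be a single probability
sampleWorker <- function(x) {
  sample(c(TRUE, FALSE), 1, replace = TRUE, prob = c(x, 1 - x))
}

# Hypothetical helper: draw once per probability in p, then count the TRUEs
countWorkers <- function(p) sum(sapply(p, sampleWorker))

set.seed(1)  # only so the run is repeatable
IsWorker_ <- lapply(test_, countWorkers)
str(IsWorker_)  # a list with one count (0, 1 or 2) per vector
```

The counts vary from run to run, since the .10 and .05 entries can also come up TRUE now and then.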

Perhaps lapply isn't the right tool?  I have seen a couple of comments on the list saying that the plyr package is easy to figure out but that you lose out on speed, and speed is my issue right now.  I can do what I need with some for loops, but it's far too slow.  Any guidance is appreciated.  Thanks, guys.
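
On the speed question, a possible alternative sketch that avoids calling sample() once per probability: flatten the list, draw all the outcomes in a single vectorised rbinom() call, and sum within groups. Swapping sample() for rbinom() is an assumption about what is acceptable here, and the names p, grp and draws are illustrative only.

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# All probabilities in one plain vector
p <- unlist(test_, use.names = FALSE)

# Label each probability with the vector it came from, keeping list order
grp <- rep(names(test_), times = sapply(test_, length))
grp <- factor(grp, levels = names(test_))

set.seed(1)  # only so the run is repeatable
# One 0/1 draw per probability, all in a single call
draws <- rbinom(length(p), size = 1, prob = p)

# Number of 1s (workers) per original vector
IsWorker_ <- tapply(draws, grp, sum)
IsWorker_
```

Whether this is faster on the real data would need to be timed, for example with system.time().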

Josh

-----Original Message-----
From: Sarah Goslee [mailto:sarah.goslee at gmail.com] 
Sent: Thursday, May 31, 2012 1:35 PM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Probably a good use for apply

Hi,

On Thu, May 31, 2012 at 1:08 PM, LCOG1 <jroll at lcog.org> wrote:
> This is great thank you.  I think I am getting the hang of some of the 
> apply functions.  I am stuck again however.  I have list test_ below 
> and would like to apply the sample function using each element of each 
> vector as the probability and return a TRUE or FALSE that I will 
> ultimately sum the TRUES by vector.
>
> test_ <- list(a=c(.85,.10),b=c(.99,.05))
> #Write a function to sample based on labor force participation rates
> #to determine presence of workers in household
> sampleWorker <- function(x)
>   return(sample(c(TRUE,FALSE),x, replace = TRUE, prob = c(x, 1-x)))

Your first problem is that sampleWorker() doesn't run with a single component of test_ so it can't possibly run in an apply statement.

Please reread ?sample - the second argument is the size of the desired sample, but what you are passing is a non-integer vector of length 2.
What do you actually want this to be?

Then for prob, you're passing c(x, 1-x), but x is again a non-integer vector of length 2, so that results in a vector of length 4, which is longer than the number of options sample() is choosing from.

Do you perhaps want to pass only a single probability at a time? But even then you need to resolve the size problem.
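
The prob-length problem described above can be reproduced with a short sketch. Here x stands for one component of test_, the size is set to length(x) purely for illustration, and the names res and one are made up.

```r
x <- c(.85, .10)   # one component of test_

# prob built from the whole vector has 4 entries for only 2 choices
length(c(x, 1 - x))

# Calling sample() this way fails; capture the error instead of stopping
res <- tryCatch(
  sample(c(TRUE, FALSE), length(x), replace = TRUE, prob = c(x, 1 - x)),
  error = function(e) conditionMessage(e)
)
res                # the text of the error message

# With a single probability, prob has 2 entries and a size of 1 is valid
one <- sample(c(TRUE, FALSE), 1, replace = TRUE, prob = c(x[1], 1 - x[1]))
one
```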

Sarah

> IsWorker.Hh_ <- lapply(test_, sampleWorker)
>
> I am doing something wrong with the setup because I am getting an
> error about specifying probabilities incorrectly.
>
> The result I am looking for is for IsWorker_ to be as follows (assuming
> the .85 and .99 probabilities 'win' in each vector and the lower values do not):
>
>> IsWorker_
> $a
> [1] TRUE
> $b
> [1] TRUE
>
> but ultimately I will need to sum the TRUEs for each vector
>
>> IsWorker_
> $a
> [1] 1
> $b
> [1] 1
>
>
> Thanks
>
> Josh
>

--
Sarah Goslee
http://www.functionaldiversity.org