[R] generating multiple datasets, applying a function, and outputting multiple datasets

Sarah Goslee sarah.goslee at gmail.com
Mon Sep 5 15:40:51 CEST 2011


Hi,

On Sun, Sep 4, 2011 at 9:25 AM, John Clark <rosbreed.pba at gmail.com> wrote:
> Dear R experts:
>
> Here is my problem; it is just hard for me...
>
> I want to generate multiple datasets, then apply a function to these
> datasets and output the corresponding results in a single dataset or in
> multiple datasets (whichever is possible)...
>
> # my example, although I need to generate a large number of variables and
> datasets
>
> seed <- round(runif(10)*1000000)
>
> datagen <- function(x){
> set.seed(x)
> var <- rep(1:3, c(rep(3, 3)))
> yvar <- rnorm(length(var), 50, 10)
> matrix <- matrix(sample(1:10, c(10*length(var)), replace = TRUE), ncol = 10)
> mydata <- data.frame(var, yvar, matrix)
> }
>
> gdt <- lapply (seed,  datagen)
>
> # the resulting list (I believe that is the correct term) has 10 data frames:
> # gdt[1] ... to gdt[10]

Yes, that's a list of data frames, though the correct way to refer to one of them is gdt[[1]].
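A quick way to see the difference, using a small toy list built the same way with lapply() (the data are made up for illustration): single brackets return a sub-list of length 1, double brackets return the data frame itself, so $yvar only works after [[ ]].

```r
# Toy list of three data frames, built with lapply() like gdt
gdt <- lapply(1:3, function(s) {
  set.seed(s)
  data.frame(yvar = rnorm(5))
})

class(gdt[1])     # "list": a sub-list holding one data frame
class(gdt[[1]])   # "data.frame": the element itself
gdt[1]$yvar       # NULL: the unnamed sub-list has no element called yvar
gdt[[1]]$yvar     # the five yvar values
```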

> # my function: this should perform an ANOVA on each component data frame
> # and output the p-values of the coefficients...
> anovp <- function(x){
>          ind <- 3:ncol(x)
>          out <- lm(gdt[x]$yvar ~ gdt[x][, ind[ind]])
>          pval <- out$coefficients[,4][2]
>          pval <- do.call(rbind,pval)
>         }
>
> plist <- lapply (gdt,  anovp)
>
> Error in gdt[x] : invalid subscript type 'list'

It's not a matter of your use of lapply(), which is fine. It's that your
anovp() function just plain doesn't work.

You need to debug it with ONE dataframe before you try to lapply
it to a whole bunch.

> anovp(gdt[[1]])
Error in gdt[x] : invalid subscript type 'list'

This suggests to me that x should be a matrix rather than a list (a dataframe
is a type of list), so I tried:

> anovp(as.matrix(gdt[[1]]))
Error in gdt[x][, ind[ind]] : incorrect number of dimensions

But as you see there are still problems. You'll need to solve those first: if
anovp() doesn't work for one dataframe, it won't work on a list of them.
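For what it's worth, the first error comes from anovp() itself: inside the function x already is the data frame, but the body uses it as a subscript into the global gdt (gdt[x]). Also, out$coefficients of an lm fit is a plain vector of estimates; the p-values are in column 4 ("Pr(>|t|)") of summary(out)$coefficients. Below is a minimal sketch of a version that works on one data frame. It assumes (a guess at the intent, not stated in the post) that you want a separate regression of yvar on each of the columns X1..X10 and the p-value of each slope; the test data frame d is hypothetical.

```r
# Sketch of a working anovp(): use x directly instead of indexing gdt with it.
# Assumption: one simple regression yvar ~ X_i per predictor column, keeping
# the slope p-value from the summary coefficient table.
anovp <- function(x) {
  ind <- 3:ncol(x)                 # X1..X10; column 1 is var, column 2 is yvar
  pvals <- sapply(ind, function(i) {
    ctab <- summary(lm(x$yvar ~ x[, i]))$coefficients
    if (nrow(ctab) < 2) NA_real_ else ctab[2, 4]  # NA if the column is constant
  })
  setNames(pvals, names(x)[ind])
}

# Debug on ONE data frame first (same layout that datagen() produces)
set.seed(1)
d <- data.frame(var = rep(1:3, each = 3), yvar = rnorm(9, 50, 10),
                matrix(sample(1:10, 90, replace = TRUE), ncol = 10))
anovp(d)
```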

> This is not working; I tried different options but could not figure it
> out... I finally decided to bother the experts, sorry for that...
>
> My questions are:
>
> (1) Is it possible to handle such a situation in this way, or are there other
> alternatives for handling the multiple datasets created?
>
> (2) If this is the right way, how can I do it?
>
>
> Thank you for your attention; I will appreciate your help...
>
>
> JC
>
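Once anovp() works on a single data frame, the lapply() call from the original post can stay as it is, and the list of results can be stacked into one matrix with do.call(rbind, ...). A self-contained sketch, reusing datagen() from the post and the same assumed per-column regression (anovp() here is a hypothetical reconstruction, not the poster's code):

```r
# datagen() as in the original post: one simulated data frame per seed
datagen <- function(x) {
  set.seed(x)
  var <- rep(1:3, c(rep(3, 3)))
  yvar <- rnorm(length(var), 50, 10)
  matrix <- matrix(sample(1:10, c(10 * length(var)), replace = TRUE), ncol = 10)
  data.frame(var, yvar, matrix)
}

# Hypothetical anovp(): slope p-value of yvar ~ each X column
anovp <- function(x) {
  ind <- 3:ncol(x)
  pvals <- sapply(ind, function(i) {
    ctab <- summary(lm(x$yvar ~ x[, i]))$coefficients
    if (nrow(ctab) < 2) NA_real_ else ctab[2, 4]
  })
  setNames(pvals, names(x)[ind])
}

seed <- round(runif(10) * 1000000)
gdt <- lapply(seed, datagen)     # list of 10 data frames
plist <- lapply(gdt, anovp)      # list of 10 named p-value vectors
pmat <- do.call(rbind, plist)    # one row per dataset, one column per predictor
rownames(pmat) <- paste0("data", seq_along(gdt))
pmat
```

sapply(gdt, anovp) would give the same numbers as a matrix with one column per dataset instead of one row.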


-- 
Sarah Goslee
http://www.functionaldiversity.org


