[R] Loop over factor returns NA

arun smartpink111 at yahoo.com
Sat Oct 12 19:46:16 CEST 2013


Hi,
I am not sure whether anything restricts you from using lapply() (see ?lapply).
AB <- read.table(text="
      time        x        y      z        gene      part
1  03:27:58    1      2        3        grom        1
2  03:27:58    2      3        4        grom        1
3  03:27:58    3      4        5        grom        1
4  04:44:23    12      13      14      grom        2
5  04:44:23    13      14      15      grom        2
6  04:44:23    14      15      16      grom        2
7  04:44:23    15      16      17      grom        2
8  06:23:45  101    102    103    vir            3
9  06:23:45  102    103    104    vir            3
10 06:23:45  103    104    105    vir            3",sep="",header=TRUE,stringsAsFactors=FALSE)  

str(AB)
#'data.frame':    10 obs. of  6 variables:
# $ time: chr  "03:27:58" "03:27:58" "03:27:58" "04:44:23" ...
# $ x   : int  1 2 3 12 13 14 15 101 102 103
# $ y   : int  2 3 4 13 14 15 16 102 103 104
# $ z   : int  3 4 5 14 15 16 17 103 104 105
# $ gene: chr  "grom" "grom" "grom" "grom" ...
# $ part: int  1 1 1 2 2 2 2 3 3 3


#It is not clear from the example whether you have multiple 'gene' values within a 'part' or 'time'.


res1 <- do.call(rbind, lapply(split(AB, AB$part), function(u) {
    sdx <- sd(u$x)
    sdy <- sd(u$y)
    sdz <- sd(u$z)
    data.frame(sdx, sdy, sdz, gene = u$gene[1], stringsAsFactors = FALSE)
}))
#Similarly for time

res2 <- do.call(rbind, lapply(split(AB, AB$time), function(u) {
    sdx <- sd(u$x)
    sdy <- sd(u$y)
    sdz <- sd(u$z)
    data.frame(sdx, sdy, sdz, gene = u$gene[1], stringsAsFactors = FALSE)
}))
str(res1)
#'data.frame':    3 obs. of  4 variables:
# $ sdx : num  1 1.29 1
# $ sdy : num  1 1.29 1
# $ sdz : num  1 1.29 1
# $ gene: chr  "grom" "grom" "vir"
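
On why the original loop returns NA with the time column, here is a minimal sketch. It assumes the data are as shown above (rebuilt here with data.frame() so the snippet runs on its own); the names `empty`, `times` and `res_loop` are only illustrative. The counter i takes the values 1, 2, 3, while AB$time holds strings such as "03:27:58", so AB$time == i is never TRUE, the subset has zero rows, and sd() of an empty vector is NA. Looping over the time values themselves avoids this.

```r
# Rebuild the example data so the sketch runs on its own
AB <- data.frame(
  time = rep(c("03:27:58", "04:44:23", "06:23:45"), times = c(3, 4, 3)),
  x = c(1:3, 12:15, 101:103),
  y = c(2:4, 13:16, 102:104),
  z = c(3:5, 14:17, 103:105),
  gene = rep(c("grom", "vir"), times = c(7, 3)),
  part = rep(1:3, times = c(3, 4, 3)),
  stringsAsFactors = FALSE
)

# The number 1 never equals a time string, so this subset has no rows
empty <- AB[AB$time == 1, ]
nrow(empty)
sd(empty$x)

# Loop over the time values themselves instead of over 1:n
times <- unique(AB$time)
res_loop <- vector("list", length(times))
for (k in seq_along(times)) {
  Intervall <- AB[AB$time == times[k], ]
  res_loop[[k]] <- data.frame(
    time = times[k],
    sdx = sd(Intervall$x),
    sdy = sd(Intervall$y),
    sdz = sd(Intervall$z),
    gene = Intervall$gene[1],
    stringsAsFactors = FALSE
  )
}
# Stack the per-time rows into one data frame
res_loop <- do.call(rbind, res_loop)
res_loop
```

The same comparison fails when time is a factor, because == then compares the level labels with "1". Collecting the rows in a list and binding them once also removes the need for attach()/detach() inside the loop.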

#Use write.table() to save the result to a file; see
?write.table()
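
A sketch of that last step, under some assumptions: `res1` is rebuilt by hand here with the rounded values from the str() output so the snippet runs alone, and a temporary file (the hypothetical `out_file`) stands in for a real path such as the VariableTable.txt used in the question.

```r
# Stand-in for res1 from above (values rounded as in the str() output)
res1 <- data.frame(
  sdx = c(1, 1.29, 1),
  sdy = c(1, 1.29, 1),
  sdz = c(1, 1.29, 1),
  gene = c("grom", "grom", "vir"),
  stringsAsFactors = FALSE
)

# Temporary path for illustration only; replace with e.g. "VariableTable.txt"
out_file <- tempfile(fileext = ".txt")

# A single call writes every row, so no append = TRUE inside a loop is needed
write.table(res1, file = out_file, sep = ",",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to inspect what was written
written <- readLines(out_file)
written
```

Dropping the row and column names and the quotes gives a comma-separated layout like the one the original loop produced with write().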

A.K.




On Saturday, October 12, 2013 11:12 AM, anna berg <anna.berg1986 at hotmail.com> wrote:
Dear R users,

I am pretty new to programming in R. So I guess there is some obvious mistake I am making. I hope you can help me.
I have a data frame that looks like this:

>  AB
        time        x        y       z         gene       part
1   03:27:58     1       2        3        grom         1
2   03:27:58     2       3        4        grom         1
3   03:27:58     3       4        5        grom         1
4   04:44:23    12      13      14      grom         2
5   04:44:23    13      14      15      grom         2
6   04:44:23    14      15      16      grom         2
7   04:44:23    15      16      17      grom         2
8   06:23:45   101     102    103    vir             3
9   06:23:45   102     103    104    vir             3
10 06:23:45   103     104    105    vir             3

Now I want to apply a loop. Below is a simplified version. I know that I could do this easily with tapply, but for the other things I want to do inside the loop (e.g. a weighted mean of the time series after a fast Fourier transform) I would rather use a loop.
Note that "time" and "part" carry the same information; one is a factor and the other is a number.
Here is the loop that works fine and returns the result as I want (the important part here is: Intervall <- AB[AB$part==i,]):

for(i in 1:length(unique(AB$time)))  
{
    Intervall <- AB[AB$part==i,]
    attach(Intervall)
    # Standard deviation
    sdx  <-sd(x)
    sdy  <-sd(y)
    sdz  <-sd(z)
    # Add Behavior
     gene <- as.character(Intervall[1,5])
    # Construct a table
      tab <-c(sdx, sdy, sdz, gene)
      write(tab, file=paste("VariableTable.txt", sep=""),
               ncolumns=4,sep=",", append=TRUE)
    detach(Intervall)
}  # end of for loop

The result looks like this and is fine:

1,1,1,grom
1.3,1.3,1.3,grom
1,1,1,vir

My problem is that I used the "part" column only to run the loop, but I actually want to use the "time" column to run the loop. But when I replace

Intervall <- AB[AB$part==i,]
with
Intervall <- AB[AB$time==i,]

then the resulting table only contains NA.

I also tried to use Intervall <- AB[x==i,] with different definitions of x:

x <- as.factor(AB$part)       --> which works fine as well
x <- as.factor(AB$time)       --> which returns only NA
x <- unique(AB$time)          --> which returns only NA
x <- levels(unique(AB$time))  --> which returns only NA
x <- seq(unique(AB$time))     --> which returns the standard deviation of the entire column (not the single parts)

What am I doing wrong, and how can I fix it?

Thank you so much in advance.

Kind regards,
Anna

                          

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



