[R] how to combine data of several csv-files

jim holtman jholtman at gmail.com
Tue Jul 31 01:36:30 CEST 2007


Is this what you want?

> x <- read.table(textConnection("  v1         v2          v3          v4         v5          v6           v7 v8
+ 1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
+ 2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
+ 3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
+ 4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
+ 5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
+ 6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA"), header=TRUE)
>
> categ <- scan(textConnection("NA        cat1        cat1        cat1       cat2        cat2         cat2   NA"), what='')
Read 8 items
> cat.col <- split(1:ncol(x), categ)
> lapply(cat.col, function(.cat){
+     rowMeans(x[, .cat])
+ })
$cat1
                  1                   2                   3
-0.6747668100000001 -0.4787044933333333 -0.8223655333333334
                  4                   5                   6
 0.9549220733333333 -0.8107921033333333 -0.1996510833333333

$cat2
                   1                    2                    3
-0.01957080766666663  0.79994921333333324  0.04143338233333334
                   4                    5                    6
-0.08485820666666670 -0.48982411533333337 -0.49157412066666667
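
The question also asked for the standard deviations. Here is a rough sketch along the same lines, not a definitive solution. It rebuilds the example data so that it runs on its own, and the helper name row_stat_by_cat is made up for illustration (it is not part of any package). It relies on split() dropping the columns whose category is NA.

```r
# Rebuild the example data from this thread so the block is self-contained
x <- data.frame(
  v1 = NA,
  v2 = c(-0.6442149, -0.2047012, -1.1986041, -0.2097442, -0.7343465, -1.3888750),
  v3 = c(0.02354036, -1.36186952, -0.42197792, 1.50445971, -1.76763996, 0.53722404),
  v4 = c(-1.40362589, 0.13045724, -0.84651458, 1.57005071, 0.06961015, 0.25269771),
  v5 = c(-1.1829260, 2.1411553, -0.1327081, -0.1053442, -0.8179396, -1.2342698),
  v6 = c(1.17099178, 0.49248118, -0.18690065, 1.50050976, -0.65552410, -0.01243247),
  v7 = c(-0.046778203, -0.233788840, 0.443908897, -1.649740180, 0.003991354, -0.228020092),
  v8 = NA
)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# Column indices per category; split() drops the NA group
cat.col <- split(seq_along(categ), categ)

# Hypothetical helper: apply FUN across each row, separately for each category
row_stat_by_cat <- function(df, groups, FUN) {
  sapply(groups, function(cols) {
    apply(as.matrix(df[, cols, drop = FALSE]), 1, FUN)
  })
}

cat.means <- row_stat_by_cat(x, cat.col, mean)  # 6 x 2 matrix, columns cat1 and cat2
cat.sds   <- row_stat_by_cat(x, cat.col, sd)    # same shape, row-wise sd per category
```

Using sapply() instead of lapply() returns one matrix with a column per category, which may be easier to write back to a csv file than a list.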



On 7/30/07, Antje <niederlein-rstat at yahoo.de> wrote:
> Hello,
>
> thank you for your help. But I guess, it's still not what I want... printing df.my gives me
>
> df.my
>   v1         v2          v3          v4         v5          v6           v7 v8
> 1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
> 2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
> 3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
> 4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
> 5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
> 6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA
>
> now, I have to combine like this:
>
>   v1         v2          v3          v4         v5          v6           v7     v8
>   NA        cat1        cat1        cat1       cat2        cat2         cat2   NA
>
> -->
>
> mean(c(df.my$v2[1],df.my$v3[1],df.my$v4[1]))
> mean(c(df.my$v2[2],df.my$v3[2],df.my$v4[2]))
> mean(c(df.my$v2[3],df.my$v3[3],df.my$v4[3]))
> mean(c(df.my$v2[4],df.my$v3[4],df.my$v4[4]))
> mean(c(df.my$v2[5],df.my$v3[5],df.my$v4[5]))
> mean(c(df.my$v2[6],df.my$v3[6],df.my$v4[6]))
>
> the same for v5, v6 and v7
>
> further, I'm not sure how to avoid the list, because this is the result of the processing I did before...
>
> Ciao,
> Antje
>
>
> 8rino-Luca Pantani wrote:
> > I hope I see.
> >
> > Why not try the following and avoid lists, which I'm still not able to
> > manage properly ;-)
> > v1 <- NA
> > v2 <- rnorm(6)
> > v3 <- rnorm(6)
> > v4 <- rnorm(6)
> > v5 <- rnorm(6)
> > v6 <- rnorm(6)
> > v7 <- rnorm(6)
> > v8 <- rnorm(6)
> > v8 <- NA
> > (df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8))
> > (df.my2 <- reshape(df.my,
> >                  varying=list(c("v1","v2","v3", "v4","v5","v6","v7","v8")),
> >                  idvar="sequential",
> >                  timevar="cat",
> >                  direction="long"
> >        ))
> > aggregate(df.my2$v1, by=list(category=df.my2$cat), mean)
> > aggregate(df.my2$v1, by=list(category=df.my2$cat), function(x){sd(x, na.rm = TRUE)})
> >
> >
> > Antje wrote:
> >> okay, I played around a bit and now I have some kind of test case for you:
> >>
> >> v1 <- NA
> >> v2 <- rnorm(6)
> >> v3 <- rnorm(6)
> >> v4 <- rnorm(6)
> >> v5 <- rnorm(6)
> >> v6 <- rnorm(6)
> >> v7 <- rnorm(6)
> >> v8 <- rnorm(6)
> >> v8 <- NA
> >>
> >> list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
> >> categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)
> >>
> >> > list
> >> [[1]]
> >> [1] NA
> >>
> >> [[2]]
> >> [1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750
> >>
> >> [[3]]
> >> [1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996  0.53722404
> >>
> >> [[4]]
> >> [1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015  0.25269771
> >>
> >> [[5]]
> >> [1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698
> >>
> >> [[6]]
> >> [1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410 -0.01243247
> >>
> >> [[7]]
> >> [1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354 -0.228020092
> >>
> >> [[8]]
> >> [1] NA
> >>
> >> now, I need the means (and sd) of element 1 of list[[2]], list[[3]], list[[4]]
> >> (because they belong to "cat1"):
> >>
> >> = mean(c(-0.6442149, 0.02354036, -1.40362589))
> >>
> >> the same for element 2 up to element 6 (--> I would then get a vector
> >> containing the means for "cat1")
> >> the same for the vectors belonging to "cat2".
> >>
> >> does anybody now understand what I mean?
> >>
> >> Antje
> >>
> >>
> >>
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
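
As for avoiding the list: a rough sketch, assuming every element that has a category is a vector of the same length (true for the test case quoted above). The name lst is my own choice, used instead of list so that the base function is not masked. It also shows why the values have to be wrapped in c() before calling mean().

```r
# The list from the quoted test case (values copied from this thread)
lst <- list(
  NA,
  c(-0.6442149, -0.2047012, -1.1986041, -0.2097442, -0.7343465, -1.3888750),
  c(0.02354036, -1.36186952, -0.42197792, 1.50445971, -1.76763996, 0.53722404),
  c(-1.40362589, 0.13045724, -0.84651458, 1.57005071, 0.06961015, 0.25269771),
  c(-1.1829260, 2.1411553, -0.1327081, -0.1053442, -0.8179396, -1.2342698),
  c(1.17099178, 0.49248118, -0.18690065, 1.50050976, -0.65552410, -0.01243247),
  c(-0.046778203, -0.233788840, 0.443908897, -1.649740180, 0.003991354, -0.228020092),
  NA
)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# mean() averages only its first argument; further unnamed arguments are
# matched to 'trim' and 'na.rm', so the values must be combined with c()
first.cat1 <- mean(c(lst[[2]][1], lst[[3]][1], lst[[4]][1]))

# Keep only the elements that have a category and bind them as columns
keep <- !is.na(categ)
m <- do.call(cbind, lst[keep])            # 6 x 6 numeric matrix
colnames(m) <- paste0("v", which(keep))   # v2 ... v7

# Element-wise means per category, computed on the matrix
cat.means <- sapply(split(seq_len(ncol(m)), categ[keep]),
                    function(cols) rowMeans(m[, cols, drop = FALSE]))
```

Here cat.means[1, "cat1"] should agree with first.cat1, and rowMeans() can be swapped for apply(..., 1, sd) to get the standard deviations.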


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


