[R] Re: how to generate data set with different length and calculate the mean?

Petr PIKAL petr.pikal at precheza.cz
Tue Feb 2 08:56:23 CET 2010


Hi

r-help-bounces at r-project.org wrote on 01.02.2010 16:15:20:

> 
> Petr, 
> Thanks for your suggestions. It makes sense, since I don't know how to make
> a matrix with different length of rows.

You can't. A matrix is a vector with dimensions, so it can only be
rectangular and it can only hold objects of the same type (see ?matrix).
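
A small sketch of the difference, for illustration only (the object names m,
m2 and ragged are made up here and are not from the original post):

```r
# A matrix is an atomic vector with a dim attribute, so it is always rectangular.
m <- matrix(1:6, nrow = 2)
is.atomic(m)
dim(m)

# Mixing types in a matrix coerces every element to one common type.
m2 <- matrix(c(1, "a", 3, 4), nrow = 2)
typeof(m2)

# A list can hold vectors of different lengths.
ragged <- list(a = rnorm(20), b = rnorm(10), c = rnorm(8))
lengths(ragged)

# Functions such as sapply() still work element by element.
sapply(ragged, mean)
```

The list keeps each data set at its own length, and sapply() or lapply() can
then be used to summarise every element.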

> I have a concern for this problem. I actually deal with a much bigger
> dataset e.g. 1000, and each dataset needs to change the number of data in it
> according to a vector which has 1000 corresponding different values. It will
> be hard to deal with data one by one. Is there a way I can do them together?
> Sorry for not making it clear.

Well, you did not make it clear at all. Either provide some reproducible
code that produces a result, together with the desired output for the task,
or at least some small datasets and the desired output.

From what you say, maybe you want something like

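vec <- rnorm(5)          # existing data
chunks <- c(12, 5, 3)    # number of new values to generate in each step
# lapply() returns one vector per chunk size; unlist() flattens them
vec.new <- c(vec, unlist(lapply(chunks, rnorm)))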
 
but maybe not. I am really not very good at mind reading.
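
If the aim is the one described in the first post (keep the first 8 values of
each data set and add new ones up to a target size), one possible sketch uses
a list and Map(). It assumes that every target size is even and at least 8,
and that the values alternate between N(0, 1) and N(5, 1) as in the quoted
code. The helper draw_alternating and the three example sizes are
hypothetical, chosen only for illustration.

```r
set.seed(1)                # only to make the sketch repeatable
th   <- c(0, 5, 1)         # the two means and the common sd, as in the post
n0   <- 8                  # starting size of every data set
size <- c(20, 10, 8)       # hypothetical target sizes, one per data set

# Draw n values alternating between N(th[1], th[3]) and N(th[2], th[3]).
# Assumes n is even.
draw_alternating <- function(n) {
  rnorm(n, mean = rep(th[1:2], n / 2), sd = th[3])
}

# Store the equal-length starting data sets in a list, not a matrix.
data_list <- replicate(length(size), draw_alternating(n0), simplify = FALSE)

# Append the missing values to each data set; the first n0 values are kept.
extended <- Map(function(x, n) {
  extra <- n - length(x)
  if (extra > 0) c(x, draw_alternating(extra)) else x
}, data_list, size)

lengths(extended)
```

Because Map() walks over the data sets and the sizes together, the same code
would work for 1000 data sets and a size vector of length 1000.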


> 
> I am thinking I have to use 'for loop' to get a list of vectors. But I am
> not sure how to do it efficiently? Thanks again.
> 

As long as you do not use nested for loops for a task that can be done by a
single built-in function, loops are not that inefficient.
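
For example, a single for loop over a list of data sets, writing into a
preallocated result, is usually fine. The sketch below is only an
illustration under assumptions: the sizes are hypothetical and even, values
at odd positions come from N(0, 1) and values at even positions from N(5, 1),
the difference is taken as the N(5, 1) mean minus the N(0, 1) mean, and the
pooled standard deviation is the usual equal-variance estimate.

```r
set.seed(2)                # only to make the sketch repeatable
size <- c(20, 10, 8)       # hypothetical sizes, assumed even

# Each data set alternates N(0, 1) and N(5, 1) draws, as in the original post.
sets <- lapply(size, function(n) rnorm(n, mean = rep(c(0, 5), n / 2), sd = 1))

# Preallocate the result instead of growing it inside the loop.
res <- matrix(NA_real_, nrow = length(sets), ncol = 2,
              dimnames = list(NULL, c("mean_diff", "pooled_sd")))

for (i in seq_along(sets)) {
  x   <- sets[[i]]
  odd <- seq_along(x) %% 2 == 1
  g1  <- x[odd]            # values drawn from N(0, 1)
  g2  <- x[!odd]           # values drawn from N(5, 1)
  n1  <- length(g1)
  n2  <- length(g2)
  pooled_var <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
  res[i, ] <- c(mean(g2) - mean(g1), sqrt(pooled_var))
}

res
```

The same body could be wrapped in a function and passed to sapply(), but the
loop itself is not the bottleneck here.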

Regards
Petr



> 
> 
> 
> 
> Petr Pikal wrote:
> > 
> > Hi
> > 
> > I have no idea how you could do what you want. I only recommend that you
> > use a list instead of a matrix, as a list can hold objects of various sizes
> > 
> > I am not sure if this is the most elegant way but you can make your matrix
> > a data frame
> > 
> > ddd<- as.data.frame(data)
> > and then use this
> > 
> > lapply(ddd, function(x) unlist(list(x)))
> > 
> > To get list of vectors
> > 
> > Regards
> > Petr
> > 
> > r-help-bounces at r-project.org wrote on 01.02.2010 03:46:34:
> > 
> >> 
> >> Hello,
> >> 
> >> This may be a rare question. I am struggling to solve it. I really
> >> appreciate any help or suggestions. Thanks a lot in advance!
> >> 
> >> 
> >> I put my questions between the code to make it clear. The problem I have is:
> >> I generated 10 data sets with 8 data for each set. Now I want to change the
> >> number of data in each dataset according to a vector 'size' (as follows),
> >> that is, each new dataset contains a different number of data. How can I do
> >> it? After generating the new datasets, how can I separate the data from the
> >> two distributions and calculate the sample mean? Thanks a lot.
> >> 
> >> 
> >> 
> >> # generate 10 data sets, each data set includes 8 samples: 4 from N(0, 1)
> >> # and 4 from N(5, 1)
> >> data <- matrix(0, 10, 8)
> >> th <- c(0, 5, 1)
> >> for (i in 1:10) {
> >>   data[i, ] <- rnorm(8, mean = rep(th[1:2], 8 / 2), sd = th[3])
> >> }
> >> 
> >> # change the number of samples for each data set. e.g. the first dataset
> >> # needs to increase to 20: the first 8 keep the same, add another 12 samples
> >> # (6 from N(0,1) and the other 6 from N(5,1)); the second dataset needs to
> >> # increase to 10: keep the first 8 the same, generate another 2 (one from
> >> # N(0,1) and the other one from N(5,1)); the third data set does not need to
> >> # change. etc.
> >> 
> >> size=c(20, 10, 8, 14, 16, 12, 8, 80)
> >> 
> >> 
> >> # Since each data set changes to a different size and adds a different
> >> # number of data, for each dataset how can I calculate the difference of the
> >> # sample mean from N(0,1) and the sample mean from N(5,1), and the pooled
> >> # standard deviation of the two samples? Two difficulties: each new dataset
> >> # includes a different number of data; and, when I generated the data, two
> >> # successive values come from different normal distributions. How can I
> >> # separate them and calculate the average for each sample and the pooled
> >> # standard deviation?
> >> 
> >> 
> >> 
> >> -- 
> >> View this message in context:
> >> http://n4.nabble.com/how-to-generate-data-set-with-different-length-and-calculate-the-mean-tp1458420p1458420.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >> 
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > 
> > 
> > 
> 
> -- 
> View this message in context:
> http://n4.nabble.com/within-a-matrix-how-to-add-each-column-with-different-length-of-data-tp1458420p1458870.html
> Sent from the R help mailing list archive at Nabble.com.
> 


