[R] help for an R automated procedures

Thu Feb 28 12:09:29 CET 2013

Hi

This is exactly what

fortune("surgery")

is about (fortune() comes from the fortunes package).

Anyway, you can save yourself a lot of headaches if you start using lists for your objects.

Lists are easy to use in loops.

for (i in seq_along(some.other.list)) {
  # use [[ ]] to get and set the list elements themselves
  some.list[[i]] <- some.function(some.other.list[[i]])
}
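
Here is a minimal sketch of how that could look for data like yours. The toy data frame only stands in for 'morfo'; the column names (spps, taxoc, v1, v2) are guessed from the quoted script, and fix.abund is a hypothetical helper, so adapt both to the real data.

```r
# Toy stand-in for the real data; column names are assumed from the quoted script
morfo <- data.frame(
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  v1    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  v2    = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24)
)

# One list element per species-by-locality sample
samples <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)

# Hypothetical per-sample step: set abundance to the number of rows
fix.abund <- function(d) {
  d$abund <- nrow(d)
  d
}

# Loop over the samples, storing each processed sample in a list
result <- vector("list", length(samples))
names(result) <- names(samples)
for (i in seq_along(samples)) {
  result[[i]] <- fix.abund(samples[[i]])
}
```

Each element of result is then one sample (named like sp1.loc1), so any further step only has to be written once, inside the loop.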

The lapply/sapply functions can also be useful:

sapply(sp1.loc1, scale)

will give you a matrix of the scaled columns (assuming all columns of sp1.loc1 are numeric).
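
To standardize every sample in one go and put them back together, something along these lines might work. This is only a sketch under assumptions: the two small data frames are invented, add.std is a hypothetical helper, and the ".pad" suffix just mirrors the naming in the quoted script.

```r
# Toy sample list; names and columns are assumptions, not the real data
samples <- list(
  sp1.loc1 = data.frame(id = 1:4, v1 = c(1, 2, 3, 4), v2 = c(10, 20, 30, 50)),
  sp2.loc1 = data.frame(id = 5:8, v1 = c(5, 5, 6, 8), v2 = c(7, 9, 11, 13))
)
vars <- c("v1", "v2")

# Hypothetical helper: append standardized copies of the chosen columns
add.std <- function(d, vars) {
  std <- as.data.frame(scale(d[vars]))   # scale() returns a matrix
  names(std) <- paste0(vars, ".pad")
  cbind(d, std)
}

# Apply the helper to every sample; the result is again a named list
scaled <- lapply(samples, add.std, vars = vars)

# Reassemble all samples into a single data frame
all.samples <- do.call(rbind, scaled)
```

The same lapply() pattern can carry the other per-sample steps (imputation, outlier checks, write.table) once each is wrapped in a function that takes one data frame.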

Regards
Petr

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Gustavo Vieira
> Sent: Thursday, February 28, 2013 10:53 AM
> To: r-help at r-project.org
> Subject: [R] help for an R automated procedures
> 
> 
> Dear, I would like to post the following question to the r-help on
> Nabble (thanks in advance for the attention, Gustavo Vieira):
> Hi there.
> I have a data set at hand with 5,220 cases and I'd like to automate
> some procedures (but I have almost no programming knowledge). The data
> has some continuous variables that are grouped by 2 others: the name of
> species and the locality where they were collected. So, the samples are
> defined as 'each species on each locality'. For every sample I'd like
> to do multiple imputation (when applicable), test for the presence of
> outliers, standardize the variables, correct some species abundances,
> save individual samples to tab delimited text file, and assemble each
> individual sample (now, without NAs and outliers, corrected abundances,
> and with the new standardized
> variables) into a single data set. That task is pretty complex to me,
> since my programming knowledge is poor (and my free time to learn R
> programming is sparse). Could someone help me with that (I could
> provide you the data set and the script I have written to do that,
> sample by sample [ouch!])?
> Thanks in advance for your attention and all the best
> (ghcv at hotmail.com).
> 
> [Below is an example of the code I've used to accomplish my goals,
> sample by sample, which can exemplify the complexity of the procedures:
> 
> #Subsetting the data (v1-v11 are continuous "predictors"): species 1 at
> #locality 1 (all data [5520 cases] are on a vector called 'morfo')
> sp1.loc1<-morfo[which(spps=="sp1" & taxoc=="loc1"),] #getting only the
> #observations of sp1 (species 1) at loc1 (locality 1)
> str(sp1.loc1) #abundance -> 19 cases and the abundance variable
> #('abund') says 18...
> sp1.loc1$abund<-rep(19,19)
> summary(sp1.loc1) #missing values present; abundance for sp1 at loc1
> #corrected
> attach(sp1.loc1)
> 
> #Dealing with NAs:
> install.packages("mice", dependencies = T) #ok (R at: home & work)
> library(mice)
> imp <- mice(sp1.loc1)
> sp1.loc1 <- complete(imp)
> summary(sp1.loc1) #just checking... No more NAs!
> attach(sp1.loc1)
> 
> 
> #Detecting univariate outliers
> z.crit <- qnorm(0.9999)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v1)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
> morfo[47,6]
> sort(v2[taxoc=="loc1"]) #the nearest observation close to 32.00 is 25.10
> sp1.loc1[,6][sp1.loc1[,6]==32.00]<-25.10
> subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
> #Rechecking for outliers (now, it's ok)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v3)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v4)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v5)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v6)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v7)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v8)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v9)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v10)) > z.crit)
> 
> subset(sp1.loc1, select = id, subset = abs(scale(v11)) > z.crit)
> 
> #Standardizing variables
> v1.std<-with(sp1.loc1,(scale(v1)))
> v1.pad<-v1.std[,1]
> 
> v2.std<-with(sp1.loc1,(scale(v2)))
> v2.pad<-v2.std[,1]
> 
> v3.std<-with(sp1.loc1,(scale(v3)))
> v3.pad<-v3.std[,1]
> 
> v4.std<-with(sp1.loc1,(scale(v4)))
> v4.pad<-v4.std[,1]
> 
> v5.std<-with(sp1.loc1,(scale(v5)))
> v5.pad<-v5.std[,1]
> 
> v6.std<-with(sp1.loc1,(scale(v6)))
> v6.pad<-v6.std[,1]
> 
> v7.std<-with(sp1.loc1,(scale(v7)))
> v7.pad<-v7.std[,1]
> 
> v8.std<-with(sp1.loc1,(scale(v8)))
> v8.pad<-v8.std[,1]
> 
> v9.std<-with(sp1.loc1,(scale(v9)))
> v9.pad<-v9.std[,1]
> 
> v10.std<-with(sp1.loc1,(scale(v10)))
> v10.pad<-v10.std[,1]
> 
> v11.std<-with(sp1.loc1,(scale(v11)))
> v11.pad<-v11.std[,1]
> 
> 
> #Joining the new standardized variables to the sp1.loc1 data set
> 
> sp1.loc1<-data.frame(sp1.loc1,v1.pad,v2.pad,v3.pad,v4.pad,v5.pad,v6.pad,
>     v7.pad,v8.pad,v9.pad,v10.pad,v11.pad)
> 
> attach(sp1.loc1)
> 
> write.table(sp1.loc1,"sp1.at.loc1.txt",quote=F,row.names=F,
> col.names=T,sep="\t")
> 
> detach(sp1.loc1)
> 
> #Subsetting the data (v1-v11 are continuous "predictors"): species 2 at
> #locality 1...]
> --
> 
> "Time will tell"
> --
> 
> 