[R] Need some suggestions for outlier detection in a matrix

arun smartpink111 at yahoo.com
Wed Jan 15 17:33:27 CET 2014


Hi,
Try:
dat1 <- read.table("ZvsPGRT_frag_0filt.txt",sep="\t",header=TRUE,row.names=1)
dat_Z <- dat1[,1:4] ## unnecessary to do cbind() here
mat1 <- as.matrix(dat_Z)
head(mat1,2)
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
#XLOC_000001           626          3516          1277           770
#XLOC_000002            82           342           185            72
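library(outliers)
## For each row, run the test and keep the flagged value and its p-value
ctest_mat1 <- t(apply(mat1, 1, function(x) {
  test <- chisq.out.test(as.numeric(x))
  ## test$alternative reads e.g. "highest value 3516 is an outlier";
  ## stripping the letters leaves the number
  c(outLier = as.numeric(gsub("[[:alpha:]]", "", test$alternative)),
    Pval = test$p.value)
}))
mat2 <- cbind(mat1, ctest_mat1)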
head(mat2,2)
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0 outLier
#XLOC_000001           626          3516          1277           770    3516
#XLOC_000002            82           342           185            72     342
#                 Pval
#XLOC_000001 0.1423296
#XLOC_000002 0.1707215
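
If you would rather not parse the number out of test$alternative, here is a minimal base-R sketch of what I understand chisq.out.test() to compute with its default variance (var(x)): the value farthest from the row mean, its squared deviation divided by the sample variance, and the upper-tail chi-squared p-value on 1 df. The helper name chisq_outlier is made up for illustration and is not part of the outliers package. Ties between the lowest and highest value may be resolved differently from the package.

```r
## Hypothetical helper (not from the 'outliers' package): base-R sketch of
## the one-outlier chi-squared test, assuming the default variance = var(x).
chisq_outlier <- function(x) {
  x <- as.numeric(x)
  dev <- x - mean(x)
  i <- which.max(abs(dev))        # value farthest from the row mean
  stat <- dev[i]^2 / var(x)       # chi-squared statistic, 1 df
  c(outLier = x[i],
    Pval = pchisq(stat, df = 1, lower.tail = FALSE))
}

## First two rows of the matrix from the post
mat1 <- matrix(c(626, 3516, 1277, 770,
                  82,  342,  185,  72),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("XLOC_000001", "XLOC_000002"),
                               c("Sample_118z.0", "Sample_132z.0",
                                 "Sample_141z.0", "Sample_183z.0")))

res <- t(apply(mat1, 1, chisq_outlier))  # one row of results per input row
mat2 <- cbind(mat1, res)
print(mat2)
```

On these two rows it flags the same values (3516 and 342) and gives the same p-values as shown above.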


A.K.




On Wednesday, January 15, 2014 7:12 AM, Vivek Das <vd4mmind at gmail.com> wrote:

Hi Arun,

I was wondering how to use the ‘outliers’ package to identify the outlier in each row of a matrix. My matrix has row names in the first column and values in the next 4 columns. For each row I want to find the outlier and its test statistic. The package has a function, chisq.out.test, that performs a chi-squared test for detection of one outlier in a vector. I want to apply it to every row of my matrix and find out which value is the outlier and what p-value is associated with it. I was using the code below:


data<-read.table("my_file.txt", sep='\t', header=TRUE)
## Selecting only the centers
data_Z<-cbind(data[,1:5])
mat1<- as.matrix(data_Z[,2:5])
row.names(mat1)<- data_Z[,1]
head(mat1)

            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
XLOC_000001           626          3516          1277           770
XLOC_000002            82           342           185            72
XLOC_000003           361          2000           867           438
XLOC_000004            30           143            67            37
XLOC_000010             1             7             5             3
XLOC_000011            10            63            19            15

ctest_mat1<-c()

for (i in 1:length(mat1[,1]))
{
ctest_mat1<-c(ctest_mat1,chisq.out.test(as.numeric(mat1[i,])))

}

But this does not give me the outlier for each row. Ideally it should, but when I try to combine the result with the matrix mat1 using the command below, I get a warning:

res <-cbind(mat1,ctest_mat1)
Warning message:
In .Method(..., deparse.level = deparse.level) :
  number of rows of result is not a multiple of vector length (arg 2)

I want a matrix containing mat1 plus columns that say, for each row, which value is the outlier and the p-value associated with it. When I run

head(ctest_mat1)
$statistic
X-squared 
 2.152591 

$alternative
[1] "highest value 3516 is an outlier"

$p.value
[1] 0.1423296

$method
[1] "chi-squared test for outlier"

$data.name
[1] "as.numeric(mat1[i, ])"

$statistic
X-squared 
 1.876596 

I get only the above, for the first row. I want the results as a matrix covering all the rows, combined with my mat1, so that I can then evaluate them. Can you help me with that? I am also attaching the matrix. I hope my point is clear.



----------------------------------------------------------

Vivek Das


