[BioC] How to extract expressed genes from Affymetrix data

michelle_low michelle_low at zoho.com
Mon Jul 30 08:42:30 CEST 2012


Hi Belinda,

Thanks for the reply. After reading the manual, I'm still not quite sure how to use the functions.

If I start from the beginning and use only one cell (wild type) with 3 replicates. I wanna get only the expressed genes in this cell. 

I am not sure if the following script is correct to find these genes (I used P/M/A calling and extract only the probes with 'present' ).

I also have issue with the annotation. I can't seem to make it work. After writing them into text file or excel file, they were in mess.


I appreciate very much for any help..


Thanks,
Michelle





R version 2.13.2 (2011-09-30)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[R.app GUI 1.42 (5910) i386-apple-darwin9.8.0]

[Workspace restored from /Users/mlow/.RData]
[History restored from /Users/mlow/.Rapp.history]

> library(affy)
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'browseVignettes()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

> arrays=ReadAffy()
> arrayRMA=rma(arrays)
Background correcting
Normalizing
Calculating Expression
> arrayRMAtable=exprs(arrayRMA)
> write.exprs(arrayRMA, "RMAvalue.txt")
> arrayPMA=mas5calls(arrays)
Getting probe level data...
Computing p-values
Making P/M/A Calls
write.exprs(arrayRMA, "RMAvalue
> write.exprs(arrayPMA, "PMAvalue.txt")
> arrayPMA=exprs(arrayPMA)
> present=rowSums(arrayPMA=='P')
> present=which(present==ncol(arrayPMA))
> rmafilter=arrayRMA[present,]
> write.exprs(rmafilter,"result.txt")
> dim(rmafilter)
Features  Samples 
   16485        3 

> library(mouse4302.db)
Loading required package: AnnotationDbi
Loading required package: org.Mm.eg.db
Loading required package: DBI

> all=mget(names(rmafilter),mouse4302GENENAME)
> write.table(all, "all.txt")




---- On Sun, 22 Jul 2012 23:48:06 -0700 Belinda Phipson  wrote ---- 

>Hi Michelle 
> 
>The function propexpr() in the limma package will estimate the proportion of 
>expressed probes in each array, as long as there are some negative controls 
>present in your data. 
> 
>You can also estimate the number of probes that are not differentially 
>expressed between WT and knock-out using the convest() function in limma. It 
>takes a vector of p-values. The proportion of differentially expressed genes 
>will be 1-convest(p-vals). 
> 
>Cheers, 
>Belinda 
> 
>-----Original Message----- 
>From: bioconductor-bounces at r-project.org 
>[mailto:bioconductor-bounces at r-project.org] On Behalf Of michelle_low 
>Sent: Monday, 23 July 2012 3:56 PM 
>To: bioconductor at r-project.org 
>Subject: [BioC] How to extract expressed genes from Affymetrix data 
> 
>Hi all, 
> 
>I have done the differential expression analysis below but I wanna know the 
>total number of expressed genes/transcripts in each cell (wild type and the 
>mir223 knockout cell). Can somebody help? Thanks in advance.. 
> 
> 
>Regards, 
>Michelle 
> 
> 
> 
> R version 2.13.2 (2011-09-30) 
>Copyright (C) 2011 The R Foundation for Statistical Computing 
>ISBN 3-900051-07-0 
>Platform: i386-apple-darwin9.8.0/i386 (32-bit) 
> 
>R is free software and comes with ABSOLUTELY NO WARRANTY. 
>You are welcome to redistribute it under certain conditions. 
>Type 'license()' or 'licence()' for distribution details. 
> 
> Natural language support but running in an English locale 
> 
>R is a collaborative project with many contributors. 
>Type 'contributors()' for more information and 
>'citation()' on how to cite R or R packages in publications. 
> 
>Type 'demo()' for some demos, 'help()' for on-line help, or 
>'help.start()' for an HTML browser interface to help. 
>Type 'q()' to quit R. 
> 
>[R.app GUI 1.42 (5910) i386-apple-darwin9.8.0] 
> 
>[Workspace restored from /Users/mlow/.RData] 
>[History restored from /Users/mlow/.Rapp.history] 
> 
>> library(affy) 
>Loading required package: Biobase 
> 
>Welcome to Bioconductor 
> 
> Vignettes contain introductory material. To view, type 
> 'browseVignettes()'. To cite Bioconductor, see 
> 'citation("Biobase")' and for packages 'citation("pkgname")'. 
> 
>> library(limma) 
>> 
>pd=read.AnnotatedDataFrame("phenodata.txt",header=TRUE,sep="",row.names=1) 
>> a=ReadAffy(filenames=rownames(pData(pd)),phenoData=pd,verbose=TRUE) 
>1 reading wildrep1.CEL ...instantiating an AffyBatch (intensity a 1004004x6 
>matrix)...done. 
>Reading in : wildrep1.CEL 
>Reading in : wildrep2.CEL 
>Reading in : wildrep3.CEL 
>Reading in : miR223rep1.CEL 
>Reading in : miR223rep2.CEL 
>Reading in : miR223rep3.CEL 
>> x=rma(a) 
>Background correcting 
>Normalizing 
>Calculating Expression 
>> c=paste(pd$treatment,pd$n,sep="") 
>> f=factor(c) 
>> design=model.matrix(~0+f) 
>> colnames(design)=levels(f) 
>> fit=lmFit(x,design) 
>> library(mouse4302.db) 
>Loading required package: AnnotationDbi 
>Loading required package: org.Mm.eg.db 
>Loading required package: DBI 
> 
> 
>> fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") 
>Error: could not find function "getSYMBOL" 
>> library(annotate) 
>> fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"mouse4302.db") 
>> contrast.matrix=makeContrasts(E="present-absent") 
>Error in .Internal(inherits(x, what, which)) : 'x' is missing 
>> contrast.matrix=makeContrasts(E="present-absent",levels=design) 
>> fit2=contrasts.fit(fit,contrast.matrix) 
>> fit2=eBayes(fit2) 
>> results1 <-topTable (fit2, coef=1, p.value=0.3,number=nrow(fit2)) 
>> dim(results1) 
>[1] 889 8 
>> results1 <-topTable (fit2, coef=1, p.value=0.5,number=nrow(fit2)) 
>> dim(results1) 
>[1] 2997 8 
>> results1 <-topTable (fit2, coef=2, p.value=0.5,number=nrow(fit2)) 
>Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = 
>coef, : 
> subscript out of bounds 
>> write.table(results1, file="WT-KO.txt") 
>> results1 <-topTable (fit2, coef=2, p.value=0.4,number=nrow(fit2)) 
>Error in toptable(fit = fit[c("coefficients", "stdev.unscaled")], coef = 
>coef, : 
> subscript out of bounds 
>> results1 <-topTable (fit2, coef=1, p.value=0.4,number=nrow(fit2)) 
>> dim(results1) 
>[1] 1668 8 
>> write.table(results1, file="WT-KO.txt") 
> 
>_______________________________________________ 
>Bioconductor mailing list 
>Bioconductor at r-project.org 
>https://stat.ethz.ch/mailman/listinfo/bioconductor 
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor 
> 
> 
>______________________________________________________________________ 
>The information in this email is confidential and inten...{{dropped:5}}



More information about the Bioconductor mailing list