[BioC] pd.hugene.1.0.st.v1

Mark Robinson mrobinson at wehi.EDU.AU
Sat Aug 1 00:15:21 CEST 2009


Hi Vince.

Thanks for the reply.

That's good to know.  But it only gives me access to the transcript
cluster indices; it doesn't actually compute gene-level summaries,
right?  Is there any way to do that without building the package from
scratch?
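
To be concrete about what I mean by gene-level summaries, here is a
minimal sketch (base R only, toy numbers) of collapsing probeset-level
values to transcript clusters by taking the per-sample median.  The
matrix and the ID vector are made-up stand-ins for exprs(tismixRMA) and
fData(tismixRMA)$transcript_cluster_id, and collapse_by_cluster is a
hypothetical helper, not something in oligo.  This is only a crude
post hoc workaround: it is not the same as running the RMA
summarization step directly on all probes of each transcript cluster,
which is what I'm really after.

```r
# Toy probeset-level matrix standing in for exprs(tismixRMA).
# All values and names here are invented for illustration.
probeset_expr <- matrix(
  c(7.1, 7.3,
    6.9, 7.0,
    8.2, 8.4,
    5.0, 5.2,
    5.4, 5.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4", "ps5"),
                  c("sampleA", "sampleB"))
)

# Hypothetical probeset -> transcript cluster mapping, standing in for
# fData(tismixRMA)$transcript_cluster_id.
transcript_cluster_id <- c("tc1", "tc1", "tc1", "tc2", "tc2")

# Collapse rows of 'expr' that share a cluster ID by taking the
# per-sample median.  Probesets with a missing cluster ID are dropped.
collapse_by_cluster <- function(expr, cluster_id) {
  keep <- !is.na(cluster_id)
  groups <- split(seq_len(nrow(expr))[keep], cluster_id[keep])
  do.call(rbind, lapply(groups, function(idx) {
    apply(expr[idx, , drop = FALSE], 2, median)
  }))
}

gene_expr <- collapse_by_cluster(probeset_expr, transcript_cluster_id)
print(gene_expr)
```

The result has one row per transcript cluster and one column per
sample.  The median is an arbitrary choice on my part; a mean or a
median polish over the probesets would slot into the same place.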

Cheers,
Mark

On 31/07/2009, at 10:10 PM, Vincent Carey wrote:

> On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson<mrobinson at wehi.edu.au> wrote:
>> Hi all.
>>
>> I wonder if it makes more sense to have the *transcript* version of this
>> package, instead of the *probeset* version available when you install via:
>>
>
> This merits further discussion.  Note that under the current approach
> you can obtain the transcript cluster indices for summarization by
> calling fData on the output of rma:
>
>> class(tismix)
> [1] "GeneFeatureSet"
> attr(,"package")
> [1] "oligoClasses"
>> class(tismixRMA)
> [1] "ExpressionSet"
> attr(,"package")
> [1] "Biobase"
>> fData(tismixRMA)[1:4,]
>         fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
> 7896737 7896737 96595542               7896736    NA             3     1
> 7896739 7896739 96595544               7896738    NA             3     1
> 7896741 7896741 96595546               7896740    NA             3     1
> 7896743 7896743 96595548               7896742    NA             3     1
>
>         accessions
> 7896737 <NA>
> 7896739 <NA>
> 7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
> 7896743 BC118988,ENST00000279067
>
>> dim(fData(tismixRMA))
> [1] 253002      7
>> dim(exprs(tismixRMA))
> [1] 253002     33
>
> Annotation packages are available at both the probeset and
> transcript cluster level, thanks to folks at City of Hope (e.g.,
> http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html)
>
>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("pd.hugene.1.0.st.v1")
>>
>> It seems like as a default, more people would want gene-level summaries for
>> these arrays ... especially since ~200k (~80%) of the probesets have 3
>> probes or less.
>>
>> Of course I (and everyone around the world) could build this package locally
>> from scratch using the transcript CSV, but it seems like there would be
>> enough demand for this to make available direct from BioC.  Just a thought.
>> Does anyone agree?
>>
>> Or, am I missing something that will allow me to do gene-level analysis from
>> this package?
>>
>> My session is below.
>>
>> Thanks in advance.
>> Mark
>>
>>
>>
>> ----------------------
>> mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
>>  257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
>>   33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
>> ----------------------
>>
>>
>> ----------------------
>>> library(oligo)
>> Loading required package: oligoClasses
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>>  Vignettes contain introductory material. To view, type
>>  'openVignette()'. To cite Bioconductor, see
>>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: preprocessCore
>> Welcome to oligo version 1.8.1
>>> cf <- dir(celPath,"CEL")
>>> fs <- read.celfiles( file.path(celPath,cf) )
>> Loading required package: pd.hugene.1.0.st.v1
>> Loading required package: RSQLite
>> Loading required package: DBI
>> Platform design info loaded.
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
>>> rmaOligo <- oligo::rma(fs)
>> Background correcting
>> Normalizing
>> Calculating Expression
>>> dmOligo <- exprs(rmaOligo)
>>> dim(rmaOligo)
>> Features  Samples
>>  253002        4
>>> sessionInfo()
>> R version 2.9.0 (2009-04-17)
>> i386-apple-darwin8.11.1
>>
>> locale:
>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
>> [3] DBI_0.2-4                 oligo_1.8.1
>> [5] preprocessCore_1.6.0      oligoClasses_1.6.0
>> [7] Biobase_2.4.1
>>
>> loaded via a namespace (and not attached):
>> [1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
>> [5] splines_2.9.0
>> ----------------------
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robinson at garvan.org.au
>> e: mrobinson at wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>>
>>
>
>
>
> -- 
> Vincent Carey, PhD
> Biostatistics, Channing Lab
> 617 525 2265

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852


