[BioC] pd.hugene.1.0.st.v1

Carvalho, Benilton bcarvalh at jhsph.edu
Sat Aug 1 01:10:00 CEST 2009


Mark, I'm planning on providing an updated version of the annotation
pkgs that will allow gene-level summarization in about 1 week (maybe
earlier). b


--
Sent from my iPhone

On Jul 31, 2009, at 7:20 PM, "Mark Robinson" <mrobinson at wehi.EDU.AU>
wrote:

> Hi Vince.
>
> Thanks for the reply.
>
> That's good to know.  But, it only allows me to access the indices,
> not to actually compute gene-level summaries, right? Any way to do
> that without building the package from scratch?
>
> Cheers,
> Mark
>
> On 31/07/2009, at 10:10 PM, Vincent Carey wrote:
>
>> On Fri, Jul 31, 2009 at 12:48 AM, Mark Robinson <mrobinson at wehi.edu.au> wrote:
>>> Hi all.
>>>
>>> I wonder if it makes more sense to have the *transcript* version of
>>> this package, instead of the *probeset* version available when you
>>> install via:
>>>
>>
>> This merits further discussion.  Note that under the current approach
>> you can obtain the transcript cluster indices for summarization using
>> fData on the output of rma
>>
>> > class(tismix)
>> [1] "GeneFeatureSet"
>> attr(,"package")
>> [1] "oligoClasses"
>> > class(tismixRMA)
>> [1] "ExpressionSet"
>> attr(,"package")
>> [1] "Biobase"
>> > fData(tismixRMA)[1:4,]
>>          fsetid  exon_id transcript_cluster_id level crosshyb_type chrom
>> 7896737 7896737 96595542               7896736    NA             3     1
>> 7896739 7896739 96595544               7896738    NA             3     1
>> 7896741 7896741 96595546               7896740    NA             3     1
>> 7896743 7896743 96595548               7896742    NA             3     1
>>
>>         accessions
>> 7896737 <NA>
>> 7896739 <NA>
>> 7896741 BC136848,BC136907,ENST00000318050,ENST00000326183,ENST00000335137,NM_001004195,NM_001005240,NM_001005484
>> 7896743 BC118988,ENST00000279067
>>
>> > dim(fData(tismixRMA))
>> [1] 253002      7
>> > dim(exprs(tismixRMA))
>> [1] 253002     33
>>
>> Annotation packages are available at both the probeset and
>> transcript cluster level, thanks to folks at City of Hope (e.g.,
>> http://www.bioconductor.org/packages/release/data/annotation/html/hugene10sttranscriptcluster.db.html
>> )
>>
>>
>>> source("http://bioconductor.org/biocLite.R")
>>> biocLite("pd.hugene.1.0.st.v1")
>>>
>>> It seems like as a default, more people would want gene-level
>>> summaries for these arrays ... especially since ~200k (~80%) of the
>>> probesets have 3 probes or fewer.
>>>
>>> Of course I (and everyone around the world) could build this package
>>> locally from scratch using the transcript CSV, but it seems like
>>> there would be enough demand for this to make it available direct
>>> from BioC.  Just a thought.  Does anyone agree?
>>>
>>> Or, am I missing something that will allow me to do gene-level
>>> analysis from this package?
>>>
>>> My session is below.
>>>
>>> Thanks in advance.
>>> Mark
>>>
>>>
>>>
>>> ----------------------
>>> mac1618:Desktop mrobinson$ wc -l HuGene-1_0-st-v1.na29.*.csv
>>> 257449 HuGene-1_0-st-v1.na29.hg18.probeset.csv
>>>  33317 HuGene-1_0-st-v1.na29.hg18.transcript.csv
>>> ----------------------
>>>
>>>
>>> ----------------------
>>>> library(oligo)
>>> Loading required package: oligoClasses
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>> Vignettes contain introductory material. To view, type
>>> 'openVignette()'. To cite Bioconductor, see
>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>> Loading required package: preprocessCore
>>> Welcome to oligo version 1.8.1
>>>> cf <- dir(celPath,"CEL")
>>>> fs <- read.celfiles( file.path(celPath,cf) )
>>> Loading required package: pd.hugene.1.0.st.v1
>>> Loading required package: RSQLite
>>> Loading required package: DBI
>>> Platform design info loaded.
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer1.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//cancer2.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal1.CEL
>>> Reading in : rawData/cell_line/HuGene-1_0-st-v1//normal2.CEL
>>>> rmaOligo <- oligo::rma(fs)
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>>>> dmOligo <- exprs(rmaOligo)
>>>> dim(rmaOligo)
>>> Features  Samples
>>> 253002        4
>>>> sessionInfo()
>>> R version 2.9.0 (2009-04-17)
>>> i386-apple-darwin8.11.1
>>>
>>> locale:
>>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] pd.hugene.1.0.st.v1_2.4.1 RSQLite_0.7-1
>>> [3] DBI_0.2-4                 oligo_1.8.1
>>> [5] preprocessCore_1.6.0      oligoClasses_1.6.0
>>> [7] Biobase_2.4.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affxparser_1.15.6 affyio_1.12.0     Biostrings_2.12.1 IRanges_1.2.2
>>> [5] splines_2.9.0
>>> ----------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>> Mark Robinson, PhD (Melb)
>>> Epigenetics Laboratory, Garvan
>>> Bioinformatics Division, WEHI
>>> e: m.robinson at garvan.org.au
>>> e: mrobinson at wehi.edu.au
>>> p: +61 (0)3 9345 2628
>>> f: +61 (0)3 9347 0852
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>> --
>> Vincent Carey, PhD
>> Biostatistics, Channing Lab
>> 617 525 2265
>
> ------------------------------
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robinson at garvan.org.au
> e: mrobinson at wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
>


