[BioC] HuGene as exon array (was: xps rma() with HuGene-1_0-st-v1 on 64-bit architecture)

cstrato cstrato at aon.at
Tue Feb 24 20:50:26 CET 2009


Dear Tim,

I am glad to inform you that a new version of xps is now available from 
BioC (xps_1.2.6 and xps_1.3.6), and I would very much appreciate it if 
you could test the new version.

Please note that release 4 (r4) of the HuGene array converts it to an 
exon array, so you need to create the scheme as follows:

xps.scheme <- import.exon.scheme("Scheme_HuGene10stv1r4_na27_2", filedir=scmdir,
              layoutfile=paste(libdir, "HuGene-1_0-st-v1.r4.analysis-lib-files/HuGene-1_0-st-v1.r4.clf", sep="/"),
              schemefile=paste(libdir, "HuGene-1_0-st-v1.r4.analysis-lib-files/HuGene-1_0-st-v1.r4.pgf", sep="/"),
              probeset=paste(anndir, "Version09Feb/HuGene-1_0-st-v1.na27.2.hg18.probeset.csv", sep="/"),
              transcript=paste(anndir, "Version09Feb/HuGene-1_0-st-v1.na27.hg18.transcript.csv", sep="/"))
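
Since the scheme import depends on four input files, it may help to confirm 
that they are all present before starting. The following is only a minimal 
sketch in base R and not part of xps; the helper name check_scheme_files is 
hypothetical, and it assumes the same directory layout as in the call above:

```r
# Hypothetical helper (not part of xps): return the scheme input files that
# are missing, assuming the directory layout used in the import call.
check_scheme_files <- function(libdir, anndir) {
  r4dir <- "HuGene-1_0-st-v1.r4.analysis-lib-files"
  files <- c(
    layoutfile = file.path(libdir, r4dir, "HuGene-1_0-st-v1.r4.clf"),
    schemefile = file.path(libdir, r4dir, "HuGene-1_0-st-v1.r4.pgf"),
    probeset   = file.path(anndir, "Version09Feb",
                           "HuGene-1_0-st-v1.na27.2.hg18.probeset.csv"),
    transcript = file.path(anndir, "Version09Feb",
                           "HuGene-1_0-st-v1.na27.hg18.transcript.csv")
  )
  # Keep only the entries whose file does not exist; names identify the argument.
  files[!file.exists(files)]
}

# Usage (libdir and anndir as defined in your own session):
# missing <- check_scheme_files(libdir, anndir)
# if (length(missing) > 0) print(missing)
```

An empty result means all four files were found.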


If you summarize the data at the transcript level, you should get the 
same results as before:

xps.rma <- rma(xps.cel, "HuGeneMixRMAcore", background="antigenomic",
               option="transcript", exonlevel="core+affx")
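
To verify that the transcript-level results really match the earlier ones, 
one possibility is to compare the two result tables once they are loaded as 
data frames. This is a rough sketch in base R only; the helper name 
same_expression and the id_col argument are hypothetical, and it assumes one 
ID column plus numeric expression columns with no missing values:

```r
# Hypothetical helper (not part of xps): TRUE if two expression tables hold
# the same IDs and the same values within a tolerance.
same_expression <- function(old, new, id_col, tol = 1e-6) {
  # Sort both tables by ID so that row order does not matter.
  old <- old[order(old[[id_col]]), , drop = FALSE]
  new <- new[order(new[[id_col]]), , drop = FALSE]
  if (!identical(as.character(old[[id_col]]), as.character(new[[id_col]]))) {
    return(FALSE)
  }
  # Both tables must carry the same set of value columns.
  value_cols <- setdiff(names(old), id_col)
  if (!setequal(value_cols, setdiff(names(new), id_col))) {
    return(FALSE)
  }
  diffs <- abs(as.matrix(old[value_cols]) - as.matrix(new[value_cols]))
  isTRUE(all(diffs <= tol))
}
```

A result of FALSE only says that the tables differ; it does not say where.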


In addition, you can now summarize the data on the probeset (exon) level:

xps.rma.ps <- rma(xps.cel, "HuGeneMixRMAcorePS", background="antigenomic",
                  option="probeset", exonlevel="core+affx")
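
A quick way to spot the earlier problem, where only the 57 affx controls 
were summarized, is to look at the number of rows in the result. The sketch 
below is base R only and not part of xps; the helper name check_unit_count 
is hypothetical, and it assumes the result is available as a data frame 
with one row per summarized unit:

```r
# Hypothetical helper (not part of xps): warn if no more units than the
# affx controls were summarized (57 in the session reported in this thread).
check_unit_count <- function(expr_table, n_controls = 57) {
  n <- nrow(expr_table)
  if (n <= n_controls) {
    warning(sprintf(
      "only %d units summarized; the scheme may not match the pgf file", n))
    return(FALSE)
  }
  TRUE
}
```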


Please let me know if the new version works as expected.

Best regards
Christian


Tim Rayner wrote:
> Dear Christian,
>
> Thank you very much for your help - reverting to the older r3 files
> does indeed solve the problem. I'll look forward to hearing about the
> new version of the xps package, and I'd be more than happy to help
> test it if needed.
>
> Best regards,
>
> Tim
>
> 2009/2/17 cstrato <cstrato at aon.at>:
>   
>> Dear Tim,
>>
>> First, I am glad to hear that my package works on 64-bit OS without problems.
>>
>> Luckily, the solution to your problem is simple. Please use the following
>> pgf and clf files in your code to create xps.scheme:
>> - HuGene-1_0-st-v1.r3.clf
>> - HuGene-1_0-st-v1.r3.pgf
>>
>> The reason is as follows:
>> About two weeks ago Affymetrix updated the pgf file to allow customers
>> to use HuGene as a cheaper exon array. For this purpose, they have created
>> an additional "HuGene-1_0-st-v1.na27.hg18.probeset.csv" file and have
>> changed the probesets in the *.pgf file. Instead of "transcript_cluster_id"
>> the probes are now mapped to "probeset_id" of the new probeset annotation
>> file. For this reason xps recognizes only the 57 affx-controls when parsing
>> the *.pgf file, and thus only these 57 controls will be summarized.
>>
>> I am currently in the process of updating my package to allow using HuGene
>> arrays as exon arrays, and I will inform you once I have uploaded the new
>> version. Until then I must ask you to use the older *.r3.pgf file.
>>
>> Best regards
>> Christian
>> _._._._._._._._._._._._._._._._._._
>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>> V.i.e.n.n.a           A.u.s.t.r.i.a
>> e.m.a.i.l:        cstrato at aon.at
>> _._._._._._._._._._._._._._._._._._
>>
>>
>> Tim Rayner wrote:
>>     
>>> Hi,
>>>
>>> I'm seeing what appears to be odd behaviour from the xps rma() method
>>> when trying to summarize a small test dataset from the
>>> HuGene-1_0-st-v1 array. The oddness is that whatever options I pass to
>>> rma(), I only ever get summary data for 57 probe sets back (obviously
>>> I'd expect rather more than that).
>>>
>>> I'm using 64-bit Mac OSX, and I believe I've installed everything
>>> correctly and imported the probe annotation from the latest chip
>>> library files on Affy's web site. I did have to compile ROOT from
>>> source to support the 64-bit architecture, but that went pretty
>>> smoothly. After some hours of poking through the xps code I'm a little
>>> suspicious about the probe masking, but not much wiser, I'm afraid.
>>>
>>> I should just briefly mention that I can run rma over the same data
>>> set by using the oligo package, so I think the data files are fine.
>>>
>>> Attached is a sample session, which I've just run from scratch to
>>> confirm the problem, and my sessionInfo. I'm wondering if anyone else
>>> has seen this, or if I've just made some fundamental error.
>>>
>>> Many thanks,
>>>
>>> Tim Rayner
>>>
>>>
>>>
>>> #############################################
>>> ## sessionInfo():
>>>
>>>
>>> > sessionInfo()
>>> R version 2.8.1 Patched (2009-01-19 r47650)
>>> i386-apple-darwin9.6.0
>>>
>>> locale:
>>> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>>>
>>> attached base packages:
>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] Biobase_2.2.2 xps_1.2.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tcltk_2.8.1
>>>
>>>
>>>
>>> ##############################################
>>> ## Session commands:
>>> library('xps')
>>> celdir=getwd()
>>> celfiles=list.files(pattern='.*.CEL')
>>> libdir <- '/Users/tfr23/Documents/resources/HuGene-1_0/'
>>> xps.scheme <- import.genome.scheme(filename='HuGene-1_0-st-v1-r4',
>>>                                   filedir=libdir,
>>>                                   layoutfile=paste(libdir,
>>>                                     'HuGene-1_0-st-v1.r4.clf',
>>>                                     sep=''),
>>>                                   schemefile=paste(libdir,
>>>                                     'HuGene-1_0-st-v1.r4.pgf',
>>>                                     sep=''),
>>>                                   transcript=paste(libdir,
>>>
>>> 'HuGene-1_0-st-v1.na27.hg18.transcript.csv',
>>>                                     sep=''),
>>>                                   verbose=TRUE)
>>>
>>> xps.cel<-import.data(xps.scheme, 'HuGeneCelData', celdir=celdir,
>>> celfiles=celfiles)
>>>
>>> xps.cel<-attachInten(xps.cel)
>>>
>>> xps.rma <- rma(xps.cel,
>>>               filename='HuGeneMixRMAMetacore',
>>>               exonlevel='metacore+affx',
>>>               background='antigenomic',
>>>               normalize=TRUE)
>>>
>>> ######################################
>>> ## Session output:
>>>
>>> Welcome to xps version 1.2.5
>>>    an R wrapper for XPS - eXpression Profiling System
>>>    (c) Copyright 2001-2009 by Christian Stratowa
>>>
>>> Creating new file
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1-r4.root>...
>>> Importing
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1.r4.clf>
>>> as <HuGene-1_0-st-v1.cxy>...
>>>   <1102500> records imported...Finished
>>> New dataset <HuGene-1_0-st-v1> is added to Content...
>>> Importing
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1.na27.hg18.transcript.csv>
>>> as <HuGene-1_0-st-v1.ann>...
>>>   Number of transcripts is <33297>.
>>>   <33297> records read...Finished
>>>   <33297> records imported...Finished
>>> Importing
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1.r4.pgf>
>>> as <HuGene-1_0-st-v1.scm>...
>>>   Reading data from input file...
>>>   Number of probesets is <257430>.
>>> Note: Number of annotated probesets <33297> is not equal to number of
>>> probesets <257430>.
>>>   <257430> records read...Finished
>>>   Sorting data for probeset_type and position...
>>>   Total number of controls is <4371>
>>>   Note: no data for probeset type: control->chip...
>>>   Filling trees with data for probeset type: normgene, rescue...
>>>   Filling trees with data for probeset type: control->bgp...
>>>   Filling trees with data for probeset type: control->affx...
>>>   <33252> probeset tree entries read...Finished
>>>   Number of control->affx probesets is <57>.
>>>   Filling trees with data for probeset type: main...
>>>   Filling trees with data for non-annotated probesets...
>>>   <861493> records imported...Finished
>>>   <257430> total transcript units imported.
>>>   Genome cell statistics:
>>>      Number of unit cells: minimum = 1,  maximum = 1189
>>> Opening file
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1-r4.root>
>>> in <READ> mode...
>>> Creating new file
>>> </Users/tfr23/Documents/affytest/HuGeneCelData_cel.root>...
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 020206A CD8 -
>>> 090213.CEL> as <Affy 0104 - 020206A CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      1 cells with minimal intensity 23
>>>      1 cells with maximal intensity 35735
>>> New dataset <DataSet> is added to Content...
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 020305 CD8 -
>>> 090213.CEL> as <Affy 0104 - 020305 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 20
>>>      1 cells with maximal intensity 24768
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 030804 CD8 -
>>> 090213.CEL> as <Affy 0104 - 030804 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      6 cells with minimal intensity 25
>>>      1 cells with maximal intensity 38526
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 040107 CD8 -
>>> 090213.CEL> as <Affy 0104 - 040107 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 22
>>>      1 cells with maximal intensity 20150
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 061004 CD8 -
>>> 090213.CEL> as <Affy 0104 - 061004 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 20
>>>      1 cells with maximal intensity 21650
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 070205 CD8 -
>>> 090213.CEL> as <Affy 0104 - 070205 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 21
>>>      1 cells with maximal intensity 23005
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 090305 CD8 -
>>> 090213.CEL> as <Affy 0104 - 090305 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      22 cells with minimal intensity 21
>>>      1 cells with maximal intensity 21205
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 110806B CD8 -
>>> 090213.CEL> as <Affy 0104 - 110806B CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      1 cells with minimal intensity 21
>>>      1 cells with maximal intensity 22958
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 150107 CD8 -
>>> 090213.CEL> as <Affy 0104 - 150107 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 19
>>>      1 cells with maximal intensity 23606
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 150405 CD8 -
>>> 090213.CEL> as <Affy 0104 - 150405 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      4 cells with minimal intensity 24
>>>      1 cells with maximal intensity 24268
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 190706 CD8 -
>>> 090213.CEL> as <Affy 0104 - 190706 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      6 cells with minimal intensity 21
>>>      1 cells with maximal intensity 22769
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 - 300605 CD8 -
>>> 090213.CEL> as <Affy 0104 - 300605 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      2 cells with minimal intensity 20
>>>      1 cells with maximal intensity 22309
>>> Importing </Users/tfr23/Documents/affytest/Affy 0104 -040205 CD8 -
>>> 090213.CEL> as <Affy 0104 -040205 CD8 - 090213.cel>...
>>>   hybridization statistics:
>>>      1 cells with minimal intensity 23
>>>      1 cells with maximal intensity 22497
>>> Creating new file
>>> </Users/tfr23/Documents/affytest/HuGeneMixRMAMetacore.root>...
>>> Opening file
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1-r4.root>
>>> in <READ> mode...
>>> Preprocessing data using method <preprocess>...
>>>   Background correcting raw data...
>>>      setting selector mask for typepm <8252>
>>>      calculating background for <Affy 0104 - 020206A CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         1378 cells with maximal intensity 151.284
>>>      calculating background for <Affy 0104 - 020305 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         2 cells with maximal intensity 75.9992
>>>      calculating background for <Affy 0104 - 030804 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         28 cells with maximal intensity 122.454
>>>      calculating background for <Affy 0104 - 040107 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         13 cells with maximal intensity 154.02
>>>      calculating background for <Affy 0104 - 061004 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         47 cells with maximal intensity 101.165
>>>      calculating background for <Affy 0104 - 070205 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         25 cells with maximal intensity 94.408
>>>      calculating background for <Affy 0104 - 090305 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         220 cells with maximal intensity 52.9483
>>>      calculating background for <Affy 0104 - 110806B CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         97 cells with maximal intensity 136.739
>>>      calculating background for <Affy 0104 - 150107 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         1055 cells with maximal intensity 105.265
>>>      calculating background for <Affy 0104 - 150405 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         36 cells with maximal intensity 128.385
>>>      calculating background for <Affy 0104 - 190706 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         957 cells with maximal intensity 135.396
>>>      calculating background for <Affy 0104 - 300605 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         865 cells with maximal intensity 49.4309
>>>      calculating background for <Affy 0104 -040205 CD8 - 090213.cel>...
>>>      background statistics:
>>>         1097995 cells with minimal intensity 0
>>>         650 cells with maximal intensity 140.053
>>>   Normalizing raw data...
>>>      normalizing data using method <quantile>...
>>>      setting selector mask for typepm <8252>
>>>         finished filling <13> arrays.
>>>         finished filling <13> trees.
>>>   Converting raw data to expression levels...
>>>      summarizing with <medianpolish>...
>>>      setting selector mask for typepm <8252>
>>>      setting selector mask for typepm <8252>
>>>      calculating expression for <57> of <257430> units...Finished.
>>>      expression statistics:
>>>         minimal expression level is <19.8498>
>>>         maximal expression level is <8953.24>
>>>   preprocessing finished.
>>> Opening file
>>> </Users/tfr23/Documents/resources/HuGene-1_0/HuGene-1_0-st-v1-r4.root>
>>> in <READ> mode...
>>> Opening file </Users/tfr23/Documents/affytest/HuGeneMixRMAMetacore.root>
>>> in <READ> mode...
>>> Exporting data from tree <*> to file
>>> </Users/tfr23/Documents/affytest/HuGeneMixRMAMetacore.txt>...
>>> Reading entries from <HuGene-1_0-st-v1.ann> ...Finished
>>> <57> of <57> records exported.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>