[BioC] Extracting patient ID from AffyBatch objects

Sean Davis seandavi at gmail.com
Tue Apr 27 13:05:47 CEST 2010


On Tue, Apr 27, 2010 at 2:41 AM, Popa Tiberiu <popatiberiuo at yahoo.com> wrote:
> I have a set of 31 CEL files which I read into an AffyBatch object:
>
>> myAB = ReadAffy()
>
>> sampleNames(myAB)
>  [1] "GSM2474.CEL.gz" "GSM2475.CEL.gz" "GSM2476.CEL.gz" "GSM2477.CEL.gz" "GSM2478.CEL.gz" "GSM2479.CEL.gz" "GSM2480.CEL.gz" "GSM2481.CEL.gz"
>  [9] "GSM2482.CEL.gz" "GSM2483.CEL.gz" "GSM2484.CEL.gz" "GSM2485.CEL.gz" "GSM2486.CEL.gz" "GSM2487.CEL.gz" "GSM2488.CEL.gz" "GSM2489.CEL.gz"
> [17] "GSM2490.CEL.gz" "GSM2491.CEL.gz" "GSM2492.CEL.gz" "GSM2493.CEL.gz" "GSM2494.CEL.gz" "GSM2495.CEL.gz" "GSM2496.CEL.gz" "GSM2497.CEL.gz"
> [25] "GSM2498.CEL.gz" "GSM2499.CEL.gz" "GSM2500.CEL.gz" "GSM2501.CEL.gz" "GSM2502.CEL.gz" "GSM2503.CEL.gz" "GSM2504.CEL.gz"
>
> I have a CSV file containing some extra data for our samples.
>
>> disease= as.matrix(read.table("s12.csv", header=T, sep=",", row.names=1))
>> rownames(disease)
>  [1] "968-1"  "928-1"  "934-1"  "709-1"  "930-1"  "524-1"  "455-1"  "370-1"  "810-1"  "1146-1" "1161-1" "1006-1" "942-1"  "1060-1" "1255-1" "441-1"
> [17] "780-1"  "815-2"  "829-1"  "861-1"  "925-1"  "1008-1" "1086-1" "1105-1" "1145-1" "1327-1" "1352-1" "1379-1" "533-1"  "679-1"  "692-1"
>
> What I am trying to do is attach the extra data to the corresponding samples.
>
> Each CEL file contains a header line that specifies the patient ID (709 in this case):
>
> ...
> DatHeader=[59..46191]  709  Ta gr2    ...
> ...
>
> Is there any way to get the sample ID list from the AffyBatch object?

Not a direct answer, but since these data are from NCBI GEO, why not
use GEOquery to get the information about samples?

> library(GEOquery)
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

Loading required package: RCurl
Loading required package: bitops
> gse <- getGEO('GSE88')[[1]]
Found 1 file(s)
GSE88_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE88/GSE88_series_matrix.txt.gz'
ftp data connection made, file length 492380 bytes
opened URL
==================================================
downloaded 480 Kb

File stored at:
/var/folders/F+/F+PwkbXqF6WeunvinD8pZk+++TI/-Tmp-//Rtmp2M4lZS/GPL80.soft
> head(pData(gse))
                        title geo_accession                status
GSM2474  Bladder sample 709-1       GSM2474 Public on Dec 08 2002
GSM2475  Bladder sample 928-1       GSM2475 Public on Dec 08 2002
GSM2476  Bladder sample 930-1       GSM2476 Public on Dec 08 2002
GSM2477  Bladder tumour 934-1       GSM2477 Public on Dec 08 2002
GSM2478  Bladder sample 968-1       GSM2478 Public on Dec 08 2002
GSM2479 Bladder sample 1006-1       GSM2479 Public on Dec 08 2002
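
The title column above ends in the same IDs that appear as row names of the CSV data, so the two can probably be lined up by matching on that suffix. What follows is only a minimal sketch in base R, not a tested recipe for the full data set: the three objects at the top are hard-coded stand-ins for sampleNames(myAB), pData(gse)$title and the CSV matrix, restricted to the six samples shown, and the "grade" column with its values is a made-up placeholder. It also assumes every title ends in the sample ID, which is only confirmed for these six rows.

```r
# Stand-in for sampleNames(myAB): the CEL file names.
cel_names <- c("GSM2474.CEL.gz", "GSM2475.CEL.gz", "GSM2476.CEL.gz",
               "GSM2477.CEL.gz", "GSM2478.CEL.gz", "GSM2479.CEL.gz")

# Stand-in for pData(gse)$title, named by GEO accession (six rows shown above).
titles <- c(GSM2474 = "Bladder sample 709-1",
            GSM2475 = "Bladder sample 928-1",
            GSM2476 = "Bladder sample 930-1",
            GSM2477 = "Bladder tumour 934-1",
            GSM2478 = "Bladder sample 968-1",
            GSM2479 = "Bladder sample 1006-1")

# Stand-in for the CSV matrix. Only the row names come from the post;
# the "grade" column and its values are hypothetical placeholders.
disease <- matrix(1:6, ncol = 1,
                  dimnames = list(c("968-1", "928-1", "934-1",
                                    "709-1", "930-1", "1006-1"),
                                  "grade"))

# Strip the file suffix to recover the GSM accession of each array.
gsm <- sub("\\.CEL(\\.gz)?$", "", cel_names, ignore.case = TRUE)

# Keep the last whitespace-separated token of each title as the sample ID.
sample_id <- sub("^.*\\s", "", unname(titles[gsm]))

# Find the CSV row for each array and fail loudly if any ID is missing.
idx <- match(sample_id, rownames(disease))
stopifnot(!anyNA(idx))

# CSV rows reordered to follow the array order.
disease_ordered <- disease[idx, , drop = FALSE]

# Lookup table from accession to sample ID, for checking by eye.
id_map <- data.frame(gsm = gsm, sample_id = sample_id,
                     stringsAsFactors = FALSE)
print(id_map)
```

With the real objects, the stand-ins would be replaced by sampleNames(myAB), pData(gse)$title (named by rownames(pData(gse))) and the matrix read from s12.csv. The stopifnot() check is there because a silent NA from match() would misalign the annotation with the arrays.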

> sessionInfo()
R version 2.11.0 Under development (unstable) (2009-11-13 r50424)
i386-apple-darwin10.2.0

locale:
[1] en_US/en_US/C/C/en_US/en_US

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] GEOquery_2.11.2 RCurl_1.3-1     bitops_1.0-4.1  Biobase_2.7.3
