[BioC] Sc03b_MR_v04 CDF package

James W. MacDonald jmacdon at med.umich.edu
Fri Aug 22 17:00:30 CEST 2008


So I tracked this down, and it _doesn't_ appear to be a bug in 
pdInfoBuilder. It appears to be either an error or an inconsistency in 
the Affy bpmap files. I didn't use the yeast chip, as I am not sure I 
can get the one you used, but I get the same error with the Arabidopsis 
bpmap, so I used that one.

It all boils down to this:

 > library(affxparser)
 > tmp <- readBpmap("At35b_MF_v04-2_TIGRv5.bpmap", 26)
 > head(do.call("cbind",tmp[[1]][2:5]), n=20)
        pmx  pmy  mmx  mmy
  [1,] 2197 2194 2198 2194
  [2,] 2379  997 2380  997
  [3,] 2443 1111 2444 1111
  [4,] 2485  826 2486  826
  [5,] 2491 2374 2492 2374
  [6,] 2497  826 2498  826
  [7,] 2503 1602 2504 1602
  [8,] 2505 1129 2506 1129
  [9,] 2507 2022 2508 2022
 [10,] 1687  169 1688  169
 [11,] 1687  501 1688  501
 [12,] 1687  638 1688  638
 [13,] 1687  871 1688  871
 [14,] 1687  873 1688  873
 [15,] 1687 1007 1688 1007
 [16,] 1687 1371 1688 1371
 [17,] 1687 1492 1688 1492
 [18,] 1687 2346 1688 2346 <-
 [19,] 1687 2347 1688 2346 <-
 [20,] 1689  287 1690  287

As you can see, for this QC probeset there are two MM probes that appear 
to be right on top of each other, both at (1688, 2346). I assume the 
second should really be at (1688, 2347), but the bpmap is in error. The 
two probes therefore end up with identical indices, and since that index 
is the primary key for the table these data are being fed into, we get 
an error because the primary key must be unique.
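
To make the failure mode concrete, here is a minimal sketch in base R 
using rows 17 to 20 of the output above. The name `mm` is hypothetical, 
and pasting the coordinates together is only an illustrative stand-in 
for whatever index pdInfoBuilder actually derives from them. The point 
is just that identical (mmx, mmy) pairs cannot yield distinct keys.

```r
# Rows 17-20 of the MM coordinates printed above (toy subset).
mm <- data.frame(mmx = c(1688, 1688, 1688, 1690),
                 mmy = c(1492, 2346, 2346,  287))

# Illustrative key (an assumption, not pdInfoBuilder's own formula):
# any index derived only from (x, y) collides whenever two probes
# share the same coordinates.
key <- paste(mm$mmx, mm$mmy, sep = ",")

# Flag every row involved in a collision, not just the second copy.
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
mm[dup, ]
```

Using `fromLast = TRUE` as well marks both members of a colliding pair, 
which makes it easier to find them in the bpmap.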

For the A. thaliana chip there are 36 such errors in the bpmap file for 
just the QC probes.
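
A count like this could be obtained with a small helper along the 
following lines. This is a sketch under assumptions: `countDupMM` is a 
hypothetical name, and the input is assumed to be a list shaped like the 
readBpmap() result above, where each sequence is a list holding `mmx` 
and `mmy` vectors. The toy list `bp` stands in for a real bpmap, and the 
check only looks for repeats within each sequence.

```r
# Count MM probes that reuse a coordinate already seen in the same sequence.
countDupMM <- function(bpmap) {
  vapply(bpmap, function(s) {
    # duplicated() on a matrix compares whole rows, i.e. (mmx, mmy) pairs.
    sum(duplicated(cbind(s$mmx, s$mmy)))
  }, integer(1))
}

# Toy stand-in for readBpmap() output: two sequences, one with a collision.
bp <- list(
  qc1 = list(mmx = c(1688, 1688, 1690), mmy = c(2346, 2346, 287)),
  qc2 = list(mmx = c(10, 11), mmy = c(5, 5))
)

dups <- countDupMM(bp)  # per-sequence counts
sum(dups)               # total across all sequences
```

With a real file, the list returned by readBpmap() for the QC sequences 
would take the place of `bp`.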

Best,

Jim



Ludo Muller wrote:
> Hi James,
> 
> Thank you for your help. I installed the pdInfoBuilder package locally, on
> my Win XP computer (I also requested it to be installed on a Linux server,
> in case my computer doesn't have enough memory), and I ran the following (I
> downloaded the appropriate .bpmap and .cif files):
> 
>> pkg <- new("AffyTilingPDInfoPkgSeed", version="0.0.1",
>            author="Ludo Muller", email="ludo.muller at ...",
>            biocViews="AnnotationData",
>            genomebuild="Stanford Yeast Genome Database, October 2003",
>            bpmapFile="Sc03b_MR_v04.bpmap", cifFile="Sc03b_MR_v04.cif")
>> makePdInfoPackage(pkg, destDir=".")
> 
> However, I get the following message:
> 
> Creating package in ./pd.sc03b.mr.v04
> Error in sqliteExecStatement(conn, statement, bind.data, ...) :
>   RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
> Timing stopped at: 6.35 0.17 6.59
> 
> I found an earlier report dealing with a similar error message:
> https://stat.ethz.ch/pipermail/bioconductor/2008-June/023080.html
> 
> Is it likely that the information files for this array are also inaccurate?
> 
> Cheers,
> 
> Ludo.
> 
> ---
> Ludo A.H. Muller, Ph.D.
> Dept. of Molecular Genetics & Microbiology
> Box 3020, Duke University Medical Center
> Durham, NC 27710, USA
> 
> Phone: +1 (919) 681-6781 or 681-6778
> Fax: +1 (919) 684-8735
> E-mail: ludo.muller at duke.edu
> Homepage: http://www.duke.edu/~mulle019
> 
> 
> 
> -----"James W. MacDonald" <jmacdon at med.umich.edu> wrote: -----
> 
> 
> To: Ludo Muller <ludo.muller at duke.edu>
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Date: 08/19/2008 09:05AM
> cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Sc03b_MR_v04 CDF package
> 
> Hi Ludo,
> 
> Ludo Muller wrote:
>> Hi all,
>>
>> I have data from hybridizations onto the Affymetrix yeast tiling array
>> (Sc03b_MR_v04) which I would like to analyze using Bioconductor. However
>> the CDF package (sc03bmrv04cdf) for this array doesn't seem to be available
>> from the bioconductor website. Can anybody tell me if it is available
>> elsewhere or whom I could contact about creating this package?
> 
> There will probably never be such a beast, as that implies the usage of
> the affy package. Instead, you should be using the oligo package and
> pdInfoBuilder. For that you will need the CIF and BPMAP files from Affy,
> and something like
> 
> pkg <- new("AffyTilingPDInfoPkgSeed", version = "0.0.1", author = "You",
>     email = "you at yours.com", biocViews = "AnnotationData",
>     genomebuild = "thegenomebuild", bpmapFile = "<name of bpmap file>",
>     cifFile = "<name of cif file>")
> 
> makePdInfoPackage(pkg)
> 
> And then install using R CMD INSTALL <the package name>. You don't
> mention your OS, so that might simply entail running the above at a
> terminal prompt (if on Linux), or if you are on Windows or MacOS, you
> will need to get set up to build packages. See the R FAQ for either OS
> for further info about that.
> 
> Best,
> 
> Jim
> 
> 
>> Cheers,
>>
>> Ludo.
>>
>> ---
>> Ludo A.H. Muller, Ph.D.
>> Dept. of Molecular Genetics & Microbiology
>> Box 3020, Duke University Medical Center
>> Durham, NC 27710, USA
>>
>> Phone: +1 (919) 681-6781 or 681-6778
>> Fax: +1 (919) 684-8735
>> E-mail: ludo.muller at duke.edu
>> Homepage: http://www.duke.edu/~mulle019
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662


