[BioC] pd.mapping 10Karray

Tobias Verbeke tobias.verbeke at telenet.be
Thu Aug 16 22:58:51 CEST 2007


Hi Seth,

Thanks for your response. I have had time to look a bit further
into this.

Seth Falcon wrote:

> Hi Marianne,
> 
> I'm not yet sure what is going on with pdInfoBuilder, but perhaps we
> can sort it out...
> 
> "Marianne Tuefferd" <tuefferd at vjf.inserm.fr> writes:
>>  > library("pdInfoBuilder")
>>  > pkg <- new("AffySNPPDInfoPkgSeed",
>> +            version = "0.0",
>> +            email = "tuefferd at vjf.inserm.fr",
>> +            biocViews = "AnnotationData",
>> +            cdfFile = cdfFile,
>> +            csvAnnoFile = csvAnno,
>> +            csvSeqFile = csvSeq)
>>  > makePdInfoPackage(pkg, destDir = ".")
>> Creating package in ./pd.mapping10k.xba142
>> Error in gsub(pattern, replacement, x, ignore.case, extended, fixed,
>> useBytes) :
>>     invalid argument
>>  > traceback()
>> 7: gsub(nm[i], symbolValues[[i]], res)
>> 6: subsFileName(tmp[length(tmp)])
>> 5: cpSubs(src, dest)
>> 4: copySubstitute(dir(originDir, full.names = TRUE), pkgdir, symbolValues,
>>         recursive = TRUE)
>> 3: createPackage(pkgname = pkgName, destinationDir = destDir,
>>         originDir = templateDir, symbolValues = syms, quiet = quiet)
>> 2: makePdInfoPackage(pkg, destDir = ".")
>> 1: makePdInfoPackage(pkg, destDir = ".")

It appears the cause is that the author and genomebuild fields are
empty. It might be a good idea to check for this, or to enforce the
presence of these fields.
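
As a workaround, these fields can simply be supplied when the seed is
created. A minimal sketch follows (placeholder values; it assumes
author and genomebuild are ordinary character slots that can be set
via new()), together with the kind of check that could be enforced
before building:

   ## Workaround sketch: supply the fields that would otherwise be
   ## empty. The author and genomebuild values are placeholders.
   pkg <- new("AffySNPPDInfoPkgSeed",
              version     = "0.0",
              author      = "Your Name",
              email       = "you@example.com",
              genomebuild = "NCBI Build 35",
              biocViews   = "AnnotationData",
              cdfFile     = cdfFile,
              csvAnnoFile = csvAnno,
              csvSeqFile  = csvSeq)

   ## Hypothetical helper (not part of pdInfoBuilder) sketching the
   ## suggested check: stop early when a required field is empty.
   checkRequiredFields <- function(seed,
                                   required = c("author", "genomebuild")) {
       ok <- sapply(required, function(s) nzchar(slot(seed, s)))
       if (!all(ok))
           stop("empty required field(s): ",
                paste(required[!ok], collapse = ", "))
       invisible(TRUE)
   }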

Along the way, however, we discovered other issues. For example, the
loadAffyCsv function (loaders.R) selects columns by column number,
which is not appropriate for the 10K files. This is the relevant
snippet:

   wantedCols <- c(1,2,3,4,7,8,10,12,13,14,17)
                                         # added 10/14
   df <- read.table(con, sep=",", stringsAsFactors=FALSE, nrows=10,
                    na.strings="---", header=TRUE)[, wantedCols]

For the 10K files, columns 5, 6 and 15 are needed as well. It might,
however, be a better idea to read in only the header first and match
it against a character vector of prespecified column names to
determine the wanted columns, before reading in the rest of the file
for real.
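
A minimal sketch of that name-based approach, reading from the
annotation file path directly rather than from the connection used in
loadAffyCsv (the column names in wantedNames are illustrative
assumptions, not necessarily the exact headers of the Affymetrix
annotation CSV):

   ## Read only the header line, then select columns by name instead
   ## of by position. Adjust wantedNames to the real headers.
   wantedNames <- c("Probe Set ID", "dbSNP RS ID", "Chromosome",
                    "Physical Position", "Strand")
   hdr <- read.csv(csvAnnoFile, nrows = 1, check.names = FALSE,
                   na.strings = "---")
   wantedCols <- match(wantedNames, names(hdr))
   if (any(is.na(wantedCols)))
       stop("missing expected column(s): ",
            paste(wantedNames[is.na(wantedCols)], collapse = ", "))
   df <- read.csv(csvAnnoFile, stringsAsFactors = FALSE,
                  na.strings = "---", check.names = FALSE)[, wantedCols]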

Once this problem is solved, the function runs fine. There is,
however, another error in the loadAffySeqCsv function:

t <- ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size=batch_size))

Error in sqliteExecStatement(con, statement, bind.data) :
        RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
Timing stopped at: 0.58 0.05 0.73 NA NA

traceback()
9: .Call("RS_SQLite_exec", conId, statement, bind.data, PACKAGE = .SQLitePkgName)
8: sqliteExecStatement(con, statement, bind.data)
7: sqliteQuickSQL(conn, statement, bind.data, ...)
6: dbGetPreparedQuery(db, sql, bind.data = mmdf)
5: dbGetPreparedQuery(db, sql, bind.data = mmdf)
4: loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size)
3: eval(expr, envir, enclos)
2: eval(expr, envir = loc.frame)
1: ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size))

I will try to track this down as well, but if anyone
recognizes this kind of problem, I would be most grateful
for a pointer.
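
One first thing I plan to check is whether the sequence file itself
contains duplicated key values before it is loaded. A rough sketch
(assuming the file is comma-separated as its name suggests, and using
"PROBESET_ID" only as an illustrative guess for the column that ends
up as the table's primary key):

   ## Look for duplicated keys in the sequence file before loading.
   ## "PROBESET_ID" is an assumed column name, used for illustration.
   seqdf <- read.csv(csvSeqFile, stringsAsFactors = FALSE)
   key   <- seqdf[["PROBESET_ID"]]
   sum(duplicated(key))          # rows that would violate a unique key
   head(key[duplicated(key)])    # a few of the offending identifiers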

Kind regards,
Tobias

> 
> This is useful output.  Can you try two things?
> 
> 1. Try setting options(error=recover) and then rerun the above
>    example.  When the error occurs you will be put into the debugger
>    and can select a frame to enter (numbered like the stack trace
>    above).  Find the frame with the gsub call, and print out the values
>    of nm[i], symbolValues[[i]], and res, since the error is telling us
>    that one of these is somehow invalid.
> 
> 2. I notice that your locale setting is not "C" and I wonder if
>    rerunning the example after setting Sys.setlocale(locale="C")
>    changes anything.
> 
> + seth
> 

-- 

Tobias Verbeke - Consultant
Business & Decision Benelux
Rue de la révolution 8
1000 Brussels - BELGIUM

+32 499 36 33 15
tobias.verbeke at businessdecision.com


