[BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays

Wed Jul 15 00:05:02 CEST 2009

> xys <- read.delim(xysFile, comment='#', nrow=3)
> str(xys)
'data.frame':   3 obs. of  4 variables:
 $ X     : int  209 228 43
 $ Y     : int  203 52 257
 $ SIGNAL: num  203 146 159
 $ COUNT : int  1 1 1
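
For reference, a minimal sketch of the same check with the argument names written out in full (`comment` and `nrow` above work only through R's partial argument matching). The temporary file, its comment line, and the `expected` column list are assumptions made so the example runs on its own; they are not taken from a real Nimblegen XYS file.

```r
# Hypothetical three-row XYS-like file written to a temporary path so the
# sketch is self-contained; the comment line is illustrative only.
xysDemo <- tempfile(fileext = ".xys")
writeLines(c(
  "# illustrative header comment",
  "X\tY\tSIGNAL\tCOUNT",
  "209\t203\t203.0\t1",
  "228\t52\t146.0\t1",
  "43\t257\t159.0\t1"
), xysDemo)

# Same call as above, with full argument names
# (comment -> comment.char, nrow -> nrows).
xys <- read.delim(xysDemo, comment.char = "#", nrows = 3)
str(xys)

# Columns seen in the str() output above; report any that are absent.
expected <- c("X", "Y", "SIGNAL", "COUNT")
missingCols <- setdiff(expected, names(xys))
```

If `missingCols` is empty, the file was parsed with the four columns shown in the output above.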

-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu] 
Sent: Tuesday, July 14, 2009 3:03 PM
To: Jack Schonbrun
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with oligo on Nimblegen Expression Arrays

how about?

xys <- read.delim(xysFile, comment="#", nrow=100)
str(xys)

b

On Jul 14, 2009, at 6:58 PM, Jack Schonbrun wrote:

> Here's what I get:
>
>> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrow=100)
>> str(ndf)
> 'data.frame':   100 obs. of  17 variables:
> $ PROBE_DESIGN_ID   : chr  "6531_0301_0005" "6531_0311_0005"  
> "6531_0331_0005" "6531_0333_0005" ...
> $ CONTAINER         : chr  "SACCHAROMYCES1" "SACCHAROMYCES1"  
> "NGS_CONTROLS" "NGS_CONTROLS" ...
> $ DESIGN_NOTE       : chr  "rank_selected" "rank_selected" "upper  
> right fiducial" "" ...
> $ SELECTION_CRITERIA: chr  "rank:03;score:379;uniq:14;count:37;freq: 
> 01;rules:1;tm:82.4" "rank:05;score:046;uniq:14;count:1110;freq: 
> 30;rules:1;tm:78.3" "bright" "" ...
> $ SEQ_ID            : chr  "SCER070900001885" "SCER070900001596"  
> "FIDUCIAL_UPPER_RIGHT" "CROSSHYBE" ...
> $ PROBE_SEQUENCE    : chr   
> "GTCAACCCTGCAAGATCTCTGGGTGCCGCCGTTGCTGCCAGATATTTCCCTCATTACCAC"  
> "TCAGTTGGAACGCCTCTGAGCACTCCATCACCTGAGTCAGGTAATACATTTACTGATTCA"  
> "TGAGTTGTTTGATAGGATTATTCATAGAGGTCATTACAGCGAGAGGAANNNNNNNNN"  
> "CGATGCGACGCGAACTAAGCAGTTCGGCGCAGTCGACTAGTATAACAGNNNNNNNN" ...
> $ MISMATCH          : int  0 0 0 0 0 0 0 0 0 0 ...
> $ MATCH_INDEX       : int  72062965 72061238 2000207 70654015  
> 70652179 65069272 65069273 65069274 65069275 65069276 ...
> $ FEATURE_ID        : int  72062965 72061238 71722817 71722819  
> 71722820 71722824 71722825 71722826 71722827 71722828 ...
> $ ROW_NUM           : int  5 5 5 5 6 6 6 6 6 6 ...
> $ COL_NUM           : int  301 311 331 333 1 5 6 7 8 9 ...
> $ PROBE_CLASS       : chr  "experimental" "experimental" "fiducial"  
> "control:crosshybe" ...
> $ PROBE_ID          : chr  "SCER070900001885P00271"  
> "SCER070900001596P00406" "CPK6" "XENOTRACK48P02" ...
> $ POSITION          : int  271 406 0 2 0 0 5 0 6 0 ...
> $ DESIGN_ID         : int  6531 6531 6531 6531 6531 6531 6531 6531  
> 6531 6531 ...
> $ X                 : int  301 311 331 333 1 5 6 7 8 9 ...
> $ Y                 : int  5 5 5 5 6 6 6 6 6 6 ...
>>
>
> -----Original Message-----
> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
> Sent: Tuesday, July 14, 2009 2:56 PM
> To: Jack Schonbrun
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with  
> oligo on Nimblegen Expression Arrays
>
> what do you get if you run the following (assuming ndfFile is a
> variable that holds the file name)?
>
> ndf <- read.delim(ndfFile, stringsAsFactors=FALSE, nrows=100)
> str(ndf)
>
> thanks,
>
> b
>
> On Jul 14, 2009, at 6:49 PM, Jack Schonbrun wrote:
>
>> Benilton,
>>
>> Thanks for your suggestions.
>>
>> By every means I have tested, the file is tab delimited.  And the
>> first row is headers, all other data.
>>
>> Here is how the first (header) row looks:
>> PROBE_DESIGN_ID CONTAINER       DESIGN_NOTE
>> SELECTION_CRITERIA      SEQ_ID  PROBE_SEQUENCE  MISMATCH
>> MATCH_INDEX     FEATURE_ID      ROW_NUM COL_NUM PROBE_CLASS
>> PROBE_ID        POSITION        DESIGN_ID       X       Y
>>
>> Any other details on how the ndf is expected to look?
>>
>> Thanks again,
>> Jack
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
>> Sent: Tuesday, July 14, 2009 1:34 PM
>> To: Jack Schonbrun
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] makePdInfoPackage in preparation for RMA with
>> oligo on Nimblegen Expression Arrays
>>
>> Jack,
>>
>> it looks like your NDF isn't as expected.
>>
>> When it shows: "inserting 0 rows into table 'featureSet'", it makes
>> me wonder what the SEQ_ID column in the NDF looks like.
>>
>> But, instead of looking at the columns' contents right now, please
>> make sure the delimiters of the NDF are tabs. It doesn't appear that's
>> the case. Note the warning "In max(ndfdata[["X"]]): no non-missing
>> arguments to max; returning -Inf"... It suggests that ndfdata[["X"]]
>> is NULL.
>>
>> Another thing: ensure the first line of the NDF is the header (column
>> names) and the data start on the 2nd line.
>>
>> Please let me know how it goes.
>>
>> b
>>
>> On Jul 14, 2009, at 3:57 PM, Jack Schonbrun wrote:
>>
>>> Hello,
>>>
>>> I would like to use the oligo package to run the RMA algorithm on
>>> Nimblegen expression arrays.  To that end, I am attempting to
>>> construct an annotation package using makePdInfoPackage().
>>>
>>> I have followed the pattern in the "Building Annotation Packages
>>> with pdInfoBuilder for Use with the oligo Package" vignette:
>>>
>>> ----------------
>>>
>>>> ndfFile.test <- "test.ndf"
>>>> xysFile.test <- "test.xys"
>>>> seed.test <- new("NgsExpressionPDInfoPkgSeed", ndfFile =
>>>> ndfFile.test, xysFile = xysFile.test)
>>>> makePdInfoPackage(seed.test, destDir = "./Annotation")
>>> ====================================================================
>>> Building annotation package for Nimblegen Expression Array
>>> NDF:  test.ndf
>>> XYS:  test.xys
>>> ====================================================================
>>> Parsing file: test.ndf ... OK
>>> Parsing file: test.xys ... OK
>>> Merging NDF and XYS files ...OK
>>> Preparing contents for featureSet table ...OK
>>> Preparing contents for bgfeature table ...OK
>>> Preparing contents for pmfeature table ...OK
>>> Creating package in ./Annotation/pd.test
>>> Inserting 0 rows into table "featureSet"... Error in
>>> sqliteExecStatement(con, statement, bind.data) :
>>> RS-DBI driver: (incomplete data binding: expected 2 parameters, got 0)
>>> In addition: Warning messages:
>>> 1: In max(ndfdata[["Y"]]) :
>>> no non-missing arguments to max; returning -Inf
>>> 2: In max(ndfdata[["X"]]) :
>>> no non-missing arguments to max; returning -Inf
>>> 3: In sqliteExecStatement(con, statement, bind.data) :
>>> ignoring zero-row bind.data
>>>
>>> ------------------
>>>
>>> Any help on why it would only be inserting 0 rows, or any of the
>>> other messages would be greatly appreciated.  It does make some
>>> files in the destDir, but does not run to completion.  Listing of
>>> this directory available if it would help.
>>>
>>> I am running on Windows XP SP 2.  sessionInfo follows.
>>>
>>>> sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;
>>> LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] pdInfoBuilder_1.8.1      affxparser_1.16.0        RSQLite_0.7-1            DBI_0.2-4                makePlatformDesign_1.8.0 oligo_1.8.1
>>> [7] preprocessCore_1.6.0     oligoClasses_1.6.0       Biobase_2.4.1            affyio_1.12.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biostrings_2.12.7 IRanges_1.2.3     splines_2.9.1     tools_2.9.1
>>>
>>>
>>> ===========================
>>> Jack Schonbrun Ph.D.
>>> Software Developer
>>> Amyris Biotech
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
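
The checks suggested in the quoted messages above (tab delimiters, header on the first line, and the SEQ_ID, X and Y columns being present) could be sketched roughly as follows. `checkNdf()` is a hypothetical helper, not a pdInfoBuilder function, and its `required` column list is an assumption limited to the columns named in this thread. The two temporary files are made-up examples so the code runs on its own.

```r
# checkNdf() is a hypothetical helper, not part of pdInfoBuilder.
# 'required' lists only the columns mentioned in this thread (an assumption).
checkNdf <- function(ndfFile, required = c("SEQ_ID", "X", "Y")) {
  # The first line should be the header; split it on literal tabs.
  header <- readLines(ndfFile, n = 1)
  fields <- strsplit(header, "\t", fixed = TRUE)[[1]]
  ndf <- read.delim(ndfFile, stringsAsFactors = FALSE, nrows = 100)
  list(
    tabDelimited   = length(fields) > 1,
    missingColumns = setdiff(required, names(ndf)),
    rowsRead       = nrow(ndf)
  )
}

# Made-up files: one tab-delimited, one space-delimited.
goodFile <- tempfile(fileext = ".ndf")
badFile  <- tempfile(fileext = ".ndf")
writeLines(c("SEQ_ID\tX\tY", "SCER070900001885\t301\t5"), goodFile)
writeLines(c("SEQ_ID X Y", "SCER070900001885 301 5"), badFile)

good <- checkNdf(goodFile)
bad  <- checkNdf(badFile)

# When a column is absent, [[ ]] returns NULL and max() gives -Inf with the
# "no non-missing arguments to max" warning quoted in the thread.
demo <- data.frame(A = 1:3)
res <- suppressWarnings(max(demo[["X"]]))
```

With the space-delimited file, `read.delim()` sees a single column, so `bad$missingColumns` lists all three names. That is the situation the `max(ndfdata[["X"]])` warning hints at, though it may not be the only way to end up with zero rows.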