[BioC] error in matchprobes package

Zhijin Wu zwu at stat.brown.edu
Fri Sep 14 16:00:39 CEST 2007


I think Saroj is right. The cdf package gives only 153525 PM indices, but
there are 158334 probe sequences, so some probes have a sequence but no
location in the cdf package.
GCRMA can handle the case where sequences for some probes are not
given (in the past there were incomplete probe sequence files), but it
expects no more probe sequences than there are PM probes in the cdf
package.

I could adjust gcrma to use only the intersection of the probes that the
cdf and probe packages have in common, but I wonder whether this
discrepancy between the cdf and probe packages is something we should
expect.

pmIndex <- unlist(indexProbes(new("AffyBatch", cdfName = "ehis1a520285f"), "pm"))
length(pmIndex)
[1] 153525

p <- get("ehis1a520285fprobe")$sequence
length(p)
[1] 158334
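A minimal base-R sketch of why extra probe sequences can trigger the gcrma error "NAs are not allowed in subscripted assignments", and how restricting to the intersection avoids it. The vectors cdfPm, probeIdx and affinity are hypothetical toy stand-ins, not values from the ehis1a520285f packages, and this is an illustration of the mechanism, not the actual gcrma code.

```r
# Hypothetical toy data (NOT taken from the real cdf/probe packages):
# cdfPm    - cell indices of the PM probes known to the cdf package
# probeIdx - cell indices derived from the probe package, one per sequence
# affinity - one computed affinity per probe sequence
cdfPm    <- c(11L, 12L, 13L, 15L)
probeIdx <- c(11L, 12L, 13L, 14L, 15L, 16L)
affinity <- c(0.5, 1.2, -0.3, 0.8, 0.1, 2.0)

# Position of each sequenced probe within the cdf PM index;
# probes the cdf does not know about (14 and 16) become NA.
pos <- match(probeIdx, cdfPm)

# Assigning several values through an index that contains NA is an error in R.
tmp <- rep(NA_real_, length(cdfPm))
err <- tryCatch({
  tmp[pos] <- affinity
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Keeping only the probes present in both packages avoids the NAs.
keep <- !is.na(pos)
tmp[pos[keep]] <- affinity[keep]
print(tmp)
```

Dropping unmatched probes this way silently hides the cdf/probe discrepancy, so reporting sum(!keep) alongside the result may be worthwhile.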




Yan Zhang wrote:
> Saroj:
> 
> I think what you are talking about is an old issue.
> I already edited the probe sequence file and removed the mismatched
> probeset IDs, so the two files now match. I then used the matched CDF
> and probe sequence files to generate packages using R 2.6 alpha. When
> I ran GCRMA, I still got this error message:
> 
> Adjusting for optical effect..Done.
> Computing affinities.Error in tmp.exprs[pmIndex[subIndex]] = apm :
>  NAs are not allowed in subscripted assignments
> 
> That is why I e-mailed Dr. Wu and the Bioconductor list to ask for
> further help.
> 
> We could discuss this issue sometime today.
> 
> best
> yan
> 
> smohapat at vbi.vt.edu wrote:
> 
>> Jim and all,
>>
>> I have been following the messages online and had a chance to talk with
>> Yan and look at the error messages yesterday.
>>
>> I think the problem is caused by a discrepancy between the cdf and
>> probeseq files that Yan received from his collaborator. As I understand
>> it, the probe sequence file contains more probeset IDs than the CDF
>> file: 614 probeset IDs could not be found in the CDF file. Yan, please
>> correct me if I am wrong.
>>
>> I am guessing that matchprobes adds NAs for the IDs missing in the CDF,
>> and this causes the error during gcrma.
>>
>> Best,
>>
>> Saroj
>>
>> On Thu, September 13, 2007 11:38 am, James W. MacDonald wrote:
>>  
>>
>>> Hi Yan,
>>>
>>>
>>> I have no idea why you were having problems with this, unless you didn't
>>> upgrade to R-devel like I suggested. I didn't have any problems building
>>> this package.
>>>
>>> Rather than trying to talk you through building this yourself, I have
>>> put it up for download:
>>>
>>> http://www.umich.edu/~jmacdon/ehis1a520285fprobe_0.0.1.tar.gz
>>>
>>>
>>> Best,
>>>
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> yzhang at vbi.vt.edu wrote:
>>>   
>>>> Jim:
>>>>
>>>>
>>>> I put my cdf file, probe sequence file and one CEL file at the
>>>> following URL. If you would like to reproduce my problem, you could
>>>> download them and try them on your machine:
>>>> http://ci.vbi.vt.edu/yan/newcdf/huber.html
>>>> Thanks a lot.
>>>> yan
>>>>
>>>>
>>>> On Wed, September 12, 2007 4:01 pm, James W. MacDonald wrote:
>>>>
>>>>
>>>>     
>>>>> Yan Zhang wrote:
>>>>>
>>>>>
>>>>>       
>>>>>> Jim:
>>>>>>
>>>>>> I was wrong. That chip does have MM probes; I just checked using the
>>>>>> mm function in the affy package. I thought it had only PM probes
>>>>>> because only PM probes are in the probe sequence file. Do you have
>>>>>> any suggestions for solving that error?
>>>>>>         
>>>>> Sure. You have two choices. You can add comparewithcdf=FALSE to your
>>>>> call to makeProbePackage(), which will eliminate the warnings because
>>>>> you will no longer be comparing to the cdf. This is the simplest
>>>>> answer, but regrettably the most dangerous as well.
>>>>>
>>>>> Otherwise, you could
>>>>>
>>>>>
>>>>>
>>>>> debug(.lgExtraParanoia)
>>>>>
>>>>> before running makeProbePackage(), and then step through that
>>>>> function, looking at what you get for pm1, mm1, pm2, and mm2 to see
>>>>> why you are getting the error in the first place. I have to assume one
>>>>> of those variables is ending up as an NA (usually this happens because
>>>>> there aren't any MMs). Then you will have to figure out what to do
>>>>> with this information.
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>       
>>>>>> best yan
>>>>>>
>>>>>> James W. MacDonald wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>> Hi Yan,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> First, please don't take things off-list. The archives are
>>>>>>> intended to be a resource, and if the questions/answers become
>>>>>>> private then we have less of a resource.
>>>>>>>
>>>>>>> Yan Zhang wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>> Thank you very much for your response.
>>>>>>>> Yes, that chip has only PM probes. What can I do, then?
>>>>>>>> I need to solve this problem in order to continue.
>>>>>>>> As for the warning messages, can I just ignore them? I doubt it,
>>>>>>>> because later, when I use GCRMA, those NAs will cause trouble in
>>>>>>>> the compute.affinities function. What can I do? Can I just delete
>>>>>>>> the header of the probe sequence file?
>>>>>>>>             
>>>>>>> You won't be able to do GCRMA with a PM-only chip. GCRMA uses the
>>>>>>> MM probes to compute a background estimate, and if you don't have
>>>>>>> MM probes you won't be able to do that.
>>>>>>>
>>>>>>> As for the second question (which is a moot point now), you don't
>>>>>>> want to delete the head of the probe_tab file. As I mentioned in
>>>>>>> my earlier reply you would need to use the devel version of
>>>>>>> matchprobes with R-2.6.0alpha.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>> best yan
>>>>>>>>
>>>>>>>> James W. MacDonald wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>> Hi Yan,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> yzhang at vbi.vt.edu wrote:
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>> When I use the makeProbePackage function in the newest version
>>>>>>>>>> of the matchprobes package (1.8.1), I get the following error
>>>>>>>>>> message:
>>>>>>>>>>
>>>>>>>>>> > makeProbePackage("ehis1a520285f", version="1.0", species="ehis",
>>>>>>>>>> + maintainer="yanzhang<yzhang at vbi.vt.edu>", build=FALSE,
>>>>>>>>>> + check=FALSE, force=True)
>>>>>>>>>> Importing the data.
>>>>>>>>>> Error in rep(NA, max(pm1, mm1, pm2, mm2)) : invalid 'times' argument
>>>>>>>>>> In addition: Warning messages:
>>>>>>>>>> 1: NAs introduced by coercion in: as.integer.default(dat[[2]])
>>>>>>>>>> 2: NAs introduced by coercion in: as.integer.default(dat[[3]])
>>>>>>>>>> 3: NAs introduced by coercion in: as.integer.default(dat[[4]])
>>>>>>>>>>
>>>>>>>>>>                 
>>>>>>>>> The error comes from code that compares the probeset IDs from
>>>>>>>>> the probe package with the cdf package, and IIRC this happens
>>>>>>>>> when you have a PM-only chip. Is this chip PM-only?
>>>>>>>>>
>>>>>>>>> The warnings come from an unfortunate change that was made to
>>>>>>>>> getProbeDataAffy() that I have fixed in the devel version (and
>>>>>>>>> have no idea right now why I didn't push to the release as
>>>>>>>>> well...). The problem stems from the fact that you are
>>>>>>>>> reading in the whole probe_tab file, including the header.
>>>>>>>>> When the (x,y) coordinates and probe interrogation position data
>>>>>>>>> are coerced to integer, the first value in each column is the
>>>>>>>>> character header, which is coerced to NA.
>>>>>>>>>
>>>>>>>>> The release branch is no longer being built, so I cannot push
>>>>>>>>> a fix that will end up being available. The easiest thing for
>>>>>>>>> you to do is upgrade your R to 2.6.0 alpha and use the devel
>>>>>>>>> version of matchprobes.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>> I don't have this problem if I use the old version (1.0.22).
>>>>>>>>>> Does anyone know what causes this?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> best yan
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>> Search the archives:
>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>                 
>>>>>>>>>               
>>>>> -- 
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> Affymetrix and cDNA Microarray Core
>>>>> University of Michigan Cancer Center
>>>>> 1500 E. Medical Center Drive
>>>>> 7410 CCGC
>>>>> Ann Arbor MI 48109
>>>>> 734-647-5623
>>>>>
>>>>>
>>>>>
>>>>>       
>>>>     
>>>
>>>
>>>
>>>
>>>   
>>
>>  
>>
> 


-- 
-------------------------------------------
Zhijin (Jean) Wu
Assistant Professor of Biostatistics
Brown University, Box G-S121
Providence, RI  02912

Tel: 401 863 1230
Fax: 401 863 9182
http://stat.brown.edu/~zwu


