[BioC] processCGH in snapCGH package

Thu Sep 27 01:13:04 CEST 2007

jhs1jjm at leeds.ac.uk wrote:
> Quoting jhs1jjm at leeds.ac.uk on Wed 26 Sep 2007 22:54:01 BST:
>
>   
>> Quoting Sean Davis <sdavis2 at mail.nih.gov> on Wed 26 Sep 2007 17:30:18 BST:
>>
>>     
>>> jhs1jjm at leeds.ac.uk wrote:
>>>       
>>>> R 2.5.0 on openSUSE 10.2 x86_64.
>>>> Hi,
>>>>
>>>> I'm using the snapCGH package to analyse two 244k Agilent CGH arrays with
>>>> the aim of identifying regions of gain/loss.
>>>> So far I've done the following:
>>>>
>>>>         
>>>>> targets <- readTargets ("targets.txt")
>>>>> RG1 <-read.maimages (targets$File_names, source="agilent")
>>>>> RG2 <- readPositionalInfo (RG1,source="agilent")
>>>>> RG2$design <- c(-1-1)
>>>>> RG3 <- backgroundCorrect (RG2,method="minimum")
>>>>> MA1 <- normalizeWithinArrays (RG2,method="median")
>>>>>           
>>>> then
>>>>         
>>>>> MA2 <- processCGH(MA1,method.of.averaging=mean,ID="MA1$genes$ProbeName")
>>>>>           
>>>> Error in order(na.last, decreasing, ...) :
>>>>         argument 2 is not a vector
>>>>
>>>> I've looked at ?processCGH and am following the vignette for the snapCGH
>>>> package fairly closely. Can anyone help with the error?
>>>>         
>>> You can't quote variable names like above.  I'm not sure that is going
>>> to fix the problem, but until the syntax is correct, it will be hard to
>>> diagnose the issue.
>>>
>>>       
>>>> Also, I'm unsure which background correction and normalization function to
>>>> use (I've been informed that non-linear methods are unsuitable). If anyone
>>>> has any experience of Agilent CGH arrays, could they also tell me whether
>>>> the default estimates used for the foreground and background intensities
>>>> in read.maimages are suitable? I'd like to determine the most suitable
>>>> methods beforehand, as I think the segmentation may take some time on my
>>>> machine. If it's a case of trial and error then that's fine. Thanks for
>>>> any input.
>>>>         
>>> I would use the LogRatio column of the Agilent file without any further
>>> normalization.  The LogRatio is already background corrected.  The CGH
>>> algorithms in snapCGH do not depend on the center of the data, so there
>>> isn't really a need to do any further median centering, etc.  In fact,
>>> there are probably better methods to center the data, but these use the
>>> segmented data.
>>>
>>> Hope that helps.
>>>
>>> Sean
>>>
>>>       
>> Hi Sean,
>>
>> I'm struggling to import the LogRatio column from the Agilent text files. I'm
>> using read.delim2, but this brings my machine to a standstill and hadn't
>> finished after 45 mins. Is the following the same:
>>
>>     
>>> RG1 <- read.maimages(targets$File_names,source="agilent")
>>> RG2 <- readPositionalInfo(RG1,"agilent")
>>> RG2$design <- c(1,-1)
>>> RG3 <- backgroundCorrect(RG2,method="none")
>>> MA1 <- normalizeWithinArrays (RG3,method="none")
>>> LogRatio <- MA1$M
>>>       
>> Having just looked at the text file, it doesn't appear to be. I've looked
>> through the data import R guide but haven't found anything yet.
>>
>>     

You will probably need to read the read.maimages help page pretty carefully.
To read in the LogRatio column, you will need to ask for it explicitly as an
extra column (the other.columns argument).  Alternatively, change the red and
green foreground columns (the columns argument) to rProcessedSignal and
gProcessedSignal and then do not do background correction, as LogRatio is
calculated from these.  You will also potentially benefit from looking at the
Agilent Feature Extraction Reference Manual, which explains the columns in
the Agilent files.

http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=50416
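
A minimal sketch of the second option, in case it helps.  The column names
are the ones mentioned above; the targets file name and its File_names column
are taken from the code earlier in this thread.  The guard around the read is
only there so the snippet does nothing on a machine without limma or the
files.  I have not run this against your data, and depending on the limma
version the columns list may also need Rb and Gb entries, so treat it as a
starting point rather than a recipe.

```r
# Foreground columns to use in place of limma's Agilent defaults.
agilent_columns <- list(R = "rProcessedSignal", G = "gProcessedSignal")

# Extra column to carry along unchanged; limma stores it under RG$other.
extra_columns <- "LogRatio"

if (requireNamespace("limma", quietly = TRUE) && file.exists("targets.txt")) {
  targets <- limma::readTargets("targets.txt")
  RG <- limma::read.maimages(targets$File_names, source = "agilent",
                             columns = agilent_columns,
                             other.columns = extra_columns)
  # No backgroundCorrect() step here: the processed signals are the
  # values LogRatio was calculated from.
  print(head(RG$other$LogRatio))
}
```

With this, RG$R and RG$G hold the processed signals, and the file's own
LogRatio values are available for comparison in RG$other$LogRatio.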

> Additionally Sean I tried:
>
>   
>> LogRatio <-log2(RG1$R)-log2(RG1$G)
>>     
>
> This gives me different results to the text file?
>   

The LogRatio column is calculated from rProcessedSignal and 
gProcessedSignal in the Agilent file.  These columns are not loaded by 
limma by default.
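
To make the difference concrete, here is a small base-R sketch with made-up
intensities (the numbers are illustrative only, not from any real array).
It also assumes, as I read the Feature Extraction manual, that LogRatio is a
base-10 log, whereas your calculation uses log2; please check that against
the manual for your FE version.

```r
# Illustrative values only; the column names follow the Agilent FE file.
fe <- data.frame(
  rMeanSignal      = c(1200, 850, 4300),
  gMeanSignal      = c(1000, 900, 2100),
  rProcessedSignal = c(1150.2, 790.5, 4410.8),
  gProcessedSignal = c(980.4, 860.1, 2050.3)
)

# Log ratio from the raw foreground signals, as in log2(RG1$R) - log2(RG1$G).
raw_log2 <- log2(fe$rMeanSignal) - log2(fe$gMeanSignal)

# Log ratio from the processed signals, assumed here to be base 10.
processed_log10 <- log10(fe$rProcessedSignal / fe$gProcessedSignal)

# The same quantity converted to log2 for use alongside limma's M-values.
processed_log2 <- processed_log10 / log10(2)

print(cbind(raw_log2, processed_log10, processed_log2))
```

The first two columns differ both because different signal columns go in and
because the log base differs; the third column is the processed ratio on the
log2 scale.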

Hope that helps some.

Sean