[BioC] SNP Analysis

Paul Leo p.leo at uq.edu.au
Sun Oct 25 23:48:42 CET 2009


Very nice!
Thanks for the heads up.
Cheers
Paul


Date: Fri, 23 Oct 2009 12:10:36 -0200

Regarding CNA, the 'crlmm' package does include a copy number tool:

http://www.bepress.com/jhubiostat/paper197/
http://www.bioconductor.org/packages/2.5/bioc/vignettes/crlmm/inst/doc/copynumber.pdf

b

On Oct 23, 2009, at 5:03 AM, Paul Leo wrote:

> I have to say R/Bioconductor is my favourite too, but PLINK is just
> great for GWAS for many reasons. PLINK, MACH (or other imputation
> algorithms), Haploview and EIGENSTRAT can all be mixed and matched to
> some extent and provide a workhorse for QC, association and first-stop
> visualisation. I typically feed the output of these into R/Bioconductor.
> That said, there are some neat Bioconductor tools that you can use
> alongside. I probably underutilise those myself, but I highly recommend
> those "common" tools.
>
> PLINK binary format for genotype data is very handy. MACH's MACH2DAT can
> take covariates (at the end of the ped files). It is not clear what you
> are doing, but if you have 100 cases, what are your controls? Are you
> going to use a historical set, e.g. WTCCC?
>
> As for Birdseed and crlmm: to be honest, I do not know if crlmm calls
> genotypes AND copy number variations (like Birdsuite; Birdseed is a part
> of that). It would be neat if it did. If you have expression data, then
> genotypes + copy number data might be quite useful to you, depending
> on what you are studying.
>
> 100 cases is very small, but I have seen success with as few as 200 for
> some specific genetic diseases.
>
> I would consider:
> 1) Affymetrix's own SNP calling to get genotypes + PennCNV (or other) to
> get copy number variations; easiest and most straightforward
> OR
> Birdsuite if you feel confident (never tried it, I use Illumina), because
> then you get copy number variations straight away.
>
> 2) With your cases + historical controls, do a mini GWAS:
> combine genotypes and QC with PLINK / EIGENSTRAT for SAMPLES and SNPS,
> i.e.:
>
> SAMPLES check: (stratification, related
> individuals, missingness, heterozygosity... look for outliers)
>
> SNPS: (MAF, genotyping rate, HWE all need to be filtered on)
>
> 3) Do a straight-up Affy microarray analysis and check against your mini
> GWAS.
>
> 4) Really combine expression and genotypes: try GGtools (Bioconductor)
> on your genotypes + expression data
> OR use PLINK or MACH2DAT with the expression data as covariates, maybe.
> PLINK has an excellent manual; check that out.
>
> Note GGtools will not require that you do the mini GWAS, as you will
> only need the cases' genotypes, I think; see that package for details. But
> still do filtering before you begin.
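
The SNP-level filters listed under point 2 (MAF, genotyping rate, HWE) can be sketched for a single SNP roughly as follows. This is only a minimal Python illustration under stated assumptions, not what PLINK or any other tool does internally: the 0/1/2 genotype coding, the function names snp_qc and passes_qc, and the thresholds (95% call rate, 1% MAF, HWE p >= 1e-6) are all hypothetical choices made for the example, and a plain 1-df chi-square goodness-of-fit test stands in for whatever HWE test your tool of choice uses.

```python
import math


def snp_qc(genotypes):
    """Return (call_rate, maf, hwe_p) for one SNP.

    genotypes: non-empty list of 0, 1 or 2 (number of B alleles),
    with None marking a no-call. This coding is an assumption.
    """
    if not genotypes:
        raise ValueError("need at least one sample")
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes)
    if n == 0:
        return call_rate, 0.0, 1.0

    n_ab = called.count(1)
    n_bb = called.count(2)
    n_aa = n - n_ab - n_bb

    # Allele frequencies from genotype counts.
    p_b = (2 * n_bb + n_ab) / (2 * n)
    p_a = 1.0 - p_b
    maf = min(p_a, p_b)
    if maf == 0:
        # Monomorphic SNP: the HWE test is undefined, report p = 1.
        return call_rate, maf, 1.0

    # Chi-square goodness-of-fit against Hardy-Weinberg proportions.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return call_rate, maf, hwe_p


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the three filters; the default thresholds are illustrative only."""
    call_rate, maf, hwe_p = snp_qc(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
```

With real data you would run this per SNP across all samples and keep only SNPs for which passes_qc returns True; the sample-level checks (missingness, heterozygosity, relatedness, stratification) are a separate step not covered here.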
>
> my 2c worth of ideas....
>
> Cheers
> Paul
>
> -----Original Message-----
> From: Peter Ganske <peterganske at mac.com>
> To: Claus-Jürgen Scholz <scholz at klin-biochem.uni-wuerzburg.de>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] SNP Analysis
> Date: Thu, 22 Oct 2009 14:51:52 +0200
>
> Dear Claus-Jürgen,
> thanks for the reply. How would you analyze the genotype
> frequencies with PLINK?
> And why do you use this program instead of a Bioconductor package?
> All the best and thanks in advance
> Peter
>
>
>
> On 22.10.2009 at 13:13, Claus-Jürgen Scholz wrote:
>
>>
>> Dear Peter,
>>
>> indeed, Birdseed is a genotyping algorithm and I'd use it for genotype
>> calling of SNP6.0 arrays (best suited for this platform). If you have
>> the calls, export them into a table (export options and formats should
>> be described in the Genotyping Console manual) and analyze the genotype
>> frequency differences between responders and non-responders (valuable
>> free software is e.g. PLINK). However, n=100 is a pretty small sample
>> size for a GWAS...
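
For a single SNP, the comparison of genotype frequencies between responders and non-responders described above could look something like the sketch below. It is a minimal Python illustration of a basic allelic chi-square test on a 2x2 table of allele counts, not a reproduction of PLINK's output. The 0/1/2 genotype coding and the function names allele_counts and allelic_chi2 are assumptions made for the example.

```python
import math


def allele_counts(genotypes):
    """Return (A count, B count) from genotypes coded as number of B alleles.

    None marks a no-call and is skipped. The coding is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    b = sum(called)
    a = 2 * len(called) - b
    return a, b


def allelic_chi2(responders, non_responders):
    """1-df chi-square test on the 2x2 table of allele counts by group.

    Returns (chi2, p). If any row or column total is zero the test is
    undefined and (0.0, 1.0) is returned.
    """
    a1, b1 = allele_counts(responders)
    a2, b2 = allele_counts(non_responders)
    row1, row2 = a1 + b1, a2 + b2
    col_a, col_b = a1 + a2, b1 + b2
    if 0 in (row1, row2, col_a, col_b):
        return 0.0, 1.0
    n = row1 + row2
    # Shortcut formula for the Pearson chi-square of a 2x2 table.
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / (row1 * row2 * col_a * col_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

Because such a test is repeated for every SNP on the array, the resulting p-values need a multiple-testing correction before any SNP is called significant, which is one reason n=100 is small for this kind of study.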
>>
>> Bests,
>> Claus-Jürgen
>>
>>
>> Peter Ganske wrote:
>>> Dear Vincent,
>>> thanks for the fast reply. Well, I thought that the Genotyping
>>> Console used the Birdseed algorithm and that this algorithm is a
>>> genotyping algorithm.
>>>
>>> It's hard to find papers or groups who have worked with this array, and
>>> for me (I work as a student for an institute) it is hard to find the
>>> right workflow without help (nobody here has worked with SNP arrays in
>>> the past).
>>>
>>> So, I have 100 arrays (100 CHP and 100 CEL files) from 100 patients. I
>>> want to have a look at the SNPs of the patients. 50 are non-responders
>>> and 50 are responders. There should be a difference between the two
>>> groups. So far, I have looked for papers to get a "general"
>>> workflow for filtering out most of the SNPs of the patients.
>>>
>>> So you think I have to try this package and create the genotype
>>> calls?
>>> What about this workflow? Are my following thoughts right:
>>>
>>> - The package checks every SNP on every chip and puts the result in a
>>> table
>>> - I can combine the SNP results with a selection of genes I
>>> want....
>>>
>>> My boss talked about a top list of 50 genes... Maybe this can help me
>>> with the usage of CRLMM... I don't know.
>>>
>>> Thanks a lot and sorry for the questions. It is my first time working
>>> with SNP arrays and my first time working with Bioconductor/R.
>>> All the best from Germany
>>> Peter
>>> On 21.10.2009 at 16:11, Vincent Carey wrote:
>>>
>>>
>>>> Briefly, you can perform genotype calling with a confidence measure
>>>> using the crlmm package, working from the CEL files. The crlmm package
>>>> includes a vignette called crlmmDownstream.pdf that illustrates one
>>>> approach to GWAS analysis based on 6.0, using the snpMatrix package. To
>>>> use crlmm you will also need a metadata package called
>>>> genomewidesnp6crlmm.
>>>>
>>>> There are certainly other approaches possible.  Our workflow
>>>> documentation for this use case probably needs some enhancement.
>>>>
>>>> On Wed, Oct 21, 2009 at 9:42 AM, Peter Ganske <Peter.Ganske at hki-jena.de> wrote:
>>>>>
>>>>> Hello,
>>>>> this is my first time working with SNP arrays. I got CEL and CHP files
>>>>> for my analysis. The CEL files are from the Affymetrix Genome-Wide
>>>>> Human SNP Array 6.0 and the CHP files were processed with the Birdseed
>>>>> algorithm (part of the Genotyping Console from Affymetrix as well).
>>>>> Is there anybody here who has worked with these arrays in the past? I
>>>>> am looking for a (general) workflow for my study. I want to
>>>>> analyse patients with rheumatoid arthritis with regard to SNPs and
>>>>> the question "why are there responders and non-responders to the
>>>>> therapy?"
>>>>> I am looking for a workflow for the arrays. Is it better to work
>>>>> with the CHP files or with the CEL files?
>>>>> It would be great if anybody could help me out.
>>>>> Thanks in advance
>>>>> Peter
>>>>>
>>>>>
>>>>> The information contained in this email and any attachments is
>>>>> confidential and may be subject to copyright or other intellectual
>>>>> property protection. If you are not the intended recipient, you are
>>>>> not authorized to use or disclose this information, and we request
>>>>> that you notify us by reply mail or telephone and delete the
>>>>> original message from your mail system.
>>>>>      [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
> --
> Dr Paul Leo
> Bioinformatician
> Diamantina Institute for Cancer, Immunology and Metabolic Medicine
> University of Queensland
> --------------------------------------------------------------------------------------
> Research Wing, Bldg 1
> Princess Alexandra Hospital
> Woolloongabba, QLD, 4102
> Tel: +61 7 3240 7740  Mob: 041 303 8691  Fax: +61 7 3240 5946
> Email: p.leo at uq.edu.au   Web: http://www.di.uq.edu.au
>



