[BioC] SNP Analysis

Paul Leo p.leo at uq.edu.au
Fri Oct 23 09:03:12 CEST 2009


I have to say R/Bioconductor is my favourite too, but PLINK is just
great for GWAS for many reasons. PLINK, MACH (or other imputation
algorithms), Haploview and EIGENSTRAT can all be mixed and matched to
some extent, and together they provide a workhorse for QC, association
and first-stop visualisation. I typically feed the output of these into
R/Bioconductor. That said, there are some neat Bioconductor tools that
you can use alongside them. I probably underutilise those myself, but I
highly recommend those "common" tools.

The PLINK binary format for genotype data is very handy. MACH's MACH2DAT
can take covariates (at the end of the ped files). It is not clear what
you are doing, but if you have 100 cases, what are your controls? Are you
going to use a historical set such as WTCCC?

As for Birdseed and crlmm: to be honest, I do not know whether crlmm
calls genotypes AND copy number variations (as Birdsuite does; Birdseed
is a part of that). It would be neat if it did. If you have expression
data, then genotypes + copy number data might be quite useful to you,
depending on what you are studying.

100 cases is very small, but I have seen success with as few as 200 for
some specific genetic diseases.

I would consider:
1) Affymetrix's own SNP calling to get genotypes + PennCNV (or other) to
get copy number variations; easy and most straightforward
OR
Birdsuite if you feel confident (never tried it, I use Illumina), because
then you get copy number variations straight away.

2) Your cases + historical controls: do a mini GWAS.
Combine genotypes and QC with PLINK / EIGENSTRAT for SAMPLES and SNPS,
i.e.:

SAMPLES: check stratification, related individuals, missingness,
heterozygosity... look for outliers

SNPS: MAF, genotyping rate and HWE all need to be filtered on

3) Do a straight-up Affy microarray analysis and check against your mini
GWAS.

4) Really combine expression and genotypes: try GGtools (Bioconductor)
on your genotypes + expression data
OR use PLINK or MACH2DAT with the expression data as covariates, maybe.
PLINK has an excellent manual; check that out.
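
To make the SNP-level filters in step 2 concrete, here is a minimal
Python sketch of what they compute for a single SNP. It is not what
PLINK runs internally: the function names, the genotype-count input and
the threshold values are all illustrative assumptions, and the HWE
p-value uses a simple 1-df chi-square approximation, so PLINK's own
figures may differ.

```python
from math import erfc, sqrt


def snp_qc(n_aa, n_ab, n_bb, n_missing):
    """Return (call_rate, maf, hwe_p) for one SNP from genotype counts.

    Hypothetical helper: counts are the numbers of AA, AB, BB and
    no-call genotypes across all samples.
    """
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    call_rate = called / total if total else 0.0
    if called == 0:
        return call_rate, 0.0, 1.0
    p = (2 * n_aa + n_ab) / (2 * called)  # frequency of allele A
    q = 1 - p
    maf = min(p, q)
    if maf == 0:
        return call_rate, maf, 1.0  # monomorphic SNP: HWE test undefined
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = (called * p * p, 2 * called * p * q, called * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
    return call_rate, maf, hwe_p


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply illustrative thresholds (assumed values, tune for your study)."""
    call_rate, maf, hwe_p = snp_qc(n_aa, n_ab, n_bb, n_missing)
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
```

For example, passes_qc(25, 50, 25, 0) keeps a SNP that is fully called
and in perfect HWE, while a SNP with no heterozygotes at all, such as
passes_qc(50, 0, 50, 0), is dropped on the HWE criterion.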

Note: GGtools will not require that you do the mini GWAS, as you will
only need the cases' genotypes, I think (see that package for details),
but still do filtering before you begin.
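
For the per-sample side of that filtering, a rough sketch of the
missingness and heterozygosity checks could look like the following. The
0/1/2 genotype coding with None for a no-call, the function names, the
5% missingness cut-off and the 3-SD heterozygosity rule are all
assumptions made for illustration, not values taken from PLINK or from
any particular study.

```python
from statistics import mean, pstdev


def sample_stats(genotypes):
    """Return (missing_rate, het_rate) for one sample.

    genotypes: list of 0, 1 or 2 (copies of one allele), None for a no-call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes) if genotypes else 1.0
    het_rate = sum(1 for g in called if g == 1) / len(called) if called else 0.0
    return missing_rate, het_rate


def flag_samples(samples, max_missing=0.05, n_sd=3.0):
    """Return {sample_id: [reasons]} for samples that look like outliers.

    samples: non-empty dict mapping sample id to its genotype list.
    """
    stats = {sid: sample_stats(g) for sid, g in samples.items()}
    het_rates = [het for _, het in stats.values()]
    mu, sd = mean(het_rates), pstdev(het_rates)
    flagged = {}
    for sid, (missing, het) in stats.items():
        reasons = []
        if missing > max_missing:
            reasons.append("missingness")
        # Unusually high or low heterozygosity relative to the cohort
        if sd > 0 and abs(het - mu) > n_sd * sd:
            reasons.append("heterozygosity")
        if reasons:
            flagged[sid] = reasons
    return flagged
```

Stratification and relatedness checks need genome-wide comparisons
between samples and are better left to PLINK / EIGENSTRAT themselves.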

My 2c worth of ideas...

Cheers
Paul

-----Original Message-----
From: Peter Ganske <peterganske at mac.com>
To: Claus-Jürgen Scholz <scholz at klin-biochem.uni-wuerzburg.de>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] SNP Analysis
Date: Thu, 22 Oct 2009 14:51:52 +0200

Dear Claus-Jürgen,
thanks for the reply. How would you analyze the genotype
frequency with PLINK?
And why do you use this program instead of a Bioconductor package?
All the best and thanks in advance
Peter
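
As an illustration of the kind of per-SNP comparison being asked about,
the following Python sketch runs a basic allelic chi-square test on
genotype counts from two groups (for instance responders and
non-responders). It is only a sketch of the general idea, not PLINK's
implementation; the function names and the genotype-count input format
are assumptions.

```python
from math import erfc, sqrt


def allele_counts(n_aa, n_ab, n_bb):
    """Turn genotype counts into (A allele count, B allele count)."""
    return 2 * n_aa + n_ab, 2 * n_bb + n_ab


def allelic_test(group1, group2):
    """Chi-square test (1 df) on the 2x2 table of allele counts.

    group1, group2: (n_aa, n_ab, n_bb) genotype counts for each group.
    Returns (chi2, p_value).
    """
    a, b = allele_counts(*group1)
    c, d = allele_counts(*group2)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Made-up counts for one SNP: (AA, AB, BB) in each group of 50 patients
chi2, p = allelic_test((30, 15, 5), (10, 20, 20))
```

With hundreds of thousands of SNPs on the array, any p-value from such a
test has to be corrected for multiple testing before it is interpreted.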



On 22.10.2009 at 13:13, Claus-Jürgen Scholz wrote:

>
> Dear Peter,
>
> indeed, Birdseed is a genotyping algorithm and I'd use it for genotype
> calling of SNP6.0 arrays (best suited for this platform). If you have
> the calls, export them into a table (export options and formats should
> be described in the Genotyping Console manual) and analyze the genotype
> frequency differences between responders and non-responders (valuable
> free software is e.g. PLINK). However, n=100 is a pretty small sample
> size for a GWAS...
>
> Bests,
> Claus-Jürgen
>
>
> Peter Ganske wrote:
>> Dear Vincent,
>> thanks for the fast reply. Well, I thought that the Genotyping
>> Console used the Birdseed algorithm, and that this algorithm is a
>> genotyping algorithm.
>>
>> It's hard to find papers or groups that have worked with this array,
>> and for me (I work as a student at an institute) it is hard to find
>> the right workflow without help (nobody here has worked with SNP
>> arrays in the past).
>>
>> So, I have 100 arrays (100 CHP and 100 CEL files) from 100 patients. I
>> want to have a look at the SNPs of the patients. 50 are non-responders
>> and 50 are responders. There should be a difference between the two
>> groups. So far, I have looked for papers to get a "general"
>> workflow for sorting out most of the SNPs of the patients.
>>
>> So you think I have to try this package and create the genotype
>> calls?
>> What about this workflow? Are my following thoughts right:
>>
>> - The package checks every SNP on every chip and puts the results in a
>> table
>> - I can combine the SNP results with a selection of genes I
>> want...
>>
>> My boss talked about a top list of 50 genes... Maybe this can help me
>> with the usage of CRLMM... I don't know.
>>
>> Thanks a lot and sorry for the questions. It is my first time working
>> with SNP arrays and my first time working with Bioconductor/R.
>> All the best from Germany
>> Peter
>> On 21.10.2009 at 16:11, Vincent Carey wrote:
>>
>>
>>> Briefly, you can perform genotype calling with a confidence measure
>>> using crlmm package, working from the CEL files.   The crlmm package
>>> includes a vignette called crlmmDownstream.pdf that illustrates one
>>> approach to GWAS analysis based on 6.0, using snpMatrix package.  To
>>> use crlmm you will also need a metadata package called
>>> genomewidesnp6crlmm.
>>>
>>> There are certainly other approaches possible.  Our workflow
>>> documentation for this use case probably needs some enhancement.
>>>
>>> On Wed, Oct 21, 2009 at 9:42 AM, Peter Ganske <Peter.Ganske at hki-jena.de> wrote:
>>>>
>>>> Hello,
>>>> this is my first time working with SNP arrays. I got CEL and CHP
>>>> files for my analysis. The CEL files are from the Affymetrix
>>>> Genome-Wide Human SNP Array 6.0, and the CHP files were processed
>>>> with the Birdseed algorithm (part of the Genotyping Console from
>>>> Affymetrix as well).
>>>> Is there anybody here who has worked with these arrays in the past? I
>>>> am looking for a (general) workflow for my study. I want to
>>>> analyse patients with rheumatoid arthritis with regard to SNPs and
>>>> the question "why are there responders and non-responders to the
>>>> therapy?"
>>>> I am looking for a workflow for the arrays. Is it better to work
>>>> with the CHP files or with the CEL files?
>>>> It would be great if anybody could help me out.
>>>> Thanks in advance
>>>> Peter
>>>>
>>>>
>>>> The information contained in this email and any attachments is
>>>> confidential and may be subject to copyright or other intellectual
>>>> property protection. If you are not the intended recipient, you are
>>>> not authorized to use or disclose this information, and we request
>>>> that you notify us by reply mail or telephone and delete the
>>>> original message from your mail system.
>>>>       [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>

-- 
Dr Paul Leo
Bioinformatician
Diamantina Institute for Cancer, Immunology and Metabolic Medicine
University of Queensland
--------------------------------------------------------------------------------------
Research Wing, Bldg 1
Princess Alexandra Hospital
Woolloongabba, QLD, 4102
Tel: +61 7 3240 7740  Mob: 041 303 8691  Fax: +61 7 3240 5946
Email: p.leo at uq.edu.au   Web: http://www.di.uq.edu.au


