[BioC] Can I use EdgeR to analyse Bulk segregant studies based on markers generated using Illumina generated tag counts

Mon Sep 26 10:18:02 CEST 2011

Dear Josquin

I think that (in principle) edgeR can be applied to such types of data.

If you have many segregants, you may not need the 'moderation' aspect of 
edgeR, and you could simply use edgeR::exactTest with per-locus 
estimates of the 'dispersion' parameter.
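
To make the idea of a per-locus exact test concrete, here is a minimal
Python sketch. It is not edgeR: it covers only the simplest special case,
two pools with the dispersion assumed to be zero (Poisson counts).
Conditional on the total count at a locus, the test then reduces to an
exact binomial test whose success probability is set by the relative
library sizes. The function name, the counts and the library sizes are
hypothetical, chosen only for illustration.

```python
from math import exp, lgamma, log


def poisson_exact_test(count_plus, count_minus, lib_plus, lib_minus):
    """Two-sided exact p-value for one locus, assuming zero dispersion.

    Under the null hypothesis of equal tag abundance in both pools, and
    conditional on the locus total n, the count in the + pool follows
    Binomial(n, p) with p = lib_plus / (lib_plus + lib_minus).
    """
    if lib_plus <= 0 or lib_minus <= 0:
        raise ValueError("library sizes must be positive")
    n = count_plus + count_minus
    if n == 0:
        return 1.0  # no reads at this locus, so no evidence either way
    p = lib_plus / (lib_plus + lib_minus)

    def binom_prob(k):
        # Binomial probability computed on the log scale for stability.
        log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1 - p))
        return exp(log_prob)

    probs = [binom_prob(k) for k in range(n + 1)]
    observed = probs[count_plus]
    # Sum the probabilities of all outcomes no more likely than the
    # observed one (small relative tolerance for floating-point ties).
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical tag counts per locus: (count in + bulk, count in - bulk).
    loci = {"locus_A": (10, 0), "locus_B": (5, 5), "locus_C": (20, 10)}
    lib_plus, lib_minus = 2_000_000, 1_000_000  # made-up library sizes
    for name, (a, b) in loci.items():
        print(name, poisson_exact_test(a, b, lib_plus, lib_minus))
```

With real bulks the counts will usually be overdispersed relative to
Poisson, which is why a negative binomial test with a nonzero dispersion,
as in edgeR::exactTest, is the better choice in practice.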

	Best wishes
	Wolfgang

Sep/22/11 6:02 AM, josquin.tibbits at dpi.vic.gov.au scripsit::
> Hi List,
>
> In the edgeR manual it is mentioned that the package may be useful in the
> analysis of other count-based studies, and I am wondering whether bulk
> segregant studies (based on tag counts) are such a case, i.e. I am hoping
> for advice on whether this would be an appropriate use of the package...
>
> The general experimental outline is as follows:
>
> From parents introgressed with DNA containing alleles that are +/- for a
> trait, a set of offspring is generated that segregates for the trait (in my
> case the progeny have been back-crossed to one of the parental lines for
> multiple generations so that they have effectively isogenic backgrounds
> with just the QTL region segregating, while the parents can have multiple
> regions segregating from the original introgression, including the QTL
> region). Offspring are bulked to create pools representing + and - bulks.
> DNA is then isolated from parents and offspring and these are assayed for
> a marker set which is generated as follows:
>
> DNA is digested with a restriction enzyme and an adaptor is ligated to
> these ends (i.e. the restriction digestion sites are tagged). The DNA is
> then randomly sheared and a second adaptor is ligated. Fragments carrying
> both adaptors are enriched and sequenced on an Illumina HiSeq instrument.
> Tags are mapped back to a reference and the hits are counted. We are then
> looking for reference sequences that are differentially hit in the + vs -
> strains in both parents and offspring.
>
> My thinking was that parents and offspring are effectively replications of
> the presence or absence of the QTL region and the way the counts are
> obtained has similar sampling properties to when you are doing a DE study
> of gene expression. Hence I thought to use edgeR for the analysis, as it
> can normalise for overall count numbers, the TMM normalisation can adjust
> for target tag complexity, and the tagwise dispersion estimation can
> account for the technical variance arising from bulking, sampling, mapping
> efficiency, etc.
>
>
> Looking forward to reading your comments.....
>
> Best, Josquin