[BioC] Rsamtools question

Valerie Obenchain vobencha at fhcrc.org
Tue Oct 4 18:11:20 CEST 2011


On 10/04/2011 08:58 AM, Valerie Obenchain wrote:
> Hi Alpesh,
>
> One approach would be to
>
> I. Read in snps with readGappedAlignments :
>
> Convert your sam file to a bam using samtools from the command line,
>
>     samtools view -bS -o  yourfile.bam  yourfile.sam

or from within R -- Rsamtools::asBam

>
> and read  into R with readGappedAlignments
>
>     library(GenomicFeatrures)
>     ?readGappedAlignments
>
>
> II. Get gene ID :
>
> Use an org annotation package to get the gene id you are interested 
> in. There are many ID types available (Entrez, Ensembl, RefGene, 
> etc.). You will use this ID to identify the gene of interest in the 
> TxDb package (below).
> Many org packages are available, see all starting with "org",
>
>     http://bioconductor.org/packages/devel/data/annotation/
>
> For the human package,
>
>     library(org.Hs.eg.db)
>     ?org.Hs.eg.db
>
>
> III. Get the genomic coordinates for the gene of interest :
> The TxDb packages have genomic coordinates as well as transcript, exon 
> and gene ID's. See packages starting with "TxDb",
>
>     http://bioconductor.org/packages/devel/data/annotation/
>
> For example, the human hg19 known genes from UCSC are in the following 
> package. Since this is a UCSC package the geneID's will be RefGene ID's.
>
>     library(GenomicFeatures)
>     library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>     ls(2)  # to see the name of the txdb object
>
> If you want to make your own txdb see
>
>     ?makeTranscriptDb
>
> There are several functions available for organizing you data by 
> exons, transcripts or gene, see
>
>     ?transcriptsBy
>
>
> IV. Counting
>
> You now have the genomic coordinates of your gene (from txdb package) 
> and your snps (read in with readGappedAlignments).
> Several counting options are available,
>
>     library(GenomicRanges)
>     ?summarizeOverlaps # avoids double counting, available in R-devel 
> version
>     ?countOverlaps
>     ?findOverlaps

also Rsamtools::applyPileups which gives bases over genomic coordinates

Valerie
>
>
> Valerie
>
>
>
> On 10/02/2011 08:28 AM, Alpesh Querer wrote:
>> Hi All,
>>
>> I am a newbie with R and Bioconductor. I have .sam files and I want 
>> to count
>> the number of reads mapping to particular SNP locations on the .fa gene.
>> Also I want to avoid double counting the SNPs which are covered by 
>> the same
>> 36bp read.
>>   Any help is highly appreciated.
>>
>> Thanks,
>> Alpesh
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list