[BioC] HapMap gene list

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Aug 4 20:08:41 CEST 2010


On Wed, Aug 4, 2010 at 1:41 PM, noxyport at gmail.com <noxyport at gmail.com> wrote:
> Hi,
>
> I have a problem with the gene list (gff version3 file) HapMap is
> using (ftp://ftp.ncbi.nlm.nih.gov/hapmap/gbrowse/2009-02_phaseII+III/gff/refGene_hg18_tests_11Apr07.gff.gz).
> I tried loading the file into R and selecting all "mRNA" entries but
> something seems to go wrong with it:
>
>> hapmap=read.table("refGene_hg18_tests_11Apr07.gff", header=F, sep="\t")
>> nrow(hapmap)
> [1] 171701
>> hapmap2=hapmap[which(hapmap$V3=="mRNA"), ]
>> nrow(hapmap2)
> [1] 12718
>> hapmap[(2210:2220), (1:3)]

Here, you want to index hapmap2 and not hapmap: the rows printed below
come from the unfiltered table, so they still contain every feature type.
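
A minimal sketch of the difference, on a few made-up GFF-like rows rather
than the real HapMap file (the rows and the temporary file are
hypothetical). The quote="" and comment.char="" arguments are only a
defensive choice for free-text attribute columns, not a diagnosis of
anything in this thread:

```r
# Toy GFF3-like lines (hypothetical data, not taken from the HapMap file).
gff_lines <- c(
  "chr1\tUCSC_1\tmRNA\t100\t900\t.\t+\t.\tID=tx1",
  "chr1\tUCSC_1\tfive_prime_UTR\t100\t150\t.\t+\t.\tParent=tx1",
  "chr1\tUCSC_1\tCDS\t151\t900\t.\t+\t0\tParent=tx1",
  "chr2\tUCSC_1\tmRNA\t200\t800\t.\t-\t.\tID=tx2; Note=5' end",
  "chr2\tUCSC_1\tCDS\t200\t800\t.\t-\t0\tParent=tx2"
)
tmp <- tempfile(fileext = ".gff")
writeLines(gff_lines, tmp)

# Tab-separated, no header; disable quote and comment handling so that
# apostrophes or '#' in the attribute column (V9) are read literally.
hapmap <- read.table(tmp, header = FALSE, sep = "\t",
                     quote = "", comment.char = "",
                     stringsAsFactors = FALSE)

# Keep only the mRNA rows.
hapmap2 <- hapmap[hapmap$V3 == "mRNA", ]

# Indexing the original table still shows every feature type ...
print(hapmap[1:3, 1:3])

# ... while indexing the subset shows mRNA rows only.
print(hapmap2[, 1:3])
```

Note that hapmap2 keeps the row names of hapmap, so its printed row labels
are not consecutive even though it is indexed by position 1, 2, and so on.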

Kasper


> 2210 chr1 UCSC_1           mRNA
> 2211 chr1 UCSC_1 five_prime_UTR
> 2212 chr1 UCSC_1 five_prime_UTR
> 2213 chr1 UCSC_1            CDS
> 2214 chr1 UCSC_1            CDS
> 2215 chr1 UCSC_1            CDS
> 2216 chr1 UCSC_1            CDS
> 2217 chr1 UCSC_1            CDS
> 2218 chr1 UCSC_1            CDS
> 2219 chr1 UCSC_1            CDS
> 2220 chr1 UCSC_1            CDS
>>
>
> Can anyone explain why this happens? The large descriptive column (V9)
> is probably involved, but I cannot see where it goes wrong.
>
> I admit this is probably not the best way to use this file, but I
> cannot find any other source (RefSeq, UCSC) that contains the same
> genomic regions for the annotated genes as HapMap does. Which NCBI
> 36 build did they use, and where can I download a gene file with
> chromosome, gene start and stop matching HapMap?
>
> Thanks for your help!