[BioC] HapMap gene list
noxyport at gmail.com
Wed Aug 4 22:49:56 CEST 2010
You are right! Sorry to bother you with this.
However, something is still wrong. When I export the subset again with
write.table, the file still contains CDS and UTR rows, and when I run:
> hapmap=read.table("refGene_hg18_tests_11Apr07.gff", header=F, sep=" ")
> hapmap2=hapmap[which(hapmap$V3=="mRNA"), ]
V1 V2 V3 V4 V5 V6 V7 V8
2759 chr1 UCSC_1 mRNA 11840109 11841579 . - .
2759 ID=NM_002521;Alias=NPPB;Note=natriuretic peptide precursor B
preproprotein;summary=This gene is a member of the natriuretic peptide
family and encodes a secreted protein which functions as a cardiac
hormone. The protein undergoes two cleavage events%2C one within the
cell and a second after secretion into the blood. The proteins
biological actions include natriuresis%2C diuresis%2C
vasorelaxation%2C inhibition of renin and aldosterone secretion%2C and
a key role in cardiovascular homeostasis. A high concentration of this
protein in the bloodstream is indicative of heart failure. Mutations
in this gene have been associated with postmenopausal osteoporosis.
Publication Note: This RefSeq record includes a subset of the
publications that are available for this gene. Please see the Entrez
Gene record to access additional
hydroxylase precursor;summary=Lysyl hydroxylase is a membrane-bound
homodimeric protein localized to the cisternae of the endoplasmic
reticulum. The enzyme (cofactors iron and ascorbate) catalyzes the
hydroxylation of lysyl residues in collagen-like peptides. The
resultant hydroxylysyl groups are attachment sites for carbohydrates
... (shortened here)
I have no idea where R takes these "\t.*" parts from, but I think they
somehow corrupt the whole data frame. Any suggestions?
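
A possible explanation, offered as an assumption rather than something confirmed in this thread: by default read.table() treats both " and ' as quote characters and # as a comment character. An apostrophe inside the free-text summary in V9 could then open a quoted string that swallows the following lines, tabs and newlines included. The sketch below uses a made-up four-line file (not the HapMap file), assumes tab-separated columns as in GFF3, and switches both features off with quote="" and comment.char="" before selecting the mRNA rows.

```r
# Build a tiny GFF3-like file (made-up rows) whose attribute column contains
# apostrophes, similar to the free-text "summary=" fields in the HapMap file.
gff_lines <- c(
  "chr1\tUCSC_1\tmRNA\t100\t900\t.\t-\t.\tID=NM_A;Note=the protein's role",
  "chr1\tUCSC_1\tCDS\t150\t300\t.\t-\t0\tID=NM_A_cds",
  "chr1\tUCSC_1\tmRNA\t2000\t2900\t.\t+\t.\tID=NM_B;Note=patient's sample",
  "chr1\tUCSC_1\tfive_prime_UTR\t2000\t2050\t.\t+\t.\tID=NM_B_utr"
)
gff_file <- tempfile(fileext = ".gff")
writeLines(gff_lines, gff_file)

# Disable quote and comment handling so every physical line stays one row.
gff <- read.table(gff_file, header = FALSE, sep = "\t", quote = "",
                  comment.char = "", stringsAsFactors = FALSE)

# Keep only the mRNA features.
mrna <- gff[gff$V3 == "mRNA", ]
print(dim(gff))
print(mrna[, 1:5])
```

If this is indeed the cause, comparing nrow() of the data frame with the number of lines in the file (for example length(readLines(...))) should show whether rows are being merged on the real data.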
On Wed, Aug 4, 2010 at 7:08 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
> On Wed, Aug 4, 2010 at 1:41 PM, noxyport at gmail.com <noxyport at gmail.com> wrote:
>> I have a problem with the gene list (gff version3 file) HapMap is
>> using (ftp://ftp.ncbi.nlm.nih.gov/hapmap/gbrowse/2009-02_phaseII+III/gff/refGene_hg18_tests_11Apr07.gff.gz).
>> I tried loading the file into R and selecting all "mRNA" entries but
>> something seems to go wrong with it:
>> > hapmap=read.table("refGene_hg18_tests_11Apr07.gff", header=F, sep=" ")
>>  171701
>> > hapmap2=hapmap[which(hapmap$V3=="mRNA"), ]
>>  12718
>> > hapmap[(2210:2220), (1:3)]
> Here, you want to use hapmap2 and not hapmap.
>> 2210 chr1 UCSC_1 mRNA
>> 2211 chr1 UCSC_1 five_prime_UTR
>> 2212 chr1 UCSC_1 five_prime_UTR
>> 2213 chr1 UCSC_1 CDS
>> 2214 chr1 UCSC_1 CDS
>> 2215 chr1 UCSC_1 CDS
>> 2216 chr1 UCSC_1 CDS
>> 2217 chr1 UCSC_1 CDS
>> 2218 chr1 UCSC_1 CDS
>> 2219 chr1 UCSC_1 CDS
>> 2220 chr1 UCSC_1 CDS
>> Can anyone explain why this could be? Probably, the large descriptive
>> column (V9) but I don't see the failure.
>> I have to admit that it is probably not the best way to use this file
>> but I do not find any other source (RefSeq, UCSC), which contains the
>> same genomic regions for the genes annotated as in HapMap. Which NCBI
>> 36 build did they use and where can I download a gene file with
>> chromosome, gene start and stop matching with HapMap?
>> Thanks for your help!
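
For the goal of a table with chromosome, gene start and stop, one option is to derive it from the mRNA rows themselves. This is a minimal sketch, assuming V9 follows the "key=value;key=value" layout seen in the NPPB record above. The helper get_attr() is hypothetical (not from any package), and the second row of the example data is made up.

```r
# Example mRNA rows: the first uses values from the NPPB record shown earlier,
# the second is invented for illustration.
mrna <- data.frame(
  V1 = c("chr1", "chr1"),
  V4 = c(11840109L, 2000L),
  V5 = c(11841579L, 2900L),
  V9 = c("ID=NM_002521;Alias=NPPB;Note=natriuretic peptide precursor B",
         "ID=NM_B;Alias=GENEB;Note=hypothetical example"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value of one key from each attribute string,
# or NA when the key is absent.
get_attr <- function(attrs, key) {
  pattern <- paste0("(^|;)", key, "=([^;]*)")
  hit <- regmatches(attrs, regexec(pattern, attrs))
  vapply(hit, function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1))
}

# One row per transcript: chromosome, start, stop, RefSeq ID and gene alias.
genes <- data.frame(
  chrom  = mrna$V1,
  start  = mrna$V4,
  end    = mrna$V5,
  refseq = get_attr(mrna$V9, "ID"),
  gene   = get_attr(mrna$V9, "Alias"),
  stringsAsFactors = FALSE
)
print(genes)
```

Since the coordinates come straight from the HapMap file, they match what HapMap uses by construction; which NCBI 36 annotation release they correspond to remains an open question here.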