[BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures?

Ying Chen Ying.Chen at imclone.com
Tue Oct 9 19:25:17 CEST 2012


Hi Marc,

Thanks a lot for the help.

I start with GAF because I look at TCGA CRC data.

Here was what I did initially:

> library(GenomicFeatures)
> library(GenomicRanges)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> TCGA_CRC <- read.delim("TCGA_CRC_UNC_RNASeq_ExonID_detail_new.txt",stringsAsFactors=FALSE)
> ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = c("exon_id","gene_id","tx_id"))
> TCGA_CRC <- GRanges(seqnames=TCGA_CRC$Chrom, ranges=IRanges(start=TCGA_CRC$Start,end=TCGA_CRC$End),strand=TCGA_CRC$Strand,name=TCGA_CRC$ExonID)
> elementMetadata(TCGA_CRC) <- elementMetadata(ex)[match(TCGA_CRC, ex),]
Error in elementMetadata(ex)[match(TCGA_CRC, ex), ] : 
  selecting rows: subscript contains NAs or out of bounds indices
> 
> x <- which(!TCGA_CRC %in% ex)
Returns a value larger than 3000.

Apparently  as Tim Triche pointed out about a month ago, the GAF made by gurus at UNC is different from UCSC.hg19.knownGene even most part are identical. With more and more TCGA sequencing datasets available, is there going to be some TxDbs for GAFs (for example https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/other/GAF/GAF_bundle/outputs/TCGA.Sept2010.09202010.gaf)?

Thanks a lot,

Ying

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Marc Carlson
Sent: Tuesday, October 09, 2012 12:57 PM
To: bioconductor at r-project.org
Subject: Re: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures?

Hi Ying,

Since you are describing two transcripts, it should be the 1st one.  The second would for describing the exons that go with these transcripts.

But just out of curiousity, why are you starting with a GAF?  I see you using UCSC IDs, so why not just use one of these known gene packages?:

http://www.bioconductor.org/packages/release/BiocViews.html#___TranscriptDb


   Marc




On 10/09/2012 09:21 AM, Ying Chen wrote:
> Sorry, one more question regarding the transcripts argument:
>
> For the following two transcripts:
>
> 	uc001aaa.3	chr1:11874-12227,12613-12721,13221-14409:+
> 	uc010nxq.1	chr1:11874-12227,12595-12721,13403-14409:+
>
> What should be the format for tx_start and tx_end?
>
> transcripts<- data.frame(
> 		tx_id=c("uc001aaa.3"," uc010nxq.1"),
> 		tx_chrom="chr1",
> 		tx_strand="+",
> 		tx_start=c(11874, 11874),
> 		tx_end=c(14409, 14409))
>
> or:
>
> transcripts<- data.frame(
> 		tx_id=c("uc001aaa.3","uc001aaa.3","uc001aaa.3","uc010nxq.1","uc010nxq.1","uc010nxq.1"),
> 		tx_chrom="chr1",
> 		tx_strand="+",
> 		tx_start=c(11874,12613,13221,11874,12595,13403),
> 		tx_end=c(12227,12721,14409,12227,12721,14409))
>
> Thanks a lot for the help!
>
> Ying
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org 
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Ying Chen
> Sent: Tuesday, October 09, 2012 10:57 AM
> To: bioconductor at r-project.org
> Subject: [BioC] How to set up genes argument for makeTranscriptDb( ) function in GenomicFeatures?
>
> Hi guys,
>
> I am new to the GenomicsFeatures package and I am trying to make my own txdb out of TCGA GAF file using the function makeTranscriptDb(transcripts, splicings,genes) in GenomicsFeatures. The manual says that genes is a data frame containing the genes associated to a set of transcripts. I just wonder what format should it be. Will the following work?
>
>   genes<- data.frame(
>                     tx_id=c(txid1,txid2, txid3, txid4, txid5, txid6, txid7, txid8, txid9,............),                                                              # first 9 tx_ids for 3 gene_ids are listed
>                    gene_id=c(geneid1, geneid1, geneid2, geneid2, geneid2, geneid3, geneid3, geneid3, geneid3,.......))     # first 3 gene_ids are listed
>
> Thanks a lot for the help!
>
> Ying
>
> Confidentiality Note:\ This e-mail, and any attachment 
> t...{{dropped:19}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Confidentiality Note:\ This e-mail, and any attachment t...{{dropped:11}}



More information about the Bioconductor mailing list