[BioC] DNAStringsSet - remove multiple entries with same name

deepti anand anand.deepti at outlook.com
Wed Sep 3 19:30:24 CEST 2014


Hi All,

I am trying to extract promoter sequences for a few Entrez IDs. The problem is that there are multiple transcripts for the same gene, so I get multiple promoter sequences for the same gene. Can I filter out the redundant promoter sequences?

Here is my code:
## Packages (not shown in my original code; Mmusculus is assumed to come from the mm10 BSgenome package)
> library(TxDb.Mmusculus.UCSC.mm10.knownGene)
> library(BSgenome.Mmusculus.UCSC.mm10)

> ids.ok = c("67665", "13198", "110196", "15368")

## Obtain coordinates of transcripts
> grl <- transcriptsBy(TxDb.Mmusculus.UCSC.mm10.knownGene, by="gene")[ids.ok]
> promoter.seqs <- getPromoterSeq(grl, Mmusculus, upstream=1500, downstream=0)
> promoter.seqs <- unlist(promoter.seqs)
> promoter.seqs
  A DNAStringSet instance of length 8
    width seq                                                                                                             names
[1]  1500 CTGCTGTAAAGTTACATTCCTGCCTAGAAATTTATATCGATTCTGCCGTCAGAA...GGAGGGAAGCGCCGGGCTGTGTCACGTGACGGGTGCGCCGGGCGTTGGCTCCTC 67665.67665
[2]  1500 CTGCTGTAAAGTTACATTCCTGCCTAGAAATTTATATCGATTCTGCCGTCAGAA...GGAGGGAAGCGCCGGGCTGTGTCACGTGACGGGTGCGCCGGGCGTTGGCTCCTC 67665.67665
[3]  1500 CTGCTGTAAAGTTACATTCCTGCCTAGAAATTTATATCGATTCTGCCGTCAGAA...GGAGGGAAGCGCCGGGCTGTGTCACGTGACGGGTGCGCCGGGCGTTGGCTCCTC 67665.67665
[4]  1500 CTGCTGTAAAGTTACATTCCTGCCTAGAAATTTATATCGATTCTGCCGTCAGAA...GGAGGGAAGCGCCGGGCTGTGTCACGTGACGGGTGCGCCGGGCGTTGGCTCCTC 67665.67665
[5]  1500 CAGCCCTAAAAGATGAAAGTCGCGACTTGCCCTGCCCCGCCCCAAAGGCTTCCC...CCCCCCCCCAGGAGGGGCCGGACAGCATAAAGGATACTCGCTCTCCGCTCTTGA 13198.13198
[6]  1500 CACGTCGGCCTGCCTATCAGGGAGTCTACTGCCTTTTCCCTCAGTATGAGATAA...CCGTGGCATGCCGGGAGTCGTAGTTTTATATTTATGTTCTGCCTCCTGAGCCTG 110196.110196
[7]  1500 CACGTCGGCCTGCCTATCAGGGAGTCTACTGCCTTTTCCCTCAGTATGAGATAA...CCGTGGCATGCCGGGAGTCGTAGTTTTATATTTATGTTCTGCCTCCTGAGCCTG 110196.110196
[8]  1500 GTTAGTATTTAATATTTAAAGCTTGCTTCTAACTTGGCCCAAAATGTTGGAGTT...TGGGCGGCCACCACGTGACCCGCGTACTTAAAGGGCTGGCGCGGGCAGCTGCTC 15368.15368

In the example above, there are four sequences for the same gene, '67665.67665'. How can I remove these duplicate entries?
I would appreciate any help.
Dips
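
One possible approach, sketched here under the assumption that entries sharing a name are redundant, is to keep only the first occurrence of each name with duplicated(). The sketch below uses a plain named character vector as a stand-in for promoter.seqs; the short sequences are made-up placeholders, and the variable names uniq.by.name and uniq.by.pair are hypothetical.

```r
# Stand-in for promoter.seqs: a named character vector with placeholder
# sequences (NOT the real promoters). The names mimic the "gene.gene"
# labels produced by unlist() in the post.
seqs <- c("67665.67665"   = "CTGCTG",
          "67665.67665"   = "CTGCTG",
          "13198.13198"   = "CAGCCC",
          "110196.110196" = "CACGTC",
          "110196.110196" = "CACGTC",
          "15368.15368"   = "GTTAGT")

# Keep only the first entry for each name.
uniq.by.name <- seqs[!duplicated(names(seqs))]

# Alternative: drop an entry only when both its name and its sequence
# repeat, so distinct promoters of the same gene would be kept.
uniq.by.pair <- seqs[!duplicated(paste(names(seqs), seqs))]

print(names(uniq.by.name))
```

Since a DNAStringSet supports names() and logical subsetting, the analogous call on the real object would presumably be promoter.seqs[!duplicated(names(promoter.seqs))]. Note that filtering by name alone keeps an arbitrary (first) transcript per gene, which matters when transcripts of one gene start at different positions and therefore have different promoters.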



 		 	   		  