[BioC] [Bioc-sig-seq] extract id from ShortRead
Sean Davis
seandavi at gmail.com
Mon Nov 30 16:29:03 CET 2009
On Mon, Nov 30, 2009 at 10:20 AM, Ramzi TEMANNI <ramzi.temanni at gmail.com> wrote:
> Hi Sean,
> Thanks for your help. as.character gives 2 ids per row:
> [1] "HWI-EA332_8_1_3_659#GGGGNN/1" "HWI-EA332_8_1_3_1738#CCCCNN/1"
> [3] "HWI-EA332_8_1_3_1094#AGGANN/1" "HWI-EA332_8_1_3_558#TTTCNN/1"
> [5] "HWI-EA332_8_1_3_1920#AAAANN/1" "HWI-EA332_8_1_3_228#GGGGNN/1"
> There should be some accessor to extract one id per row.
> I've taken a look at the suggested help pages, but there is no useful info
> there on extracting what I want.
This is NOT two ids per row. It is one character vector. R prints two
elements per line only because your console is wide enough for that.
If you do:
tmp <- as.character(id(aln))
class(tmp)
tmp[1]
tmp[2]
tmp[1:5]
length(tmp)
it might give you more of an idea what is going on.
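
As a small sketch of the same point that runs without the alignment file: the vector `ids` below is a hand-typed stand-in for as.character(id(aln)), using four ids copied from the output above, and `wanted`, `keep` and `id_table` are illustrative names only. It shows the same vector printed at two console widths, then one id per line and one id per table row.

```r
# Stand-in for as.character(id(aln)); ids copied from the printed output above.
ids <- c("HWI-EA332_8_1_3_659#GGGGNN/1",
         "HWI-EA332_8_1_3_1738#CCCCNN/1",
         "HWI-EA332_8_1_3_1094#AGGANN/1",
         "HWI-EA332_8_1_3_558#TTTCNN/1")

# One element per read, whatever the console layout suggests.
length(ids)
class(ids)

# Print the same vector at two console widths; only the layout changes.
old <- options(width = 120)
print(ids)
options(width = 40)
print(ids)
options(old)  # restore the previous width

# One id per line, or one id per row of a table.
writeLines(ids)
id_table <- data.frame(id = ids, stringsAsFactors = FALSE)
nrow(id_table)

# Logical index for picking specific reads by id. With the real object,
# aln[keep] should subset the reads (assumption: aln is an AlignedRead).
wanted <- c("HWI-EA332_8_1_3_1738#CCCCNN/1", "HWI-EA332_8_1_3_558#TTTCNN/1")
keep <- ids %in% wanted
ids[keep]
```

With the real data, replacing the first assignment by ids <- as.character(id(aln)) should give the same behaviour on all 4340867 ids.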
Sean
>
> On Mon, Nov 30, 2009 at 3:40 PM, Sean Davis <seandavi at gmail.com> wrote:
>>
>> On Mon, Nov 30, 2009 at 9:27 AM, Ramzi TEMANNI <ramzi.temanni at gmail.com>
>> wrote:
>> > Hi,
>> > I have reads loaded from a Bowtie alignment:
>> > aln <- readAligned("./S1", pattern="S1_1.hg19.bowtie.align",
>> > type="Bowtie")
>> > I would like to extract the ids to select specific reads.
>> > I run id(aln) and I get:
>> > id(aln)
>> > A BStringSet instance of length 4340867
>> > width seq
>> > [1] 28 HWI-EA332_8_1_3_659#GGGGNN/1
>> > [2] 29 HWI-EA332_8_1_3_1738#CCCCNN/1
>> > [3] 29 HWI-EA332_8_1_3_1094#AGGANN/1
>> > [4] 28 HWI-EA332_8_1_3_558#TTTCNN/1
>> > [5] 29 HWI-EA332_8_1_3_1920#AAAANN/1
>> > [6] 28 HWI-EA332_8_1_3_228#GGGGNN/1
>> > [7] 29 HWI-EA332_8_1_3_1261#AGGGNN/1
>> > [8] 28 HWI-EA332_8_1_3_908#ACTTNN/1
>> > [9] 27 HWI-EA332_8_1_3_53#CTGCNN/1
>> > ... ... ...
>> > [4340859] 33 HWI-EA332_8_120_1596_499#TTGANA/1
>> > [4340860] 34 HWI-EA332_8_120_1599_1161#CCACNT/1
>> > [4340861] 33 HWI-EA332_8_120_1601_255#CTCTNA/1
>> > [4340862] 33 HWI-EA332_8_120_1601_504#CCATNC/1
>> > [4340863] 33 HWI-EA332_8_120_1624_899#CTCTNT/1
>> > [4340864] 33 HWI-EA332_8_120_1487_658#ACCCNA/1
>> > [4340865] 32 HWI-EA332_8_120_1533_28#CACANG/1
>> > [4340866] 33 HWI-EA332_8_120_1564_807#CCCGNG/1
>> > [4340867] 34 HWI-EA332_8_120_1474_1350#CCTGNC/1
>> >
>> > This BStringSet instance has 'width' and 'seq'.
>> > Running str(id(aln)), I got this:
>> >
>> > Formal class 'BStringSet' [package "Biostrings"] with 5 slots
>> > ..@ pool :Formal class 'SharedRaw_Pool' [package "IRanges"] with 2 slots
>> > .. .. ..@ xp_list :List of 1
>> > .. .. .. ..$ :<externalptr>
>> > .. .. ..@ .link_to_cached_object_list:List of 1
>> > .. .. .. ..$ :<environment: 0x2af6400>
>> > ..@ ranges :Formal class 'GroupedIRanges' [package "IRanges"] with 7 slots
>> > .. .. ..@ group : int [1:4340867] 1 1 1 1 1 1 1 1 1 1 ...
>> > .. .. ..@ start : int [1:4340867] 1 29 58 87 115 144 172 201 229 256 ...
>> > .. .. ..@ width : int [1:4340867] 28 29 29 28 29 28 29 28 27 29 ...
>> > .. .. ..@ NAMES : NULL
>> > .. .. ..@ elementMetadata: NULL
>> > .. .. ..@ elementType : chr "integer"
>> > .. .. ..@ metadata : list()
>> > ..@ elementMetadata: NULL
>> > ..@ elementType : chr "BString"
>> > ..@ metadata : list()
>> >
>> > But I'm wondering how to extract only the 'seq' column from all that
>> > and store the result in a table?
>>
>> as.character(id(aln))
>>
>> will return a character vector of the names. You might want to look
>> at the help for AlignedRead-class and BStringSet-class for some help
>> in understanding these classes and what can be done with them. It may
>> be that you will not need to go to character vector to do what you
>> want with the reads.
>>
>> Sean
>
>