[R] how to work with long vectors

Changbin Du changbind at gmail.com
Thu Nov 4 18:04:45 CET 2010


HI, Henrique,

Thanks for the great help!

I compared the output from your codes:
> te<-rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
> te
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84
83
 [19]  82  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66
65
 [37]  64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48
47
 [55]  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30
29
 [73]  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12
11
 [91]  10   9   8   7   6   5   4   3   2   1

 the output from my code,
> result
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84
83
 [19]  82  81  80  79  79  77  77  77  74  73  72  71  70  70  68  67  67
65
 [37]  64  64  62  62  60  59  58  57  56  56  54  53  52  51  51  49  48
47
 [55]  46  45  45  43  42  41  40  39  38  37  36  35  34  33  32  31  30
29
 [73]  28  27  27  27  24  24  22  21  20  19  19  19  19  15  14  14  12
11
 [91]  10   9   8   7   7   5   4   3   2   1

There is no tie in your output. Look at the data set: There are ties in the
data set. Your codes work fast, but I think the results is not accurate.
Thanks so much for the great help!

> matt[c(1:35), ]
            id reads
1   Contig79:1     4
2   Contig79:2     8
;
;
22 Contig79:22    64
23 Contig79:23    64
24 Contig79:24    68
25 Contig79:25    68
26 Contig79:26    68

I also attached the testing file with this email. Thanks!



On Thu, Nov 4, 2010 at 9:12 AM, Henrique Dallazuanna <wwwhsd at gmail.com>wrote:

> Try this:
>
> rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
>
> On Thu, Nov 4, 2010 at 1:46 PM, Changbin Du <changbind at gmail.com> wrote:
>
>> HI, Dear R community,
>>
>> I have one data set like this,  What I want to do is to calculate the
>> cumulative coverage. The following codes works for small data set (#rows =
>> 100), but when feed the whole data set,  it still running after 24 hours.
>> Can someone give some suggestions for long vector?
>>
>> id                reads
>> Contig79:1    4
>> Contig79:2    8
>> Contig79:3    13
>> Contig79:4    14
>> Contig79:5    17
>> Contig79:6    20
>> Contig79:7    25
>> Contig79:8    27
>> Contig79:9    32
>> Contig79:10    33
>> Contig79:11    34
>>
>>
>> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
>> sep="\t", skip=0, header=F,fill=T) #
>> dim(matt)
>> [1] 3384766       2
>>
>> matt_plot<-function(matt, outputfile) {
>> names(matt)<-c("id","reads")
>>
>>  cover<-matt$reads
>>
>>
>> #calculate the cumulative coverage.
>> + cover_per<-function (data) {
>> + output<-numeric(0)
>> + for (i in data) {
>> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>> +           output<-c(output, x)
>> +                 }
>> + return(output)
>> + }
>>
>>
>>  result<-cover_per(cover)
>>
>>
>> Thanks so much!
>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 94598
Phone: 925-927-2856
-------------- next part --------------
Contig79:1	4
Contig79:2	8
Contig79:3	13
Contig79:4	14
Contig79:5	17
Contig79:6	20
Contig79:7	25
Contig79:8	27
Contig79:9	32
Contig79:10	33
Contig79:11	34
Contig79:12	36
Contig79:13	39
Contig79:14	40
Contig79:15	44
Contig79:16	49
Contig79:17	55
Contig79:18	56
Contig79:19	59
Contig79:20	60
Contig79:21	62
Contig79:22	64
Contig79:23	64
Contig79:24	68
Contig79:25	68
Contig79:26	68
Contig79:27	70
Contig79:28	73
Contig79:29	76
Contig79:30	77
Contig79:31	78
Contig79:32	78
Contig79:33	79
Contig79:34	80
Contig79:35	80
Contig79:36	84
Contig79:37	87
Contig79:38	87
Contig79:39	88
Contig79:40	88
Contig79:41	89
Contig79:42	93
Contig79:43	94
Contig79:44	98
Contig79:45	99
Contig79:46	99
Contig79:47	102
Contig79:48	103
Contig79:49	108
Contig79:50	112
Contig79:51	112
Contig79:52	113
Contig79:53	116
Contig79:54	118
Contig79:55	120
Contig79:56	124
Contig79:57	124
Contig79:58	126
Contig79:59	128
Contig79:60	130
Contig79:61	133
Contig79:62	134
Contig79:63	136
Contig79:64	139
Contig79:65	144
Contig79:66	145
Contig79:67	146
Contig79:68	148
Contig79:69	149
Contig79:70	151
Contig79:71	156
Contig79:72	157
Contig79:73	158
Contig79:74	159
Contig79:75	159
Contig79:76	159
Contig79:77	160
Contig79:78	160
Contig79:79	161
Contig79:80	163
Contig79:81	164
Contig79:82	165
Contig79:83	165
Contig79:84	165
Contig79:85	165
Contig79:86	166
Contig79:87	170
Contig79:88	170
Contig79:89	172
Contig79:90	174
Contig79:91	178
Contig79:92	180
Contig79:93	181
Contig79:94	184
Contig79:95	184
Contig79:96	187
Contig79:97	190
Contig79:98	192
Contig79:99	194
Contig79:100	199


More information about the R-help mailing list