[R] on how to make a skip-table

jim holtman jholtman at gmail.com
Thu Sep 12 22:21:10 CEST 2013


try this:

> record.length <- read.table(text = "    NR     length
+         1       100
+         2       130
+         3       150
+         4       148
+         5       100
+         6        83
+         7        60", header = TRUE)
> valida.records <- read.table(text = "  NR     factor
+         1       3
+         2       4
+         4       8
+         7       9", header = TRUE)
> x <- merge(record.length, valida.records, by = "NR", all.x = TRUE)
> x$seq <- cumsum(!is.na(x$factor))
>
> # need to add 1 to lines with NA to associate with next group
> x$seq[is.na(x$factor)] <- x$seq[is.na(x$factor)] + 1
>
> # split by 'seq', output last record and sum of preceding records
> do.call(rbind
+     , lapply(split(x, x$seq), function(.sk){
+         if (nrow(.sk) > 1) .sk$skip <- sum(.sk$length[1:(nrow(.sk) - 1L)])
+         else .sk$skip <- 0
+         .sk[nrow(.sk), ] # return last row (the valid record of the group)
+         })
+     )
  NR length factor seq skip
1  1    100      3   1    0
2  2    130      4   2    0
3  4    148      8   3  150
4  7     60      9   4  183
>
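
The same table can probably also be built without split/lapply, using
cumulative sums. The following is only a sketch, not a tested replacement:
it assumes record.length is in file order and valida.records is sorted by
NR, and the names offset, idx and prev.end are made up for illustration.

```r
record.length <- data.frame(NR = 1:7,
                            length = c(100, 130, 150, 148, 100, 83, 60))
valida.records <- data.frame(NR = c(1, 2, 4, 7), factor = c(3, 4, 8, 9))

# offset of each record = total length of all records before it
offset <- cumsum(record.length$length) - record.length$length

# rows of record.length that correspond to valid records
idx <- match(valida.records$NR, record.length$NR)

# end position of the previous valid record (0 before the first one)
prev.end <- c(0, head(offset[idx] + record.length$length[idx], -1))

# skip = gap between the end of the previous valid record and this one
skip.table <- data.frame(NR = valida.records$NR,
                         skip = offset[idx] - prev.end,
                         factor = valida.records$factor)
skip.table
```

The skip for a valid record is just the distance between where the previous
valid record ended and where this one starts, so no grouping is needed.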

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Thu, Sep 12, 2013 at 1:17 PM, Zhang Weiwu <zhangweiwu at realss.com> wrote:
>
> I've got two data frames, as shown below:
> (NR means Number of Record)
>
>> record.lengths
>
>         NR     length
>         1       100
>         2       130
>         3       150
>         4       148
>         5       100
>         6        83
>         7        60
>
>> valida.records
>
>         NR     factor
>         1       3
>         2       4
>         4       8
>         7       9
>
> And I intend to obtain the following skip-table:
>
>> skip.table
>
>         NR     skip   factor
>         1       0       3
>         2       0       4
>         4       150     8
>         7       183     9
>
>
> The column 'skip' is the total length of the invalid records that must be
> skipped before each valid record.
>
> For example, the 3rd row of skip.table has a skip of 150, which skips
> the invalid record No. 3 in record.lengths.
>
> Likewise, the 4th row of skip.table has a skip of 183, which skips the
> invalid records No. 5 and No. 6, whose lengths together are 100 + 83 = 183.
>
> It is evidently intended for reading huge data files, and it looks like
> simple math, but I admit I couldn't find an R-ish way of doing it.
>
> Thanks in advance, and thanks also for pointing out whether I am on the
> right track to begin with.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
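
Since the question says the skip table is meant for reading large files,
here is a rough sketch of how it might be used. It rests on assumptions not
stated in the thread: the lengths are byte counts, the file is binary, and
the records are stored back to back. The throwaway file and the names lens
and records are invented for the example.

```r
# skip table from the thread, plus the lengths of the valid records
skip.table <- data.frame(NR = c(1, 2, 4, 7), skip = c(0, 0, 150, 183))
lens <- c(100, 130, 148, 60)

# build a throwaway file: record k is filled with the byte value k
tmp <- tempfile()
writeBin(as.raw(rep(1:7, times = c(100, 130, 150, 148, 100, 83, 60))), tmp)

con <- file(tmp, "rb")
records <- vector("list", nrow(skip.table))
for (i in seq_len(nrow(skip.table))) {
  # discard the invalid bytes in front of this record, if any
  if (skip.table$skip[i] > 0)
    invisible(readBin(con, what = "raw", n = skip.table$skip[i]))
  # read the valid record itself
  records[[i]] <- readBin(con, what = "raw", n = lens[i])
}
close(con)
unlink(tmp)
```

Reading and discarding the skipped bytes keeps the sketch simple; for a
really large file, seek() on the connection may be the better choice where
it is reliable on your platform.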


