[R] what is the best way to process the following data?

William Dunlap wdunlap at tibco.com
Fri Jun 17 17:57:50 CEST 2016


You can make a step-number variable with cumsum(grepl("^Step ", ...)) and
use it as the splitting variable in split.  E.g.,

> dat <- read.table(yourFile, stringsAsFactors=FALSE, sep="|",
+     colClasses=c("NULL", "character", "character", "character"),
+     col.names=c("Junk","Date","Time","Type"))
> dat <- with(dat, data.frame(DateTime=as.POSIXct(paste(Date, Time),
+     format="%m/%d/%Y %H:%M:%S"), Type=Type, stringsAsFactors=FALSE))
> head(dat)
             DateTime           Type
1 2016-06-16 03:44:16       Step 001
2 2016-06-16 03:44:16 Initialization
3 2016-06-16 03:44:16        Filters
4 2016-06-16 03:45:03    Split Items
5 2016-06-16 03:46:20           Sort
6 2016-06-16 03:46:43          Check
> split(dat, cumsum(grepl("^Step ", dat$Type)))
$`1`
              DateTime                                        Type
1  2016-06-16 03:44:16                                    Step 001
2  2016-06-16 03:44:16                              Initialization
...
13 2016-06-16 04:06:33 BOP processing for 7,960 items has finished

$`2`
              DateTime                                        Type
14 2016-06-16 04:06:34                                    Step 002
15 2016-06-16 04:06:35                              Initialization
...
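
If the eventual goal is comparing per-step elapsed times between the
parallel and non-parallel logs, one possible follow-up is sketched below.
It is only a sketch: it assumes each step's elapsed time can be
approximated as the last logged timestamp minus the first within that
step, and the helper name stepSummary is made up for illustration.

stepSummary <- function(file) {
    ## read and parse the log as above
    dat <- read.table(file, stringsAsFactors=FALSE, sep="|",
        colClasses=c("NULL", "character", "character", "character"),
        col.names=c("Junk","Date","Time","Type"))
    dat <- with(dat, data.frame(DateTime=as.POSIXct(paste(Date, Time),
        format="%m/%d/%Y %H:%M:%S"), Type=Type, stringsAsFactors=FALSE))
    ## one data.frame per step, split as above
    steps <- split(dat, cumsum(grepl("^Step ", dat$Type)))
    ## one row per step: its label and its span in seconds
    ## (last logged time minus first logged time)
    do.call(rbind, lapply(steps, function(d) data.frame(
        Step=d$Type[1],
        ElapsedSecs=as.numeric(difftime(max(d$DateTime), min(d$DateTime),
            units="secs")),
        stringsAsFactors=FALSE)))
}

## substitute your actual file names here
merge(stepSummary("parallel_process_file.txt"),
      stepSummary("no_parallel_process_file.txt"),
      by="Step", suffixes=c(".parallel", ".noParallel"))

The merge() gives one row per step label with the two elapsed times side
by side, which should make the comparison straightforward.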



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jun 16, 2016 at 8:42 PM, Satish Vadlamani
<satish.vadlamani at gmail.com> wrote:

> Hello,
> I have multiple text files in the format shown below (see the two files
> pasted at the end of this message). Each file is a log of the steps the
> system has processed, showing the start time of each process step. For
> example, in the data below, the Filters step started at
> |06/16/2016|03:44:16
>
> How can I read this data so that Step 001 is one data frame, Step 002 is
> another, and so on? Once I have that, I will compare the Step 001 times
> with and without the parallel process.
>
> For example, the files pasted below, "no_parallel_process_SLS_4.txt" and
> "parallel_process_SLS_4.txt", should make clear what I am trying to do. I
> want to compare the time each step takes with the parallel process against
> the time it takes without it.
>
> If there is a better way of performing this task than what I have in
> mind, could you let me know? Thanks in advance.
>
> Satish Vadlamani
>
> >> parallel_process_file.txt
>
> |06/16/2016|03:44:16|Step 001
> |06/16/2016|03:44:16|Initialization
> |06/16/2016|03:44:16|Filters
> |06/16/2016|03:45:03|Split Items
> |06/16/2016|03:46:20|Sort
> |06/16/2016|03:46:43|Check
> |06/16/2016|04:01:13|Save
> |06/16/2016|04:04:35|Update preparation
> |06/16/2016|04:04:36|Update comparison
> |06/16/2016|04:04:38|Update
> |06/16/2016|04:04:38|Update
> |06/16/2016|04:06:01|Close
> |06/16/2016|04:06:33|BOP processing for 7,960 items has finished
> |06/16/2016|04:06:34|Step 002
> |06/16/2016|04:06:35|Initialization
> |06/16/2016|04:06:35|Filters
> |06/16/2016|04:07:14|Split Items
> |06/16/2016|04:08:57|Sort
> |06/16/2016|04:09:06|Check
> |06/16/2016|04:26:36|Save
> |06/16/2016|04:39:29|Update preparation
> |06/16/2016|04:39:31|Update comparison
> |06/16/2016|04:39:43|Update
> |06/16/2016|04:39:45|Update
> |06/16/2016|04:44:28|Close
> |06/16/2016|04:45:26|BOP processing for 8,420 items has finished
> |06/16/2016|04:45:27|Step 003
> |06/16/2016|04:45:27|Initialization
> |06/16/2016|04:45:27|Filters
> |06/16/2016|04:48:50|Split Items
> |06/16/2016|04:55:15|Sort
> |06/16/2016|04:55:40|Check
> |06/16/2016|05:13:35|Save
> |06/16/2016|05:17:34|Update preparation
> |06/16/2016|05:17:34|Update comparison
> |06/16/2016|05:17:36|Update
> |06/16/2016|05:17:36|Update
> |06/16/2016|05:19:29|Close
> |06/16/2016|05:19:49|BOP processing for 8,876 items has finished
> |06/16/2016|05:19:50|Step 004
> |06/16/2016|05:19:50|Initialization
> |06/16/2016|05:19:50|Filters
> |06/16/2016|05:20:43|Split Items
> |06/16/2016|05:22:14|Sort
> |06/16/2016|05:22:29|Check
> |06/16/2016|05:37:27|Save
> |06/16/2016|05:38:43|Update preparation
> |06/16/2016|05:38:44|Update comparison
> |06/16/2016|05:38:45|Update
> |06/16/2016|05:38:45|Update
> |06/16/2016|05:39:09|Close
> |06/16/2016|05:39:19|BOP processing for 5,391 items has finished
> |06/16/2016|05:39:20|Step 005
> |06/16/2016|05:39:20|Initialization
> |06/16/2016|05:39:20|Filters
> |06/16/2016|05:39:57|Split Items
> |06/16/2016|05:40:21|Sort
> |06/16/2016|05:40:24|Check
> |06/16/2016|05:46:01|Save
> |06/16/2016|05:46:54|Update preparation
> |06/16/2016|05:46:54|Update comparison
> |06/16/2016|05:46:54|Update
> |06/16/2016|05:46:55|Update
> |06/16/2016|05:47:24|Close
> |06/16/2016|05:47:31|BOP processing for 3,016 items has finished
> |06/16/2016|05:47:32|Step 006
> |06/16/2016|05:47:32|Initialization
> |06/16/2016|05:47:32|Filters
> |06/16/2016|05:47:32|Update preparation
> |06/16/2016|05:47:32|Update comparison
> |06/16/2016|05:47:32|Update
> |06/16/2016|05:47:32|Close
> |06/16/2016|05:47:33|BOP processing for 0 items has finished
> |06/16/2016|05:47:33|Step 007
> |06/16/2016|05:47:33|Initialization
> |06/16/2016|05:47:33|Filters
> |06/16/2016|05:47:34|Split Items
> |06/16/2016|05:47:34|Sort
> |06/16/2016|05:47:34|Check
> |06/16/2016|05:47:37|Save
> |06/16/2016|05:47:37|Update preparation
> |06/16/2016|05:47:37|Update comparison
> |06/16/2016|05:47:37|Update
> |06/16/2016|05:47:37|Update
> |06/16/2016|05:47:37|Close
> |06/16/2016|05:47:37|BOP processing for 9 items has finished
> |06/16/2016|05:47:37|Step 008
> |06/16/2016|05:47:37|Initialization
> |06/16/2016|05:47:37|Filters
> |06/16/2016|05:47:38|Update preparation
> |06/16/2016|05:47:38|Update comparison
> |06/16/2016|05:47:38|Update
> |06/16/2016|05:47:38|Close
> |06/16/2016|05:47:38|BOP processing for 0 items has finished
>
>
>
>
> >> no_parallel_process_file.txt
>
> |06/15/2016|22:52:46|Step 001
> |06/15/2016|22:52:46|Initialization
> |06/15/2016|22:52:46|Filters
> |06/15/2016|22:54:21|Split Items
> |06/15/2016|22:55:10|Sort
> |06/15/2016|22:55:15|Check
> |06/15/2016|23:04:43|Save
> |06/15/2016|23:06:38|Update preparation
> |06/15/2016|23:06:38|Update comparison
> |06/15/2016|23:06:39|Update
> |06/15/2016|23:06:39|Update
> |06/15/2016|23:12:04|Close
> |06/15/2016|23:13:16|BOP processing for 7,942 items has finished
> |06/15/2016|23:13:17|Step 002
> |06/15/2016|23:13:17|Initialization
> |06/15/2016|23:13:17|Filters
> |06/15/2016|23:16:27|Split Items
> |06/15/2016|23:20:18|Sort
> |06/15/2016|23:20:34|Check
> |06/16/2016|00:08:08|Save
> |06/16/2016|00:26:19|Update preparation
> |06/16/2016|00:26:20|Update comparison
> |06/16/2016|00:26:30|Update
> |06/16/2016|00:26:31|Update
> |06/16/2016|00:42:31|Close
> |06/16/2016|00:45:09|BOP processing for 8,400 items has finished
> |06/16/2016|00:45:11|Step 003
> |06/16/2016|00:45:12|Initialization
> |06/16/2016|00:45:12|Filters
> |06/16/2016|00:53:01|Split Items
> |06/16/2016|01:01:44|Sort
> |06/16/2016|01:02:55|Check
> |06/16/2016|01:41:40|Save
> |06/16/2016|01:44:37|Update preparation
> |06/16/2016|01:44:37|Update comparison
> |06/16/2016|01:44:39|Update
> |06/16/2016|01:44:39|Update
> |06/16/2016|01:47:37|Close
> |06/16/2016|01:48:07|BOP processing for 8,867 items has finished
> |06/16/2016|01:48:08|Step 004
> |06/16/2016|01:48:08|Initialization
> |06/16/2016|01:48:08|Filters
> |06/16/2016|01:49:51|Split Items
> |06/16/2016|01:50:35|Sort
> |06/16/2016|01:50:39|Check
> |06/16/2016|01:59:12|Save
> |06/16/2016|02:00:47|Update preparation
> |06/16/2016|02:00:47|Update comparison
> |06/16/2016|02:00:48|Update
> |06/16/2016|02:00:48|Update
> |06/16/2016|02:02:40|Close
> |06/16/2016|02:02:55|BOP processing for 5,383 items has finished
> |06/16/2016|02:02:56|Step 005
> |06/16/2016|02:02:56|Initialization
> |06/16/2016|02:02:56|Filters
> |06/16/2016|02:03:47|Split Items
> |06/16/2016|02:04:19|Sort
> |06/16/2016|02:04:21|Check
> |06/16/2016|02:08:08|Save
> |06/16/2016|02:09:22|Update preparation
> |06/16/2016|02:09:22|Update comparison
> |06/16/2016|02:09:22|Update
> |06/16/2016|02:09:22|Update
> |06/16/2016|02:11:03|Close
> |06/16/2016|02:11:14|BOP processing for 3,016 items has finished
> |06/16/2016|02:11:14|Step 006
> |06/16/2016|02:11:14|Initialization
> |06/16/2016|02:11:14|Filters
> |06/16/2016|02:11:15|Update preparation
> |06/16/2016|02:11:15|Update comparison
> |06/16/2016|02:11:15|Update
> |06/16/2016|02:11:15|Close
> |06/16/2016|02:11:15|BOP processing for 0 items has finished
> |06/16/2016|02:11:15|Step 007
> |06/16/2016|02:11:15|Initialization
> |06/16/2016|02:11:15|Filters
> |06/16/2016|02:11:17|Split Items
> |06/16/2016|02:11:17|Sort
> |06/16/2016|02:11:17|Check
> |06/16/2016|02:11:20|Save
> |06/16/2016|02:11:20|Update preparation
> |06/16/2016|02:11:20|Update comparison
> |06/16/2016|02:11:20|Update
> |06/16/2016|02:11:20|Update
> |06/16/2016|02:11:20|Close
> |06/16/2016|02:11:20|BOP processing for 9 items has finished
> |06/16/2016|02:11:20|Step 008
> |06/16/2016|02:11:20|Initialization
> |06/16/2016|02:11:21|Filters
> |06/16/2016|02:11:21|Update preparation
> |06/16/2016|02:11:21|Update comparison
> |06/16/2016|02:11:21|Update
> |06/16/2016|02:11:21|Close
> |06/16/2016|02:11:21|BOP processing for 0 items has finished
>
>
>
> --
>
> Satish Vadlamani
>
