[R] Importing fixed-width data

Dennis Murphy djmuser at gmail.com
Wed May 25 21:03:11 CEST 2011


I get a data frame on my end:

lines <- "2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
2011-05-13 00:00:05 EONBHS229 mia13001621NON"

df = read.fwf(textConnection(lines), widths=c(19,-4,7,3,8,2,1,3,1),
    col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf",
                "Runway","Source"),
    colClasses=c("POSIXct",NA,"factor","factor","character","factor",
                 "factor","factor"))
> df
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
3 2011-05-13 00:00:05 BHS229   mia 13001621      NO    N   <NA>   <NA>
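One thing to watch in the output above: row 3 is the short record. Because it lacks the characters before NON, read.fwf() puts "NO" into MsgType and "N" into Conf, and leaves Runway and Source as NA. Here is a rough sketch of one way around that. It rests on my assumption (your post does not confirm it) that every short record is exactly 44 characters long and is always missing the same 3 characters before NON and the 1 character after it. The names recs, short, first, last and fixed are made up for the illustration.

```r
# The three sample records, one string per record.
recs <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
          "2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM",
          "2011-05-13 00:00:05 EONBHS229 mia13001621NON")

# ASSUMPTION: a short record has exactly 44 characters. Pad it with blanks
# so that it follows the same 48-character layout as a full record.
short <- nchar(recs) == 44
recs[short] <- paste0(substr(recs[short], 1, 41), "   ",
                      substr(recs[short], 42, 44), " ")

# Start/end columns matching widths=c(19,-4,7,3,8,2,1,3,1)
first <- c(1, 24, 31, 34, 42, 44, 45, 48)
last  <- c(19, 30, 33, 41, 43, 44, 47, 48)
cols  <- c("DateTime", "Flight", "Dest", "ArrTime",
           "MsgType", "Conf", "Runway", "Source")

# Cut each field out, trim blanks, and turn empty fields into NA.
fields <- lapply(seq_along(first), function(i) {
  x <- trimws(substr(recs, first[i], last[i]))
  x[!nzchar(x)] <- NA_character_
  x
})
names(fields) <- cols
fixed <- as.data.frame(fields, stringsAsFactors = FALSE)

# Convert the time stamp; the time zone here is a guess on my part.
fixed$DateTime <- as.POSIXct(fixed$DateTime, tz = "UTC")
fixed
```

With the padding in place, the third record should keep NON in Runway and have NA in MsgType, Conf and Source. If the short records in your real file vary in other ways, the length test will need to change.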
> str(df)
'data.frame':   3 obs. of  8 variables:
 $ DateTime: POSIXct, format: "2011-05-13 00:00:00" "2011-05-13 00:00:01" ...
 $ Flight  : Factor w/ 3 levels "AAL223 ","AAL330 ",..: 2 1 3
 $ Dest    : Factor w/ 3 levels "dfa","laa","mia": 1 2 3
 $ ArrTime : Factor w/ 3 levels "13001621","13002516",..: 2 3 1
 $ MsgType : chr  "PS" "AS" "NO"
 $ Conf    : Factor w/ 3 levels ".","C","N": 2 1 3
 $ Runway  : Factor w/ 1 level "NON": 1 1 NA
 $ Source  : Factor w/ 2 levels "A","M": 1 2 NA
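As for the question itself, I don't think you did anything wrong. mode() reports how an object is stored, not its class. A data frame is stored as a list, and factors and POSIXct values are stored as numbers, so "list" and "numeric" are what I would expect from your calls. class() and is.data.frame() are the better checks. A small sketch with a made-up data frame (the names d, when, dest and msg are mine, not from your data):

```r
# Made-up one-row data frame with a POSIXct, a factor and a character column
d <- data.frame(when = as.POSIXct("2011-05-13 00:00:00", tz = "UTC"),
                dest = factor("dfa"),
                msg  = "PS",
                stringsAsFactors = FALSE)

mode(d)            # "list": a data frame is stored as a list of columns
class(d)           # "data.frame"
is.data.frame(d)   # TRUE

# Storage mode per column: POSIXct and factor both report "numeric"
sapply(d, mode)

# Class per column: this is what colClasses controls
sapply(d, function(x) paste(class(x), collapse = "/"))
```

Running str(df) or the last line above on your own object should show whether the colClasses were applied.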

> sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods
[8] base

other attached packages:
 [1] gplots_2.8.0    caTools_1.12    bitops_1.0-4.1  gdata_2.8.2
 [5] gtools_2.6.2    sos_1.3-0       brew_1.0-6      lattice_0.19-26
 [9] ggplot2_0.8.9   proto_0.3-9.2   reshape_0.8.4   plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0

Dennis

On Wed, May 25, 2011 at 8:42 AM, James Rome <jamesrome at gmail.com> wrote:
> I have a data set where the lines look like:
> 2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
> 2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
> Some lines are missing the field before and after the NON:
> 2011-05-13 00:00:05 EONBHS229 mia13001621NON
>
> I read them into R using
>    df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
>        col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
>        colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
>
> The documentation for read.fwf says that the data are read into a
> dataframe. Yet, I get a list, and the conversions I specified do not
> seem to have been obeyed:
>> df[1:20,]
>               DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
> 1  2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
> 2  2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
> . . .
>> sapply(df, mode)
>   DateTime      Flight        Dest     ArrTime     MsgType        Conf
>  "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric"
>     Runway      Source
>  "numeric"   "numeric"
>> dfn = df[!is.na(df$Source),]
>> mode(df)
> [1] "list"
>
> What am I doing wrong?
>
> Thanks,
> Jim Rome


