[R] Read_fwf in package readr, double vs. numeric

Sarah Goslee
Wed Apr 24 17:11:36 CEST 2019


Hi,

I can't reproduce your problem: with readr 1.3.1 on Linux (see the
sessionInfo() output below), it works as expected. Letting read_fwf
guess the types also works fine. (See below.)

If you aren't running the current version of readr, update and retry.
If you are, then we probably need more info, at least sessionInfo().
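
A quick way to see which version is installed, as a minimal sketch (the
variable name readr_version is only illustrative, and the install step
is left commented out because it needs network access):

```r
# Report the installed readr version, or NA if readr is not installed
if (requireNamespace("readr", quietly = TRUE)) {
  readr_version <- as.character(utils::packageVersion("readr"))
} else {
  readr_version <- NA_character_
}
print(readr_version)

# To update from CRAN, uncomment the next line:
# install.packages("readr")
```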

Sarah



library(readr)
myFile <- "foo.txt"
pos <- fwf_positions(c(1,2,7), c(1,6,10))


type <- c('N','D','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055


type <- c('N','N','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055



> read_fwf(file = myFile, col_positions = pos, col_types = NULL)
Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_double(),
  X3 = col_double()
)
# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055
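
If updating is not an option, the workaround suggested in the original
post (treating the layout's "N" as double) can be sketched as below.
This is only an illustration under some assumptions: the sample data is
written to a temporary file so the example is self-contained, and every
"N" in the legacy layout is mapped to readr's compact code 'd'
(col_double) rather than 'n' (col_number).

```r
library(readr)

# Recreate the three-line sample file from the original post
# (a temporary file is used here instead of 'foo.txt')
myFile <- tempfile(fileext = ".txt")
writeLines(c("11e-201043", "1712201043", "1912201055"), myFile)

pos <- fwf_positions(c(1, 2, 7), c(1, 6, 10))

# Column types as reported by the legacy layout file
type <- c("N", "N", "N")

# Map N -> d (col_double) and C -> c (col_character),
# so "N" columns never go through col_number
types <- chartr("NC", "dc", paste0(type, collapse = ""))

dat <- read_fwf(file = myFile, col_positions = pos, col_types = types)
print(dat$X2)
```

The layout file stays untouched this way; only the mapping step in the
read-in code changes.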




> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] readr_1.3.1    colorout_1.2-0

loaded via a namespace (and not attached):
 [1] compiler_3.5.3   assertthat_0.2.0 R6_2.4.0         cli_1.0.1
 [5] hms_0.4.2        tools_3.5.3      pillar_1.3.1     tibble_2.0.1
 [9] Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.4.0
[13] pkgconfig_2.0.2  rlang_0.3.1


On Wed, Apr 24, 2019 at 10:56 AM Doran, Harold <HDoran at air.org> wrote:
>
> Suppose I have the following data sitting in a fwf file 'foo.txt'. The point of this email is to ask the group how to properly read in the value in this pseudo-data "1e-20" using the read_fwf function in the package readr.
>
> 11e-201043
> 1712201043
> 1912201055
>
> First, suppose I do it this way, where in this case "D" is used for double precision.
>
> library(readr)
> pos <- fwf_positions(c(1,2,7), c(1,6,10))
> type <- c('N','D','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
>
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
>      X1       X2    X3
>   <dbl>    <dbl> <dbl>
> 1     1 1.00e-20  1043
> 2     1 7.12e+ 4  1043
> 3     1 9.12e+ 4  1055
>
> This seemingly works well and properly captures the value. However, if I instead were to indicate to the function that *all* of my columns were numeric (just insert this one line in lieu of the other above)
>
> type <- c('N','N','N')
>
> # A tibble: 3 x 3
>      X1    X2    X3
>   <dbl> <dbl> <dbl>
> 1     1     1  1043
> 2     1 71220  1043
> 3     1 91220  1055
>
> The read in is not correct. Here is the pragmatic issue. I have a legacy program that spits out the layout structure of the fwf file (start, end positions) and also indicates what the column types are. This layout file we receive always uses a column type of numeric (N) for any numeric types (including the column holding values such as 1e-20).
>
> This layout file will not change so I need to figure out how to solve the problem within my read in program. I suppose one option is that I could manually change any values of "N" to "D" in my R code. That seems to work. But not sure if that is the "right" way to solve this issue.
>
> Thanks
> Harold



-- 
Sarah Goslee (she/her)
http://www.numberwright.com
