[R] Read_fwf in package readr, double vs. numeric

Doran, Harold HDoran at air.org
Wed Apr 24 17:37:47 CEST 2019


Thank you, Sarah. It seems that updating to a newer version does indeed solve the problem. For completeness, the first sessionInfo() output below is from the setup where it works properly (readr 1.3.1), and the second is from the setup where I observe the problem I described (readr 1.1.1).

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_1.3.1

loaded via a namespace (and not attached):
 [1] compiler_3.5.3   assertthat_0.2.1 R6_2.4.0         cli_1.1.0        hms_0.4.2       
 [6] tools_3.5.3      pillar_1.3.1     tibble_2.1.1     Rcpp_1.0.1       crayon_1.3.4    
[11] utf8_1.1.4       fansi_0.4.0      pkgconfig_2.0.2  rlang_0.3.4     

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_1.1.1

loaded via a namespace (and not attached):
 [1] compiler_3.4.2   assertthat_0.2.0 R6_2.2.2         cli_1.0.0        hms_0.3          tools_3.4.2     
 [7] pillar_1.3.0     tibble_1.4.2     Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.2.3     
[13] rlang_0.3.0.1     

-----Original Message-----
From: Sarah Goslee <sarah.goslee using gmail.com> 
Sent: Wednesday, April 24, 2019 11:12 AM
To: Doran, Harold <HDoran using air.org>
Cc: r-help using r-project.org
Subject: Re: [R] Read_fwf in package readr, double vs. numeric

Hi,

I can't reproduce your problem: with readr 1.3.1 on Linux, it works as expected. Letting read_fwf guess the types also works fine. (See below.)

If you aren't running the current version of readr, update and retry.
If you are, then we probably need more info, at least sessionInfo().
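A minimal sketch of the version check suggested above. The 1.3.1 threshold is an assumption taken only from this thread (1.3.1 reads "1e-20" correctly with an "n" column type, 1.1.1 does not), not from the readr changelog.

```r
# Compare the installed readr against 1.3.1, the version reported in this
# thread as reading "1e-20" correctly with an "n" column type.
readr_ok <- utils::packageVersion("readr") >= "1.3.1"

if (!readr_ok) {
  message("readr ", utils::packageVersion("readr"),
          " is older than 1.3.1; consider install.packages(\"readr\")")
}
```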

Sarah



library(readr)
myFile <- "foo.txt"
pos <- fwf_positions(c(1,2,7), c(1,6,10))


type <- c('N','D','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055


type <- c('N','N','N')
types <- paste0(type, collapse = '')
types <- chartr('NCD', 'ncd', types)
read_fwf(file = myFile, col_positions = pos, col_types = types)

# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055



> read_fwf(file = myFile, col_positions = pos, col_types = NULL)
Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_double(),
  X3 = col_double()
)
# A tibble: 3 x 3
     X1       X2    X3
  <dbl>    <dbl> <dbl>
1     1 1.00e-20  1043
2     1 7.12e+ 4  1043
3     1 9.12e+ 4  1055
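In readr's compact column-type string, "n" selects col_number() (parsed with parse_number()) and "d" selects col_double() (parse_double()). parse_number() is meant to extract a number from text that may contain other characters, which is presumably why the older readr in this thread returned 1 for "1e-20" under "n", while parse_double() reads the full scientific notation. A minimal comparison, assuming readr is installed; the parse_number() result depends on the installed version, so it is printed rather than relied on.

```r
library(readr)

x <- c("1e-20", "71220", "91220")

# parse_double() reads scientific notation such as "1e-20".
as_double <- parse_double(x)

# parse_number() extracts the first number it finds; per this thread,
# readr 1.1.1 gives 1 for "1e-20" while readr 1.3.1 gives 1e-20.
as_number <- parse_number(x)

print(data.frame(x, as_double, as_number))
```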




> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] readr_1.3.1    colorout_1.2-0

loaded via a namespace (and not attached):
 [1] compiler_3.5.3   assertthat_0.2.0 R6_2.4.0         cli_1.0.1
 [5] hms_0.4.2        tools_3.5.3      pillar_1.3.1     tibble_2.0.1
 [9] Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.4.0
[13] pkgconfig_2.0.2  rlang_0.3.1


On Wed, Apr 24, 2019 at 10:56 AM Doran, Harold <HDoran using air.org> wrote:
>
> Suppose I have the following data sitting in a fwf file 'foo.txt'. The point of this email is to ask the group how to properly read the value "1e-20" in this pseudo-data using the read_fwf function in the package readr.
>
> 11e-201043
> 1712201043
> 1912201055
>
> First, suppose I do it this way, where in this case "D" is used for double precision.
>
> library(readr)
> myFile <- "foo.txt"
> pos <- fwf_positions(c(1,2,7), c(1,6,10))
> type <- c('N','D','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
>
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
>      X1       X2    X3
>   <dbl>    <dbl> <dbl>
> 1     1 1.00e-20  1043
> 2     1 7.12e+ 4  1043
> 3     1 9.12e+ 4  1055
>
> This seemingly works well and properly captures the value. However,
> suppose I instead indicate to the function that *all* of my columns
> are numeric, by replacing the type line above with this one:
>
> type <- c('N','N','N')
>
> # A tibble: 3 x 3
>      X1    X2    X3
>   <dbl> <dbl> <dbl>
> 1     1     1  1043
> 2     1 71220  1043
> 3     1 91220  1055
>
> Now 1e-20 is read in as 1, which is not correct. Here is the pragmatic issue: I have a legacy program that outputs the layout structure of the fwf file (start and end positions) and also indicates the column types. The layout file we receive always uses the numeric type (N) for every numeric column, including the column holding values such as 1e-20.
>
> This layout file will not change, so I need to solve the problem within my read-in program. One option is to change every "N" to "D" in my R code. That seems to work, but I am not sure whether that is the "right" way to solve this issue.
>
> Thanks
> Harold
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
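A minimal sketch of the workaround described in the original question: translate every legacy "N" code to readr's "d" so the numeric columns are read with col_double(), which should not depend on how a given readr version treats "1e-20" under "n". The data are written to a temporary file so the example is self-contained, and the `layout` data frame with columns `start`, `end` and `type` is a hypothetical stand-in for the legacy layout file.

```r
library(readr)

# Write the pseudo-data from the original post to a temporary fwf file.
myFile <- tempfile(fileext = ".txt")
writeLines(c("11e-201043", "1712201043", "1912201055"), myFile)

# Hypothetical stand-in for the legacy layout file: start/end positions
# plus a one-letter type code, where "N" marks every numeric column.
layout <- data.frame(start = c(1, 2, 7),
                     end   = c(1, 6, 10),
                     type  = c("N", "N", "N"),
                     stringsAsFactors = FALSE)

pos <- fwf_positions(layout$start, layout$end)

# Map legacy codes to readr's compact spec, character by character:
# N -> d (col_double), C -> c (col_character), D -> d (col_double).
types <- chartr("NCD", "dcd", paste0(layout$type, collapse = ""))

dat <- read_fwf(file = myFile, col_positions = pos, col_types = types)
print(dat)
```

Mapping "D" as well matters because an uppercase "D" left in the string would mean a date column in readr's compact notation. Any other legacy code would pass through chartr() unchanged, so it would need its own entry in the mapping.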



--
Sarah Goslee (she/her)
http://www.numberwright.com


