[R] Sanity check in loading large dataframe

Duncan Murdoch
Thu Aug 5 15:40:51 CEST 2021


On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
 > Hello,
 > I am using a large spreadsheet (over 600 variables).
 > I tried `str` to check the dimensions of the spreadsheet and I got
 > ```
 >> (str(df))
 > 'data.frame': 302 obs. of  626 variables:
 >   $ record_id                 : int  1 1 1 1 1 1 1 1 1 1 ...
 > ....
 > $ v1_medicamento___aceta    : int  1 NA NA NA NA NA NA NA NA NA ...
 >    [list output truncated]
 > NULL
 > ```
 > I understand that `[list output truncated]` means that there are more
 > variables than those allowed by str to be displayed as rows. Thus I
 > increased the row's output with:
 > ```
 >
 >> (str(df, list.len=1000))
 > 'data.frame': 302 obs. of  626 variables:
 >   $ record_id                 : int  1 1 1 1 1 1 1 1 1 1 ...
 > ...
 > NULL
 > ```
 >
 > Does `NULL` mean that some of the variables are not closed? (perhaps a
 > missing comma somewhere)
 > Is there a way to check the sanity of the data and avoid that some
 > separator is not in the right place?
 > Thank you

The NULL is the value returned by str().  Normally it is not printed, 
but when you wrap str in parens as (str(df, list.len=1000)), that forces 
the value to print.
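
A minimal sketch of that behaviour, using a toy data frame `d` as a hypothetical stand-in for your `df`. withVisible() reports whether a value would be auto-printed at the console:

```r
# Toy data frame standing in for the real one (hypothetical data)
d <- data.frame(a = 1:3, b = c("x", "y", "z"))

# str() prints its summary as a side effect; withVisible() shows that
# the value it returns is NULL and is flagged as invisible
res <- withVisible(str(d))
res$value    # NULL
res$visible  # FALSE

# Wrapping the call in parentheses makes the result visible,
# which is why an extra NULL shows up at the console
res_paren <- withVisible((str(d)))
res_paren$visible  # TRUE
```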

str() is unusual among R functions in that it prints to the console as 
it runs and returns only an invisible NULL.  Many other functions 
construct a value which is only displayed if you print it, but 
something like

x <- str(df, list.len=1000)

will print the same as if there were no assignment, and then assign 
NULL to x.
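
If you want the information as a value rather than as console output, one possible approach (a sketch, not the only way) is to collect the printed lines with capture.output(), or to ask for the dimensions directly with dim(). The data frame `d` below is again a hypothetical stand-in for `df`:

```r
# Toy data frame standing in for the real one (hypothetical data)
d <- data.frame(a = 1:3, b = c("x", "y", "z"))

# The assignment still prints the structure; x itself is just NULL
x <- str(d)
is.null(x)   # TRUE

# capture.output() collects what str() prints as a character vector:
# one header line plus one line per column shown
out <- capture.output(str(d, list.len = 1000))
length(out)

# dim() returns the number of rows and columns as a value
dim(d)
```

With the real data, dim(df) would be expected to agree with the "302 obs. of 626 variables" header that str() prints.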

Duncan Murdoch


