[R] RE: error (fwd)

Cere M. Davis cere at u.washington.edu
Thu Feb 5 00:45:10 CET 2004


Hi folks,

I've got this funny problem with R's foreign library when reading Stata
files.  One file consistently produces "vector memory exhausted" errors
after gobbling up 2.7G of memory.  I parsed through the read.dta function
and figured out where the error occurs; the description is below.  I am
running R-1.8.1 on a Debian stable system (glibc 2.2, kernel 2.4.24).  R is
compiled from source as a shared library.  The file that I am reading is
only 172M in size.  The system I am using has 4G of free memory and 8G of
swap, so this doesn't seem to be a problem of lack of free memory.  See
below.

Thanks.
-----------------------------------------------------------------------

I stepped through the function and found that everything runs fine, but I
get a bunch of warnings during the convert.factors section of the code, like:

> warnings()
Warning messages:
1: Value labels (fafdstmp) for afdstmp are missing
2: Value labels (fafsmon) for afsmon are missing
3: Value labels (fafsnum) for afsnum are missing
4: Value labels (fafsval) for afsval are missing
5: Value labels (fahcmcar) for ahcmcare are missing
6: Value labels (fahengyv) for ahengyv are missing
7: Value labels (fahenrgy) for ahenrgy are missing
8: Value labels (fahflnch) for ahflnch are missing
9: Value labels (fahflnno) for ahflnno are missing
10: Value labels (fahhcvhi) for ahhcvhi are missing
11: Value labels (fahhhino) for ahhhino are missing
12: Value labels (fahhnum) for ahhnum are missing
13: Value labels (fahmcnum) for ahmcnum are missing
14: Value labels (fahncvhi) for ahncvhi are missing

etc.


Then, when I try to return rval as the last line in the function, R starts
gobbling up a ton of memory and eventually dies with a "vector memory
exhausted" error.
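A quick way to check whether the factor conversion is what blows up memory
is to compare peak memory for read.dta() with and without convert.factors.
This is only a sketch: it assumes the foreign package is installed, and the
small made-up data frame and temporary file stand in for the real 172M file
(substitute its path to test the actual case).

```r
# Sketch only: compare peak memory of read.dta() with and without
# converting Stata value labels to R factors.
library(foreign)

# Hypothetical stand-in data; replace 'tmp' with the real .dta path.
set.seed(1)
df <- data.frame(id  = 1:1000,
                 grp = factor(sample(c("a", "b", "c"), 1000, replace = TRUE)))
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)  # factors are written as labelled integer codes

# gc(reset = TRUE) resets the "max used" statistics, so the gc() call
# after each read reports the peak memory reached during that read.
invisible(gc(reset = TRUE))
raw <- read.dta(tmp, convert.factors = FALSE)  # keep integer codes
peak_raw <- gc()

invisible(gc(reset = TRUE))
full <- read.dta(tmp, convert.factors = TRUE)  # build factors from labels
peak_full <- gc()

# Compare the "max used" columns of the two tables.
print(peak_raw)
print(peak_full)
```

If, on the real file, the "max used" figures stay reasonable with
convert.factors = FALSE but explode with TRUE, that would point at the
R-level factor conversion rather than the C reader.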

Do you have a sense of where this could be coming from?  Must be something
funny about the communication between the foreign library and the main R
lib.

I'll email the R folks.

On Wed, 4 Feb 2004, Mark S. Handcock wrote:

> Date: Wed, 4 Feb 2004 14:38:12 -0800
> From: Mark S. Handcock <handcock at stat.washington.edu>
> To: 'Cere M. Davis' <cere at u.washington.edu>,
>      'R. Anderson' <anders10 at u.washington.edu>
> Cc: morrism at u.washington.edu, 'Matthew B Weatherford' <mbw at u.washington.edu>,
>      Msh <handcock at stat.washington.edu>
> Subject: RE: error
>
> Cere,
>
> This is useful information. How large is the original data file? If it is
> small (<1Gb) then the 2.7Gb is excessive. Have you searched the R users
> group on www.r-project.org?
>
> Also, can you try:
>
>  rval <- .External("do_readStata", "file", PACKAGE = "foreign")
>
> where "file" is the Stata file name, on both machines. This is the internal
> R read written in C, so if that works, the problem is elsewhere in the
> "read.dta" function, which is easy to fix.
>
> Mark
>
> > -----Original Message-----
> > From: Cere M. Davis [mailto:cere at u.washington.edu]
> > Sent: Monday, February 02, 2004 10:45 PM
> > To: R. Anderson
> > Cc: morrism at u.washington.edu; handcock at stat.washington.edu;
> > Matthew B Weatherford
> > Subject: Re: error
> >
> >
> > More info on the R memory problem.  Just reading one dta file in via the
> > foreign library requires upwards of 2.7G of memory on any machine.  2.7G
> > is the point at which the process runs out of memory, so I can't know the
> > upper limit for this process.  I am running the R read process on Libra
> > now, but it's been 5 hours since I started the read request and the disk
> > swap is so busy that I cannot tell when the process will finish.  There
> > does appear to be a problem with this R job using system swap space on
> > Mosix, so a quick test and fix for this is to co-opt another machine and
> > aggregate some RAM from it (if there is physical space in the machine),
> > hopefully sometime tomorrow.
> >
> > Stay tuned.
> >
> > >
> > >
> > > Thanks Robin for this email.  I am able to reproduce what you reported
> > > using the file that you gave me below, so thank you very much for that.
> > > From what I can see, this appears to be a memory allocation issue that
> > > affects all systems, but because the main node has such fast ethernet
> > > speeds, one can see the results of the problem quickly.  I am testing
> > > this problem on a system with more memory and may have a better sense
> > > of what is needed once I see the results.
> > >
> > > I'll let you know as I learn more, perhaps later today.
> > >
> > > Thanks,
> > > Cere
> > >
> > > On Wed, 28 Jan 2004, R. Anderson wrote:
> > >
> > > > Date: Wed, 28 Jan 2004 22:25:11 -0800 (PST)
> > > > From: R. Anderson <anders10 at u.washington.edu>
> > > > To: Cere M. Davis <cere at u.washington.edu>
> > > > Cc: morrism at u.washington.edu
> > > > Subject: Re: error
> > > >
> > > > Cere-
> > > > In the March files (which use the same .dta files as the match files
> > > > we were looking at on Friday), I was able to get 1979-1988 and
> > > > 1996-2001 to run with marchdatameta.R and create RData files.
> > > >
> > > > However, when the meta file ran, for example, 1989, the vector error
> > > > occurred again.
> > > >
> > > > So I tried running some of the files (marchdatacopy1989.R,
> > > > marchdatacopy1990.R, ...) individually.  I was able to produce an
> > > > RData set from the 1989 file.
> > > >
> > > > However, when I ran the 1990.R file, I got the following error:
> > > >
> > > > ______________________________________________________________________
> > > >
> > > >
> > > > > ##################################################
> > > > > # marchdatacopy1990.R                            #
> > > > > # 10 Jan 2004  -ra                               #
> > > > > #                                                #
> > > > > # This is a template file that is used to read   #
> > > > > # SPSS data into R and should prepare the basic  #
> > > > > # variables needed for the analysis of income    #
> > > > > # for any year 1990 that is specified. It is     #
> > > > > # sourced by the shell script "marchmetacode"    #
> > > > > # for years that are specified in                #
> > > > > # "marchdatameta.R".                             #
> > > > > #  -RA, 10 Jan 2004                         #
> > > > > ##################################################
> > > > >
> > > > > library(foreign)
> > > > > options(object.size = 10000000)
> > > > > mar1990 <- read.dta("/net/home/morrism/Data/CPS/March/Extracts.all/mar1990.dta")
> > > > Error: vector memory exhausted (limit reached?)
> > > >
> > > > Process R segmentation fault at Wed Jan 28 21:14:41 2004
> > > >
> > > > ______________________________________________________________________
> > > > This was run on mos2, interactively in emacs, and the error differs from
> > > > the other vector errors.
> > > >
> > > > And then I ran marchdatacopy1990.R on klee and got the following
> > > > warning:
> > > >
> > > > ______________________________________________________________________
> > > > run marchdatacopy1990.R
> > > > /usr/local/R-1.8.1/lib/R/bin/BATCH: line 55: 31545 Done
> > > > ( echo "invisible(options(echo = TRUE))"; cat ${in}; echo "proc.time()" )
> > > >      31546 Killed                  | ${R_HOME}/bin/R ${opts} >${out} 2>&1
> > > >
> > > > ______________________________________________________________________
> > > >
> > > > When I opened the outfile, marchdatacopy1990.Rout, there was nothing
> > > > but the R prompt.  (This is the outfile after running the file on klee.)
> > > >
> > > > I can stop by Friday morning or Thursday afternoon (I meet with Prof
> > > > Morris at 3 and can stop by afterwards).
> > > >
> > > > I think it is very odd that the marchdatameta file ran without error
> > > > for some of the years and produced an error for others.  Also note
> > > > that running the matchdatameta file continued to produce the same
> > > > errors as before for all years.
> > > >
> > > >
> > > > The directories for the match and march are:
> > > >
> > > > /net/home/morrism/Data/CPS/Comp/R/Code/MarchData ---For march
> > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData ---For match
> > > >
> > > > In each directory I am creating datasets from the same .dta files,
> > > > which are in:
> > > >
> > > > /net/home/morrism/Data/CPS/March/Extracts.all
> > > >
> > > > So I do not understand why the marchdatameta file works for some years
> > > > while matchdatameta produces the vector error for all years.
> > > >
> > > >
> > > > Thanks,
> > > > Robin Anderson
> > > >
> > > >
> > > >
> > > > On Fri, 23 Jan 2004, Cere M. Davis wrote:
> > > >
> > > > >
> > > > > If you are going to be around today, please come by and we'll work on
> > > > > this some more if you have time.
> > > > >
> > > > > >
> > > > > >
> > > > > > Cere-
> > > > > > By running ..1987.R through matchdatameta.R I do get the "vector"
> > > > > > error.
> > > > > > I am running that file interactively through an emacs/R split window.
> > > > > > Here is the file path for the .Rout file:
> > > > > >
> > > > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatacopy1987.Rout
> > > > > >
> > > > > > This is the file path for the file that creates an R file for each
> > > > > > year and runs it with R BATCH --no-save to get the .Rout file:
> > > > > >
> > > > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatameta.R
> > > > > >
> > > > > > Thanks Again
> > > > > > Robin
> > > > > >
> > > > >
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > 		        Cere Davis
> > > > > 		Unix Systems Administrator - CSDE
> > > > >             cere at u.washington.edu   ph: 206.685.5346
> > > > >          https://staff.washington.edu/cere
> > > > >
> > > > > GnuPG Key   http://staff.washington.edu/cere/gpgkey.txt
> > > > > Key fingerprint = B63C 2361 3B9B 8599 ECC9  D061 3E48 A832 F455 9E7FA
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
>




