[R] Subsetting a data.frame -> Read in with FWF format from .DAT file

RHelpPlease rrumple at trghcsolutions.com
Sat Mar 10 01:04:46 CET 2012


Hi there,
I am having trouble subsetting a data frame by a condition on one column
(of many).

I read the file into R with read.fwf(), specifying the column widths. The
original data is a .DAT file. I then used the names() function to assign the
column headings.
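A minimal sketch of that read step, assuming a two-field layout. The file contents, the second column name CLM_AMT, and the widths are hypothetical stand-ins for the real 93-column layout. Passing col.names sets the headings during the read. colClasses, which read.fwf() forwards to read.table(), keeps PRVDR_NUM as text so its leading zero survives.

```r
# Hypothetical two-column fixed-width file standing in for the real .DAT file
dat_file <- tempfile(fileext = ".DAT")
writeLines(c("050108000012345",
             "123456000067890"), dat_file)

# widths: 6 characters for PRVDR_NUM, 9 for the hypothetical CLM_AMT column
inpatient <- read.fwf(dat_file,
                      widths     = c(6, 9),
                      col.names  = c("PRVDR_NUM", "CLM_AMT"),
                      colClasses = c("character", "numeric"))

str(inpatient)
```

With the ID read as character, a filter can then use the value exactly as it appears in the file, quotes included.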

Using one column, PRVDR_NUM, I want to reduce the entire data set to only
the rows where PRVDR_NUM == 050108.  This is where I'm having trouble.

I've tried code like this:

newinpatient <- subset(oldinpatient, oldinpatient$PRVDR_NUM == 050108)
#OR
newinpatient <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]
#OR
providernum <- data.frame(PRVDR_NUM = c(050108))
newinpatient <- merge(providernum, oldinpatient)
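For comparison, here is a sketch of the same three attempts on a small made-up data frame. The PMT_AMT column and all values are hypothetical. It assumes PRVDR_NUM is a factor whose stored text is "050108". Typed without quotes, 050108 is simply the number 50108. Comparing a factor with a number compares the factor's text with "50108", which never matches. Quoting the value makes each approach return the expected rows.

```r
# Made-up stand-in for oldinpatient; PRVDR_NUM is a factor, as in the post
oldinpatient <- data.frame(
  PRVDR_NUM = factor(c("050108", "050108", "123456")),
  PMT_AMT   = c(100, 250, 75)   # hypothetical numeric column
)

# Unquoted literal: 050108 is parsed as the number 50108, so nothing matches
n_unquoted <- sum(oldinpatient$PRVDR_NUM == 050108)

# 1. subset(): columns can be named directly inside the condition
new1 <- subset(oldinpatient, PRVDR_NUM == "050108")

# 2. Logical row index with [ , ]
new2 <- oldinpatient[oldinpatient$PRVDR_NUM == "050108", ]

# 3. merge() against a one-row key table
providernum <- data.frame(PRVDR_NUM = "050108")
new3 <- merge(providernum, oldinpatient)

c(n_unquoted, nrow(new1), nrow(new2), nrow(new3))  # 0 2 2 2
```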

When I checked class() at one point, I found that R treats PRVDR_NUM as a
factor, not a number, which seemed a likely reason for the failures above.
So I then tried something like this:

newPRVDR_NUM <- format(as.numeric(levels(oldinpatient$PRVDR_NUM)[oldinpatient$PRVDR_NUM]))
numericprvdr <- data.frame(oldinpatient, newPRVDR_NUM)
bestprvdr <- numericprvdr[,-2]

I thought that once PRVDR_NUM was converted to numeric, one of the three
options above would work, but that has not worked either.  (I did confirm
that the factor-to-numeric conversion itself worked.)
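A sketch of why that conversion may not help, using made-up values. as.numeric() on a factor returns its internal level codes, so levels(f)[f] is the right step. The catch is the outer format(), which turns the numbers back into character strings padded to a common width. Comparing those strings with 50108 then fails again. Dropping format(), or comparing the original text with "050108", avoids the problem.

```r
# Made-up provider numbers; the second value is hypothetical
f <- factor(c("050108", "123456"))

as.numeric(f)                    # internal level codes: 1 2
num <- as.numeric(levels(f)[f])  # actual values: 50108 123456
txt <- format(num)               # back to character, padded to a common width

class(txt)    # "character"
txt           # " 50108" "123456"
num == 50108  # TRUE FALSE
txt == 50108  # FALSE FALSE: " 50108" is not the string "50108"
```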

Although R runs the three options above without errors, dim() on the result
returns 0 93.  The column count is correct, but the row count (obviously) is
not.  (I confirmed that the desired value occurs multiple times in that
column, so 0 rows is definitely wrong.)
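A quick diagnostic sketch for a 0-row result like this, with made-up values. Inspecting the factor's levels shows the stored text, including the leading zero. Testing both forms of the value shows which one actually matches. %in% compares on text here, so the unquoted number 50108 is looked up as "50108".

```r
# Made-up column standing in for oldinpatient$PRVDR_NUM
prv <- factor(c("050108", "050108", "123456"))

levels(prv)                 # the stored text, leading zero included
"050108" %in% levels(prv)   # TRUE
50108 %in% levels(prv)      # FALSE: looked up as the string "50108"
table(prv)[["050108"]]      # number of rows carrying that value: 2
```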

I would also like to work with PRVDR_NUM as a variable on its own, but for
any of these column names I have to write "allinpatient$PRVDR_NUM"; R does
not recognize PRVDR_NUM alone.  Why?
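A sketch of the underlying reason, with a made-up data frame named inp. A column lives inside its data frame, not in the workspace. R therefore finds PRVDR_NUM by itself only where a function evaluates code within the data frame, as subset() and with() do. Copying the column to its own variable is the other option. attach() also works, but it places a copy on the search path that does not follow later changes to the data frame.

```r
# Made-up data frame standing in for allinpatient
inp <- data.frame(PRVDR_NUM = c("050108", "050108", "123456"),
                  stringsAsFactors = FALSE)

# Evaluate an expression with the columns visible as plain names
n_match <- with(inp, sum(PRVDR_NUM == "050108"))

# Or copy the column into a standalone variable
PRVDR_NUM <- inp$PRVDR_NUM
n_copy <- sum(PRVDR_NUM == "050108")

c(n_match, n_copy)  # 2 2
```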

I increasingly suspect my problem is more fundamental: should I be using
read.fwf() in the first place, or am I using it incorrectly?  I have managed
fine so far with other variables and data sets of this type, but now and in
the future I need to repeat this code many times, so help understanding my
errors, and a more elegant or efficient solution, would be greatly
appreciated.
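Since the same read-and-filter step has to be repeated, one option is to wrap it in a small function. This is a sketch only. The name read_inpatient(), its arguments, and the two-column layout in the usage example are hypothetical. The real call would pass the 93 widths and column names. Reading everything as character first means the filter value can be written exactly as it appears in the file. The other columns are converted back afterwards with type.convert().

```r
# Hypothetical helper: read a fixed-width file and keep rows for given providers
read_inpatient <- function(file, widths, col_names, providers,
                           id_col = "PRVDR_NUM") {
  # Read every column as character (recycled) so no leading zeros are lost
  dat <- read.fwf(file, widths = widths, col.names = col_names,
                  colClasses = "character")
  # Keep only the requested provider numbers, compared as text
  dat <- dat[dat[[id_col]] %in% providers, , drop = FALSE]
  # Convert the remaining columns back to their natural types
  others <- setdiff(names(dat), id_col)
  dat[others] <- lapply(dat[others], type.convert, as.is = TRUE)
  dat
}

# Usage with a made-up two-column file
f <- tempfile(fileext = ".DAT")
writeLines(c("050108000012345", "123456000067890", "050108000000100"), f)
res <- read_inpatient(f, widths = c(6, 9),
                      col_names = c("PRVDR_NUM", "CLM_AMT"),
                      providers = "050108")
res
```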

Also note that R does not read all 93 columns as factors.  Why would R
interpret this six-wide column as a factor, but the nine-wide column next
door as numeric?
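A likely explanation, offered as an assumption rather than a diagnosis. read.fwf() hands the fields to read.table(), which converts each column with type.convert(). A column becomes numeric only if every entry parses as a number. A single entry that does not, for example a provider number containing a letter like the made-up "05T108" below, keeps the whole column as text. Older R versions (stringsAsFactors = TRUE by default) store that text as a factor. Note also that a purely numeric column drops the leading zero.

```r
# All entries parse as numbers: the column is converted, leading zero dropped
all_digits <- type.convert(c("050108", "123456"), as.is = TRUE)

# One non-numeric entry (made-up) keeps the whole column as character
mixed <- type.convert(c("050108", "05T108"), as.is = TRUE)

is.numeric(all_digits)    # TRUE
is.character(mixed)       # TRUE
```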

Your help is most appreciated!

--
View this message in context: http://r.789695.n4.nabble.com/Subsetting-a-data-frame-Read-in-with-FWF-format-from-DAT-file-tp4461051p4461051.html
Sent from the R help mailing list archive at Nabble.com.
