[R] Subsetting a data.frame -> Read in with FWF format from .DAT file
rrumple at trghcsolutions.com
Sat Mar 10 01:04:46 CET 2012
I am having trouble subsetting a data frame by a condition on one column.
I read the file into R with "read.fwf", where I specified the column widths.
The original data is a .DAT file. I then used the "names" function to assign
the column names.
I wish to reduce the entire data set to only the rows where one column,
PRVDR_NUM, equals 050108. This is where I'm having trouble.
I've tried code like this:
newinpatient <- subset(oldinpatient, oldinpatient$PRVDR_NUM == 050108)
newinpatient <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]
providernum <- data.frame(newdim(PRVDR_NUM = c(050108))
newinpatient <- merge(providernum, oldinpatient)
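A minimal sketch of one way the attempts above can run without error and still
return zero rows, assuming PRVDR_NUM is stored as a factor (or character)
column. The toy data frame and its values are made up for illustration. R
parses the unquoted literal 050108 as the number 50108, so the comparison is
against "50108" rather than "050108":

```r
# Toy data; the values are hypothetical and only mimic the column described.
oldinpatient <- data.frame(
  PRVDR_NUM = factor(c("050108", "050108", "123456")),
  AMOUNT    = c(10, 20, 30)
)

# The unquoted literal 050108 is the same number as 50108.
same_number <- identical(050108, 50108)

# Comparing the factor against the number matches no rows ...
by_number <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]
dim(by_number)

# ... while comparing against the quoted string keeps the matching rows.
by_string <- oldinpatient[oldinpatient$PRVDR_NUM == "050108", ]
dim(by_string)
```

If this is what is happening in the real data, quoting the value in the
comparison may be all that is needed.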
On checking "class" at one point, I gathered that R treats PRVDR_NUM as a
factor, not a number, which I understand is a potential reason for the
failures with the code above. So I then tried something like this:
newPRVDR_NUM <- format(as.numeric(levels(oldinpatient$PRVDR_NUM)))
numericprvdr <- data.frame(oldinpatient, newPRVDR_NUM)
bestprvdr <- numericprvdr[,-2]
I thought that after converting PRVDR_NUM to numeric, at least one of the
three options above would work, but that has not worked either. (I did
confirm that the factor -> numeric conversion itself worked.)
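A small sketch of how factor-to-numeric conversion behaves, using a made-up
factor rather than the real data. It illustrates two things that may matter
here: levels() returns only the distinct values, and format() turns numbers
back into character strings, so the result is not numeric after all:

```r
# Hypothetical factor standing in for the PRVDR_NUM column.
f <- factor(c("050108", "123456", "050108"))

# levels() gives the distinct values only, not one value per row.
n_levels <- length(levels(f))
n_rows   <- length(f)

# One numeric value per row: convert through character first.
num <- as.numeric(as.character(f))

# The leading zero is lost, so the numeric comparison is against 50108.
matches <- num == 50108

# format() converts the numbers back to character strings.
formatted_is_text <- is.character(format(num))
```

Keeping the column as character and comparing against "050108" avoids the
conversion altogether, which may be simpler if the leading zero is meaningful.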
Though R runs the three options above with no errors, a "dim" check returns:
0 93. The number of columns is correct, but the number of rows (obviously)
is not. (I did confirm that the desired value occurs multiple times in the
noted column, so 0 is definitely incorrect.)
As well, I would like to work with PRVDR_NUM as a variable on its own, but
I've found that for any of these column names I have to write
"allinpatient$PRVDR_NUM". R does not recognize PRVDR_NUM alone. Why?
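A short sketch of why the bare name is not found, and two base R ways to refer
to a column without the "$" prefix. Columns live inside the data frame rather
than in the workspace. The toy data frame below is an assumption made for
illustration:

```r
# Hypothetical stand-in for the real data frame.
allinpatient <- data.frame(
  PRVDR_NUM = c("050108", "123456"),
  AMOUNT    = c(10, 20),
  stringsAsFactors = FALSE
)

# with() evaluates the expression with the columns visible as variables.
hits <- with(allinpatient, PRVDR_NUM == "050108")

# subset() evaluates its condition inside the data frame in the same way.
kept <- subset(allinpatient, PRVDR_NUM == "050108")
```

Outside with() or subset(), the name PRVDR_NUM only resolves if a separate
object of that name has been created in the workspace.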
More and more I think my problem is more foundational: is it using the
read.fwf function in the first place, or not using it correctly? I've made
enough progress with other variables and data sets of this type that I've
been fine so far, but now and in future I need to repeat this code often
enough that help in better understanding my errors, and a more
elegant/efficient solution, would be greatly appreciated.
Also note that R does not read all 93 columns as factors. Why would R
interpret this six-wide column as a factor, but the nine-wide column next
door as numeric?
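One possible explanation, sketched under assumptions: read.fwf passes its
columns through the same type guessing as read.table, so a column with any
non-numeric entry cannot become numeric, while an all-digit column can. The
file contents below are invented (including the letter in the second
identifier), and the widths 6 and 9 simply mirror the two columns described.
Passing colClasses = "character" through read.fwf to read.table keeps the
identifiers as text with their leading zeros:

```r
# Write a tiny fixed-width file: a 6-wide identifier, then a 9-wide number.
tmp <- tempfile(fileext = ".DAT")
writeLines(c("050108000001234",
             "05A108000005678"), tmp)

# Default type guessing: the identifier column contains a letter,
# so it cannot be read as numeric; the all-digit column can.
guessed <- read.fwf(tmp, widths = c(6, 9),
                    col.names = c("PRVDR_NUM", "AMOUNT"))
guessed_classes <- sapply(guessed, class)

# Forcing character keeps leading zeros and avoids factors.
as_text <- read.fwf(tmp, widths = c(6, 9),
                    col.names = c("PRVDR_NUM", "AMOUNT"),
                    colClasses = "character")
text_match <- as_text$PRVDR_NUM == "050108"

unlink(tmp)
```

Whether the identifier column comes back as a factor or as character under the
default guessing depends on the R version and the stringsAsFactors setting.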
Your help is most appreciated!