[R] extracting information from txt file

Taimur Sajid tsajid at primaticsfinancial.com
Wed Oct 31 18:56:39 CET 2012


This worked for the example you provided. It assumes the header count is the only numeric value on the 5th line, because every non-digit character on that line is stripped before conversion.

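	epa_extract <- function(address){
		# The 5th line holds the header count, e.g. "# Header Records:  85"
		doc <- readLines(address, n = 5)[5]
		
		# Strip every non-digit character, leaving only the count
		head_count <- as.numeric(gsub("\\D", "", doc))
		
		# The original post skips 84 lines when the count is 85, so skip one fewer
		read.table(address, sep = ",", header = TRUE, skip = head_count - 1)
	}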
		
	foo <- epa_extract("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt")
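
For anyone who wants to try the idea without hitting the EPA server, here is a minimal sketch on a small temporary file. The file contents, the helper name extract_by_pattern, and the max_lines argument are all made up for illustration. It looks for the "Header Records:" line with grep instead of relying on it being line 5, and it skips one line fewer than the count, following the original question (skip = 84 when the file says 85).

```r
# Hypothetical stand-in for the EPA file: 4 header records, the last of
# which is the column-name line (mirroring "85 records, skip = 84").
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# File: demo.txt",
  "# Header Records:  4",
  "# Some other comment",
  "SITE,PH,COND",
  "A1,6.8,120",
  "B2,7.1,95"
), tmp)

# Hypothetical helper (not from the thread): locate the count by pattern
# rather than by fixed line position.
extract_by_pattern <- function(address, max_lines = 20) {
  top <- readLines(address, n = max_lines)
  hit <- grep("Header Records:", top, fixed = TRUE, value = TRUE)[1]
  if (is.na(hit)) stop("no 'Header Records:' line found")

  # Pull out only the digits that follow the label
  head_count <- as.numeric(
    sub(".*Header Records:[[:space:]]*([0-9]+).*", "\\1", hit)
  )

  # Assumption: the column-name line is counted as the last header record
  read.table(address, sep = ",", header = TRUE, skip = head_count - 1)
}

demo <- extract_by_pattern(tmp)
print(demo)
```

Matching on the label is a bit more forgiving than gsub("\\D", ...) if the line ever contains another number, but it still assumes the label text itself does not change.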


Taimur Sajid
Research & Development Analyst
Primatics Financial

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of chuck.01
Sent: Wednesday, October 31, 2012 12:47 PM
To: r-help at r-project.org
Subject: [R] extracting information from txt file

Hello,

Here is a link to some data:
http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt

I am trying to read this in, and want to use: 
chmval <-
read.table("http://www.epa.gov/emap/html/data/surfwatr/data/mastreams/9396/wchem/chmval.txt",
sep=",", skip= 84, header=T)

The 84 (the number of lines to skip) needs to be derived from the 5th line of the txt file, which reads:
# Header Records:  85

So I need that number minus 1 as the skip argument in the read.table statement above.

I've tried grep but that didn't work: 
 (for this I downloaded the txt file and manually removed that hash mark!)

grep("Header Records:", read.table("chmval.txt", header=T))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 5 elements

Any ideas?
Can I just extract the 5th line?




