[R] Reading a CSV file

@vi@e@gross m@iii@g oii gm@ii@com
Sun Aug 7 06:04:16 CEST 2022


Erin,

You did not explain why you want to do that. Are there comments on those
lines, for example?

I see others have replied with things that boil down to reading the entire
file into a data structure of lines, then using indexing of some sort to
eliminate the lines you want to skip, and building the data from that
structure rather than from the file. That works, albeit for large files, ...
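
A minimal sketch of that line-based approach, in base R only. The vector
all_lines and the "junk" lines are made up for illustration; with a real
file you would fill all_lines with readLines() on your own file name.

```r
# Hypothetical sample lines standing in for the file contents.
# In practice: all_lines <- readLines("yourfile.csv")
all_lines <- c("head1,head2",
               "junk a",
               "junk b",
               "junk c",
               "junk d",
               "1,2",
               "3,2")

# Drop lines 2 through 5, keeping the header line and the data lines.
kept <- all_lines[-(2:5)]

# Parse the surviving lines as CSV; the first kept line supplies the names.
dat <- read.csv(text = kept)
```

Because the whole file sits in memory as a character vector before parsing,
this is simplest for files of modest size.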

But there are, as usual, many ways to do things. Some people read in files
on their own and do the comma separation, type checking of columns and so on
and are free to make their own structure, perhaps the hard way. Clearly
skipping lines becomes trivial.
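
For what it is worth, a rough sketch of doing it by hand might look like
this. The sample lines and the names raw_lines, header and body are
assumptions for illustration, and the code assumes no quoted fields or
embedded commas, which a real CSV reader would handle for you.

```r
# Hypothetical sample lines; in practice they would come from readLines().
raw_lines <- c("head1,head2", "skip me", "skip me", "skip me", "skip me",
               "1,2", "3,2")

# The first line becomes the column names, trimmed of stray spaces.
header <- trimws(strsplit(raw_lines[1], ",", fixed = TRUE)[[1]])

# Skipping is just indexing: drop the header and the four unwanted lines.
body <- raw_lines[-(1:5)]

# Split each remaining line on commas and stack the pieces into a matrix.
fields <- strsplit(body, ",", fixed = TRUE)
mat <- do.call(rbind, fields)

# Build the data.frame and do the type conversion ourselves.
dat <- as.data.frame(mat, stringsAsFactors = FALSE)
names(dat) <- header
dat[] <- lapply(dat, as.integer)
```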

Then there is the concept of reading it twice: the first pass trivially
picks up just the comma-separated header line, which you turn into a vector
of column names, and the second pass calls something like read.csv, telling
it to skip your N lines and use those names for the columns.
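
A sketch of the two-pass idea, assuming the layout in the question (one
header line, four unwanted lines, then data). The temporary file and its
contents are made up so the example is self-contained.

```r
# Write a small hypothetical file so there is something to read twice.
tmp <- tempfile(fileext = ".csv")
writeLines(c("head1,head2", "junk 1", "junk 2", "junk 3", "junk 4",
             "1,2", "3,2"), tmp)

# Pass 1: read only the first line and split it into column names.
header_line <- readLines(tmp, n = 1)
col_names <- trimws(strsplit(header_line, ",", fixed = TRUE)[[1]])

# Pass 2: skip the header plus the four unwanted lines (5 in total)
# and supply the names picked up in the first pass.
dat <- read.csv(tmp, header = FALSE, skip = 5, col.names = col_names)

unlink(tmp)  # remove the temporary file
```

Note that skip counts from the top of the file, so here it has to include
the header line as well as the four lines being ignored.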

Let me suggest an alternate solution IF you can arrange for the lines you do
not want (and only those) to begin with some comment character like "#" in
my example below:

text <- 'head1, head2
# ignore
# ignore 2
# ignore 3
# ignore 4
1,2
3,2'

hi <- read.csv(text=text, comment.char="#")

The above returns:

  head1 head2
1     1     2
2     3     2

It will ignore any number of lines. If the data has anything special like
that, this might be a way to get what you want.

There are other packages, like readr in the tidyverse, with related but
sometimes different functionality, and the same effect can be had with a
variation of my technique using read_csv() [note underscore, not period]:

hi <- text %>% read_csv(comment="#")

Or use the native pipe symbol (R 4.1 or later) if you prefer:

hi <- text |> read_csv(comment="#")

What this gives you, perhaps, is more options, such as skip_empty_rows=TRUE,
which drops any lines that are blank.

And I tried some rather weird ideas like this:

text <- 'head1, head2
# ignore
# ignore 2 more stuff
# ignore 3, more stuff
# ignore 4
1,2
3,2'

hi <- text |> read_csv(col_types=c(col_integer(), col_integer()))

The idea was to TELL it what type to expect and hope the bad lines become
NA. Well, not quite. Since comments were no longer being suppressed, it made
every column character given the above data:

> hi
# A tibble: 6 × 2
  head1                 head2     
  <chr>                 <chr>     
1 # ignore              NA        
2 # ignore 2 more stuff NA        
3 # ignore 3            more stuff
4 # ignore 4            NA        
5 1                     2         
6 3                     2         
> typeof(hi$head1)
[1] "character"
> typeof(hi$head2)
[1] "character"

Although this seems bad, it opens a door to consider. As long as whatever
is on those 4 lines does not mess things up by, say, making additional
columns, you can probably read the darn thing into a tibble or data.frame,
then remove the rows you do not want and convert the columns from character
to whatever you want, such as integer or numeric.
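
Here is a sketch of that read-then-clean idea using base read.csv rather
than read_csv, so it needs no extra packages. Forcing every column to
character with colClasses is my choice for the illustration, as are the
names raw and clean; the unwanted rows are assumed to be rows 1 to 4.

```r
text <- 'head1,head2
# ignore
# ignore 2 more stuff
# ignore 3, more stuff
# ignore 4
1,2
3,2'

# Read everything as character so the junk rows cannot upset type guessing.
# read.csv does not treat "#" as a comment unless told to, so they stay in.
raw <- read.csv(text = text, colClasses = "character")

# Remove the four unwanted rows, then convert what is left to integer.
clean <- raw[-(1:4), ]
clean[] <- lapply(clean, as.integer)
rownames(clean) <- NULL  # renumber the rows from 1
```

This only works while the junk lines have no more fields than the header,
which is the caveat about additional columns mentioned above.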

Finally, if you have any control over the file contents, guess what happens
if you place the header line AFTER the four skipped lines like this?

text <- '# ignore
# ignore 2 more stuff
# ignore 3, more stuff
# ignore 4
head1, head2
1,2
3,2'

You now tell it to skip 4 lines AND use a header and it works for me!

read.csv(text=text, header=TRUE, skip=4)

There seem to be many ways to consider, and I would not be shocked if some
program that does this kind of data import even allowed you to specify which
rows to ignore more dynamically.

But perhaps the first solution you got is more dynamic as it allows you to
process the text as a series of lines in all kinds of ways, such as removing
any rows that contain the number 666 or even editing it in some way,
combining data from multiple files, and so on.
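
As a sketch of that flexibility, the following filters out any line that
contains 666 and combines the data lines of two sources. The vectors
lines_a and lines_b are made-up stand-ins for what readLines() would return
from two files that share the same header.

```r
# Hypothetical contents of two files with identical header lines.
lines_a <- c("head1,head2", "1,2", "666,5", "3,2")
lines_b <- c("head1,head2", "7,8", "9,666")

# Combine the data lines of both sources, leaving the headers out.
body <- c(lines_a[-1], lines_b[-1])

# Drop any line that contains the text 666 anywhere in it.
body <- body[!grepl("666", body, fixed = TRUE)]

# Put one header back on top and parse the result as CSV.
dat <- read.csv(text = c(lines_a[1], body))
```

Matching on the raw text is crude, since it would also drop a line holding
1666 or 6660, but it shows how freely the lines can be processed before
they are parsed.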

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Erin Hodgess
Sent: Saturday, August 6, 2022 8:16 PM
To: r-help using r-project.org
Subject: [R] Reading a CSV file

Hello!

Is there a way to read the first line of a CSV file, then skip 4 lines, then
continue reading, please?

I know you can skip from the top, but I don't know if you can read and then
skip.

Thanks,
Erin


Erin Hodgess, PhD
mailto: erinm.hodgess using gmail.com

