[R] EOF within quoted string

Mohan.Radhakrishnan at cognizant.com
Fri Aug 11 10:58:28 CEST 2017


Yes, I tried that already. It was not straightforward.

data <- read.csv("20_newsgroups.csv", fill = TRUE, as.is = TRUE, header = FALSE, quote = "", sep = ",", encoding = "UTF-8")

This does read the file, but haphazardly: each email is split across multiple columns, and several columns contain nothing but ‘NA’. There are 202 columns in total.
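A small sketch of why the text ends up spread over many columns: with quote="" the quote characters no longer protect anything, so every comma inside an email becomes a field separator. The three-line sample file below is made up for illustration and is not taken from 20_newsgroups.csv.

```r
# Hypothetical sample standing in for the real file (an assumption).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,group,text',
  '1,rec.sport.baseball,"the cubs, he said, "will win"',
  '2,sci.space,plain text without commas'
), tmp)

# quote = "" switches quote handling off, so each comma splits a field.
demo <- read.csv(tmp, fill = TRUE, as.is = TRUE, header = FALSE,
                 quote = "", sep = ",", encoding = "UTF-8")

# The second line has four commas, so the data frame gets five columns;
# fill = TRUE pads the shorter rows.
print(dim(demo))
```

The real file presumably has rows with far more commas, which would explain the large column count.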

I then removed the columns that were entirely NA, concatenated the remaining text columns, and finally got what I needed.

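# Keep only the columns that are not entirely NA.
munged <- data[, unlist(lapply(data, function(x) !all(is.na(x))))]
# Drop the first row.
munged <- munged[-1, ]
# Join the text fragments in columns 3 onward into one string per row.
munged$text <- apply(munged[, 3:ncol(munged)], 1, paste0, collapse = " ")

# Keep the first two columns and the reassembled text.
munged <- munged[, c("V1", "V2", "text")]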

print(head(munged$text))
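A possible alternative, sketched here only as an idea: read the raw lines and split each one at its first two commas, so the text never gets scattered in the first place. This assumes one record per line and no commas in the first two fields, neither of which I have verified for the real file. The sample file and the regular expression are illustrative.

```r
# Hypothetical sample standing in for the real file (an assumption).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'id,group,text',
  '1,rec.sport.baseball,"the cubs, he said, "will win"',
  '2,sci.space,plain text without commas'
), tmp)

# Read raw lines and drop the first one, as munged[-1, ] does above.
lines <- readLines(tmp, encoding = "UTF-8")[-1]

# Capture field 1, field 2, and everything after the second comma.
parts <- regmatches(lines, regexec("^([^,]*),([^,]*),(.*)$", lines))

# Element 1 of each match is the whole line; groups start at element 2.
munged <- data.frame(
  V1   = vapply(parts, `[`, character(1), 2),
  V2   = vapply(parts, `[`, character(1), 3),
  text = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)
print(munged$text)
```

Unlike the paste0 approach, this keeps the original commas and quote characters inside the text column. Lines that do not match the pattern come out as NA.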

Mohan

From: Adams, Jean [mailto:jvadams at usgs.gov]
Sent: Thursday, August 10, 2017 8:03 PM
To: Radhakrishnan, Mohan (Cognizant) <Mohan.Radhakrishnan at cognizant.com>
Cc: R help <r-help at r-project.org>
Subject: Re: [R] EOF within quoted string

You might want to try some of the suggestions mentioned in this post: https://stackoverflow.com/q/17414776/2140956

Jean

On Thu, Aug 10, 2017 at 7:59 AM, <Mohan.Radhakrishnan at cognizant.com> wrote:
Hi,

Reading http://ssc.wisc.edu/~ahanna/20_newsgroups.csv after downloading it using

data <- read.csv("20_newsgroups.csv",header=TRUE)

throws this.

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string

For example, the first line in the file is shown below. This column contains only text like this. Is there a way to read it?

From: cubbie at garnet.berkeley.edu () Subject: Re: Cubs behind Marlins? How? Article-I.D.: agate.1pt592$f9a Organization: University of California, Berkeley Lines: 12 NNTP-Posting-Host: garnet.berkeley.edu   gajarsky at pilot.njin.net writes:  morgan and guzman will have era's 1 run higher than last year, and  the cubs will be idiots and not pitch harkey as much as hibbard.  castillo won't be good (i think he's a stud pitcher)         This season so far, Morgan and Guzman helped to lead the Cubs        at top in ERA, even better than THE rotation at Atlanta.        Cubs ERA at 0.056 while Braves at 0.059. We know it is early        in the season, we Cubs fans have learned how to enjoy the        short triumph while it is still there.

Thanks,
Mohan
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

        [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
