[R] how to identify record with broken format
bor|@@@te|pe @end|ng |rom utoronto@c@
Wed Jun 5 12:32:35 CEST 2019
I've seen that behaviour with a C" atom in a chemical structure.
Here is code to identify lines with an uneven number of quotation marks. Read your file with readLines() to use it.
myTxt    <- '"This" "is" "fine"'
myTxt[2] <- '"This" "is "not"'
myTxt[3] <- 'This is ok'
x <- lengths(regmatches(myTxt, gregexpr('\\"', myTxt))) # (1) quotation marks per element
which(x %% 2 == 1) # indices of elements with an odd number of quotation marks
(1) credit to https://stackoverflow.com/questions/12427385/how-to-calculate-the-number-of-occurrence-of-a-given-character-in-each-row-of-a
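Applied to a file, the same idea might look like the sketch below. The helper name find_unbalanced_quotes and the temporary file are made up for illustration; substitute the path to your own data file. Note that a quoted field that legitimately spans several lines would also be flagged, so treat the result as a list of candidates to inspect, not a definitive diagnosis.

```r
# Hypothetical helper (not part of base R): indices of lines that contain
# an odd number of double-quote characters.
find_unbalanced_quotes <- function(lines) {
  n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))
  which(n_quotes %% 2 == 1)
}

# A small temporary file stands in for the real data file (assumption).
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,label', '1,"fine"', '2,"not closed', '3,plain'), tmp)

lines <- readLines(tmp)
bad <- find_unbalanced_quotes(lines)
bad         # line numbers of the suspicious records
lines[bad]  # the offending text itself
unlink(tmp)
```

Once the line numbers are known, the record can be fixed in a text editor before the file is read again.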
> On 2019-06-05, at 06:12, Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
> Dear all,
> I have a large dataframe where one of the records in a column must
> have been wrongly formatted; in particular, I think it is missing a
> closing ".
> When I try to show only that column's value I get a  with plenty of
> empty space, the final record  and the system freezes. Also, when
> I try to plot I get a table's printout instead of a real plot.
> Is there a way to identify the record with the format? On a
> spreadsheet or text editor, all records seem OK; and there are too
> many records to visually inspect them all.
> Best regards,