[R] Handling special characters in reading and writing to CSV

Thu Feb 6 18:43:06 CET 2014

Hi Venkata,

That example reads into R fine for me. I copied and saved it as
tmp.csv and simply read it in with

dat <- read.csv("tmp.csv")

which gave me a data.frame with one row and 78 columns as expected.
This worked in three different environments (linux, mac, windows), and
with different versions of R. Does it not work for you? If not, please
post the output of sessionInfo() so we can see what version of R etc.
you are using.
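
For reference, here is a minimal round-trip sketch, assuming a UTF-8
locale. The sample field is made up for illustration (it is not your
data), and the names tricky, orig and path are placeholders. write.csv()
quotes character fields and doubles embedded double quotes, while
read.csv() by default treats only the double quote as a quote character
and does not treat # as a comment, so a field containing commas, quotes
and a newline should come back as a single cell.

```r
# Hypothetical sample field: comma, both quote types, a newline and
# some non-ASCII characters (illustrative only, not the posted data).
tricky <- "SC51`~!@#$%^&*()-_+={[}]|:;\"'<,>.?/:\nSET 1\u20ac \u00e9 \u00f1"
orig <- data.frame(ID = 3073004L, COMM = tricky, stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")

# write.csv() wraps character fields in double quotes and writes an
# embedded " as "" (qmethod = "double"), as RFC 4180 describes.
write.csv(orig, path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back; encoding = "UTF-8" marks the strings as UTF-8
# without re-encoding them.
dat <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

dim(dat)                        # rows and columns read back
identical(dat$COMM, orig$COMM)  # did the tricky field survive intact?
```

If the second check is not TRUE on your machine, the encoding of the
file or of your session is the first thing I would look at.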

Best,
Ista

On Thu, Feb 6, 2014 at 3:34 AM, Venkata Kirankumar
<kiran4u2all at gmail.com> wrote:
> Dear Ista,
> I copied my data below
>
> UNIQUEID,FINDINGSID,ORGNUMRES,STNUMRES,CONVRES,VISITDY,ORGCHARRES,STCHARRES,NOMINALDAY,NOMINALDATE,MEASRMTDAY,MEASRMTDATE,INPUTDATE,NEOPLASMNAME,TUMORCLASSNAME,CATDOMAIN,CATDID,SPECIMENTYP,SPTDID,PCDOMAIN,USUBJID,PCDID,TESTDOMAIN,TSTDID,ORRESUNIT,RESDID,SUBJECTSID,STDRESUNIT,STDRDID,CONVRESUNIT,COVRDID,CUSTOMFIELD5,GRPLABEL,GRPNUMBER,SEX,SEXDID,TRIALGROUPSID,SPECIMENLOC,SPECIMENCOND,SPECIMENCOND1,SPECIMENCOND2,SPECIMENCOND3,SEVERITY,COMM,ASPECT,CAUSEOFDEATH,DERIVEFLG,PHASENAME,PHASENAMEDID,ENTITY,ENTITYDID,SECONDARYFLAG,CUSTOMFIELD0,CUSTOMFIELD4,CUSTOMFIELD6,CUSTOMFIELD9,SOURCE,RESCATEGORY,OFSPSEX,OFFSPNUM,FILEID,ANALYTEID,ANALYTEDID,DATETIME,ELTM,ENDY,NOMDAYOFPHASE,OFSPSEXDID,PLTIMEPOINT,STATUSFLAG,ANABIOREGION,TESTMETHOD,FINDLOC,CUSTOMFIELD8,TIMESLOTDESC,TIMESLOTCODE,PTPTN,TPTNUM
> 3073004,3073004,,37.800000000000000,37.800000000000000,61,© ® ™ ℠ ℗ ₳ ฿ ₵ ¢
> ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹ ₨ ₪ ₸ ₮ ₩ ¥ ៛,© ® ™ ℠ ℗ ₳ ฿ ₵ ¢
> ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹ ₨ ₪ ₸ ₮ ₩ ¥
> ៛,61,,,,,,,,,,,BW,"apcu102881`~!@#$%^&*()-_+={[}]|:;""'<,>.?/",16082,© ® ™ ℠
> ℗ ₳ ฿ ₵ ¢ ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹ ₨ ₪ ₸ ₮ ₩ ¥ ៛,,© ® ™ ℠
> ℗ ₳ ฿ ₵ ¢ ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹ ₨ ₪ ₸ ₮ ₩ ¥
> ៛,45741,38733,© ® ™ ℠ ℗ ₳ ฿ ₵ ¢ ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹
> ₨ ₪ ₸ ₮ ₩ ¥ ៛,,© ® ™ ℠ ℗ ₳ ฿ ₵ ¢ ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹
> ₨ ₪ ₸ ₮ ₩ ¥
> ៛,,"USB102881`~!@#$%^&*()-_+={[}]|:;""'<,>.?/","SC51`~!@#$%^&*()-_+={[}]|:;""'<,>.?/:
> SET 1€ é í ñ ó ú ü ¿ á é í ó ú ü
> ñ","SC51`~!@#$%^&*()-_+={[}]|:;""'<,>.?/",M,42133,1445,,,,,,,,,,,,,,,,,,,,,,,,2631,,,,,,,,,A,,,,,,,©
> ® ™ ℠ ℗ ₳ ฿ ₵ ¢ ₡ ₢ ₠ $ ₫ ৳ ₯ € ƒ ₣ ₲ ₴ ₭ ₺ ℳ ₥ ₦ ₧ ₱ ₰ £ ₹ ₨ ₪ ₸ ₮ ₩ ¥ ៛,
>
> Thanks & Regards,
> D V Kiran Kumar
>
>
> On Wed, Feb 5, 2014 at 5:05 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>
>> Hi Kiran,
>>
>> Please post a reproducible example, either by pasting a sample of
>> comma separated values into your message or by posting a .csv file
>> somewhere we can download it. Without an example all we can do is
>> guess what your problem might be.
>>
>> Best,
>> Ista
>>
>> On Wed, Feb 5, 2014 at 5:10 AM, Venkata Kirankumar
>> <kiran4u2all at gmail.com> wrote:
>> > Hi David,
>> >
>> >
>> > In the RFC 4180 CSV format, a field that contains a " character is
>> > quoted and the embedded " is escaped (by doubling it), so a CSV
>> > reader can distinguish it properly.
>> >
>> >
>> >
>> > I will try read.fwf, because with readLines I am facing the same
>> > issue.
>> >
>> > Thanks & Regards,
>> > D V Kiran Kumar.
>> >
>> >
>> > On Wed, Feb 5, 2014 at 3:14 AM, David Winsemius
>> > <dwinsemius at comcast.net> wrote:
>> >
>> >>
>> >> On Feb 4, 2014, at 7:58 AM, Venkata Kirankumar wrote:
>> >>
>> >> > Hi All,
>> >> >
>> >> >
>> >> > I have some data containing various special characters, newline
>> >> > characters, and characters from different languages in a CSV file,
>> >> > such as `~!@#$%^&*|
>> >> > ()-_+={[}]|\:;""'<,>.?/
>> >> > When I read this CSV and try to do calculations, I am not able to
>> >> > get such data into a single cell as it appears in the file. I found
>> >> > that something like the RFC 4180 format can help to solve this
>> >> > problem.
>> >> >
>> >> >
>> >> >
>> >> > If anyone can give a suggestion on handling these special
>> >> > characters, it will be helpful for me.
>> >> >
>> >>
>> >> I'm having a difficult time understanding your expectations and the
>> >> data situation. If it's a "csv file", then how can all three of
>> >> <comma>, <single-quote>, and <double-quote> be properly distinguished
>> >> when they are also part of the data?
>> >>
>> >>
>> >> You might consider using readLines (from base) or read.fwf (from the
>> >> utils package).
>> >>
>> >>
>> >>
>> >> >
>> >> >
>> >> > Thanks in advance,
>> >> >
>> >> > D V Kiran Kumar
>> >> >
>> >> > ______________________________________________
>> >> > R-help at r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >> David Winsemius
>> >> Alameda, CA, USA
>> >>
>> >>
>> >
>
>