[R] OT: batch processing XLS files to CSV

Chris Evans chris at psyctc.org
Wed May 28 19:52:21 CEST 2008


Dear R gurus, particularly those of generous M$ tolerance and diverse 
gifts and knowledge!

I have an interesting challenge involving service usage by patients 
that I will end up crunching in R.  Maybe I can do all of it in R but 
I can't see how yet.

My situation is that our IT Department can give me loads of XLS files 
about patients one of our services has seen.  They are one per patient 
per time period.  All the data are in the first sheet of the XLS files 
and that sheet contains four variable length but fixed format matrices 
of data:
1) demographics (actually, this is fixed length, one row!);
2) community contacts with services, variable length, rarely zero rows 
but could be;
3) inpatient admissions, variable length, often zero rows;
4) CPA information (don't ask what that is!), two rows, fixed format; 
just to make things tricky, they're separated by a fixed few junk rows 
in the XLS files.  The column format of each block is different.

Each block starts with standard label rows so it will be easy to 
identify these start points and know the format on the rows that follow 
each one.  I could use perl to scan for these and then read the zero to 
many lines of the data in the matrix and end on finding the next header.

I would be fairly happy to do this with perl but would need to convert 
the xls (xls 2002) files to CSV to get at them in Perl (I think).
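
To make the scanning idea concrete, here is a minimal sketch, written in 
Python rather than Perl or R.  It assumes each sheet has already been 
exported to CSV, that each block starts with a single label row, and 
that the junk rows are blank.  The label strings, the block names and 
the sample data are all hypothetical; the real exports will differ, and 
non-blank junk rows would need their own filter.

```python
import csv
import io

# Hypothetical label text: the real header rows in the exports will differ.
BLOCK_LABELS = {
    "DEMOGRAPHICS": "demographics",
    "COMMUNITY CONTACTS": "community",
    "INPATIENT ADMISSIONS": "inpatient",
    "CPA": "cpa",
}


def split_blocks(lines):
    """Group CSV rows under the most recent block label seen."""
    blocks = {name: [] for name in BLOCK_LABELS.values()}
    current = None
    for row in csv.reader(lines):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # assumption: junk rows are blank, so skip them
        if cells[0] in BLOCK_LABELS:
            current = BLOCK_LABELS[cells[0]]  # a label row opens a new block
            continue
        if current is not None:
            blocks[current].append(cells)  # data row for the open block
    return blocks


# Made-up sample standing in for one exported sheet.
SAMPLE = """DEMOGRAPHICS
P001,F,1970
,,
COMMUNITY CONTACTS
2008-01-03,Clinic A
2008-02-10,Clinic B
,,
INPATIENT ADMISSIONS
,,
CPA
Standard,2008-03-01
Standard,2008-04-01
"""

blocks = split_blocks(io.StringIO(SAMPLE))
for name, rows in blocks.items():
    print(name, len(rows))
```

A block with zero rows simply comes back as an empty list, which covers 
the "often zero rows" case for inpatient admissions.  Each block could 
then be written out to its own CSV file for reading into R.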

Anyone out there done anything like this and can give me any advice? 
I'm sorry, I'm sure there are more specific lists or web resources but I 
think the skills are here too and if someone can tell me how to do this 
all in R, I'd be fascinated.

Many thanks,

Chris

-- 
Chris Evans <chris at psyctc.org> Skype: chris-psyctc
Professor of Psychotherapy, Nottingham University;
Consultant Psychiatrist in Psychotherapy, Notts PDD network;
Research Programmes Director, Nottinghamshire NHS Trust;
*If I am writing from one of those roles, it will be clear. Otherwise*
*my views are my own and not representative of those institutions    *


