[R] Importing data into R and combining 2 files
andy.choens at gmail.com
Thu May 14 22:23:18 CEST 2009
On Thu, 2009-05-14 at 10:30 -0700, Sunita22 wrote:
> I have to import 2 txt files into R. One file contains the data and the other
> contains the header, column headings, datatypes and labels for the data.
This is your first complicating factor.
> I have 2 problems:
> 1) My data file has mixed types of data, e.g. 1 2 3 4 5 3-5 02/04/06 3 4 5 and
> so on; the data file is tab separated. When I import it, the data is getting
> stored in one single variable, say V1. I need to separate it into rows and
> columns. How do I do this? Which commands in R would be useful for this?
This shouldn't be too hard.
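A minimal sketch, assuming the file really is tab-delimited; the temporary file and the sample values below are made up. Giving read.table() the separator explicitly with sep = "\t" and reading every column as character first keeps values such as 3-5 and 02/04/06 intact, and you can convert the real numbers and dates one column at a time afterwards. The date format (month/day/year) is an assumption. If everything has already landed in V1, strsplit() can split it on tabs after the fact.

```r
# Hypothetical sample file standing in for the real data file.
data_file <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3-5\t02/04/06",
             "4\t5\t1-2\t03/05/07"), data_file)

# sep = "\t" splits each line on tabs; colClasses = "character" stops R from
# guessing types, so "3-5" and "02/04/06" arrive unchanged.
dat <- read.table(data_file, sep = "\t", header = FALSE,
                  colClasses = "character")

# Convert columns afterwards. The date format is an assumption (month/day/year).
dat$V1 <- as.numeric(dat$V1)
dat$V2 <- as.numeric(dat$V2)
dat$V4 <- as.Date(dat$V4, format = "%m/%d/%y")
str(dat)

# If each whole line already ended up in a single column, split it on tabs.
one_col <- data.frame(V1 = c("1\t2\t3-5", "4\t5\t1-2"),
                      stringsAsFactors = FALSE)
parts <- strsplit(one_col$V1, "\t", fixed = TRUE)
split_dat <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
```

Reading everything as character first is a deliberate choice: mixed columns are the usual reason automatic type guessing goes wrong.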
> 2) The other file is also tab separated. The first 6 lines contain the header
> and introduction, such as the name of the dataset, year, etc., followed by the
> column names, their datatypes and labels. After importing, the data in this
> file also gets stored in one single variable. I need to separate it into rows
> and columns. How do I do this? Which commands in R would be useful for this?
This isn't that hard either, but it's not all in the best place.
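A sketch for the metadata file, assuming (this is a guess about the layout) that the 6 introductory lines are followed by three tab-separated rows: one of column names, one of data types and one of labels, each with one entry per variable. The file contents and variable names are hypothetical. skip = 6 jumps over the introduction, nrows = 3 stops after the labels row, and readLines() keeps the introduction as text in case you want it as a note.

```r
# Hypothetical metadata file: 6 introductory lines, then one row each of
# column names, data types and labels (this layout is an assumption).
meta_file <- tempfile(fileext = ".txt")
writeLines(c("Example survey", "Year: 2009", "line 3", "line 4",
             "line 5", "line 6",
             "id\tage\tsmoker",
             "integer\tnumeric\tinteger",
             "ID\tAge in years\tSmoker (1=Yes, 5=No)"), meta_file)

# Keep the introduction as plain text, e.g. to paste into a # comment.
intro <- readLines(meta_file, n = 6)

# skip = 6 jumps over the introduction; nrows = 3 reads names, types, labels.
# quote = "" so apostrophes in labels are not treated as quote characters.
meta <- read.table(meta_file, sep = "\t", skip = 6, nrows = 3,
                   colClasses = "character", quote = "")
col_names  <- unname(unlist(meta[1, ]))
col_types  <- unname(unlist(meta[2, ]))
col_labels <- unname(unlist(meta[3, ]))
```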
The following is my 2 cents on this. I don't know what platform you are
on, so it's possible that my reference to sed may be more trouble than
it's worth. You already have sed if you are running Linux or OS X.
Your data structure is part of the problem. Where is this data set
coming from? Knowing that could help someone show you a shortcut.
I would start by rolling your two files together into one big happy
tab-separated file. You can remove the header entirely; it's just going to
get in the way. I am assuming that your variables are in the same
(horizontal) order in the two files. I would double-check that before
proceeding any further.
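If you would rather do the combining inside R than build one big file, here is a sketch under the same assumptions as above (hypothetical file contents; metadata rows for names, types and labels after 6 introductory lines; a data file with no header row). count.fields() checks that every data line has as many fields as there are column names. It cannot check that the order matches, so that still needs a look by eye.

```r
# Hypothetical files: metadata (6 intro lines, then names, types and labels
# rows) and a data file with no header row. The layout is an assumption.
meta_file <- tempfile(fileext = ".txt")
writeLines(c(paste("intro line", 1:6),
             "id\tage\tsmoker",
             "integer\tnumeric\tinteger",
             "ID\tAge in years\tSmoker"), meta_file)
data_file <- tempfile(fileext = ".txt")
writeLines(c("1\t34\t1", "2\t51\t5"), data_file)

meta <- read.table(meta_file, sep = "\t", skip = 6, nrows = 3,
                   colClasses = "character", quote = "")
col_names <- unname(unlist(meta[1, ]))
col_types <- unname(unlist(meta[2, ]))

# Every data line should have exactly one field per column name.
stopifnot(all(count.fields(data_file, sep = "\t") == length(col_names)))

# Use the metadata as the column names and column classes of the data.
dat <- read.table(data_file, sep = "\t", header = FALSE,
                  col.names = col_names, colClasses = col_types)
str(dat)
```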
Delete the header. It's not going to import correctly with
read.table(). If you would like, you can keep it as a comment (lines
starting with #) in your .R code.
As for labels, it is often easier in R to drop the integer = factor
label structure found in programs like SPSS. Rather than 1=Yes, 5=No, I
use Yes and No in the actual data. For most categorical data this makes
it easier to work with; for ordinal data it can be more of a problem.
If it's all just categorical, I would use a tool such as sed
(Linux/Unix command line) to go through and apply my labels. Or you can
pull the data into R first and then do this with R. It's your choice. If
you are on Windows and don't know what sed is, forget about sed and just
use R to reassign your variables. R may make your life easier here.
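Doing the relabelling in R is mostly a matter of base R's factor(), whose levels and labels arguments map the codes onto the labels. The variable names and codes below are hypothetical. For ordinal data, ordered = TRUE keeps the ranking of the categories, which addresses the concern about ordinal variables.

```r
# Hypothetical coded variables as they might arrive from an SPSS-style export.
dat <- data.frame(smoker = c(1, 5, 5, 1),
                  pain   = c(3, 1, 2, 3))

# Categorical: replace the codes with the labels themselves (1=Yes, 5=No).
dat$smoker <- factor(dat$smoker, levels = c(1, 5), labels = c("Yes", "No"))

# Ordinal: ordered = TRUE keeps the ranking, so "Low" < "High" is meaningful.
dat$pain <- factor(dat$pain, levels = 1:3,
                   labels = c("Low", "Medium", "High"), ordered = TRUE)
table(dat$smoker)
```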
When you use read.table to import your text file, it will store it in a
single variable. This variable will be a data frame and should preserve
your individual columns and rows. If you aren't familiar with data
frames, you should really start with some introductory material. I will
assume that you are in a hurry. There are some really great texts, such
as An Introduction to R, that you should read, but a quick primer can be
This is an especially good link if you've ever used SPSS/PSPP before
trying to use R, since the author also started in SPSS and understands
how/why R is confusing to people making this switch. There are also some
good links to other introductory materials that you should read.
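As a very small taste of what those introductions cover, here are a few base R ways to look at a data frame once read.table() has returned one. The toy data frame is made up.

```r
# Made-up data frame standing in for the result of read.table().
dat <- data.frame(id = 1:3, age = c(34, 51, 27),
                  smoker = c("Yes", "No", "No"))

dim(dat)          # number of rows and columns
names(dat)        # column names
str(dat)          # type of each column
dat$age           # one column as a vector
dat[2, ]          # the second row
dat[dat$smoker == "No", c("id", "age")]  # rows matching a condition
```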
Since you have all of your labels in a separate
Note: You will get more help on this forum if your request for help
includes reproducible code and information. A dummy example of your two
text files (your real data may be private or proprietary), the code you
have tried, and what you got as a result usually lead to better answers.
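Two base R tools make a dummy example easy to share; the data below are invented. dput() prints R code that recreates an object exactly, and read.table(text = ...) lets readers rebuild a small table from lines pasted into the message.

```r
# A dummy data set small enough to paste into an e-mail.
dat <- data.frame(id = 1:2, score = c("3-5", "1-2"), stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; paste its output.
dput(dat)

# Alternatively, paste a few tab-separated lines and read them with text =.
dummy <- read.table(text = "1\t3-5\n2\t1-2", sep = "\t",
                    col.names = c("id", "score"),
                    colClasses = c("integer", "character"))
```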