[R] splitting multiple data in one column into multiple rows with one entry per column

Felix Müller-Sarnowski drflxms at googlemail.com
Sun Jul 26 21:26:53 CEST 2009


Dear R colleagues,

I annotated a list of single nucleotide polymorphisms (SNPs) with the
corresponding genes using biomaRt. The result is the following
data.frame (pasted from R):

   snp        ensembl_gene_id
1  rs8032583
2  rs1071600  ENSG00000101605
3  rs13406898 ENSG00000167165
4  rs7030479  ENSG00000107249
5  rs1244414  ENSG00000165629
6  rs1005636  ENSG00000230681
7  rs927913   ENSG00000151655;ENSG00000227546
8  rs4832680
9  rs4435168  ENSG00000229164;ENSG00000225227;ENSG00000211817
10 rs7035549
11 rs12707538 ENSG00000186472

As you can see, the SNP with the identifier rs4435168 corresponds to 3
gene ids and rs927913 corresponds to 2 gene ids. Since I'd like to join
several data.frames on ensembl_gene_id later on, I'd like to split
entries with multiple gene identifiers into separate rows with only one
Ensembl gene identifier each. For rs4435168, for example, the result
should look like this (mock output):

   snp       ensembl_gene_id
...
9  rs4435168 ENSG00000229164
10 rs4435168 ENSG00000225227
11 rs4435168 ENSG00000211817
...

This is just a simple example. In the real data there will be many other
columns, which should be replicated in the same way as the snp column.

Does anyone know how to do this? I tried strsplit, which nicely splits
the multiple entries in the ensembl_gene_id column. But how do I go on
from there?
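
For reference, here is a minimal sketch of one way the strsplit result
could be carried further, using only base R. It assumes that
ensembl_gene_id is stored as character (not factor), that ";" is the only
separator, and that SNPs without a gene should be kept with an NA id. The
names snps, ids and long are made up for illustration, and the data is a
small subset of the table above.

```r
# Toy subset of the data.frame shown above ("" = no gene annotated).
snps <- data.frame(
  snp = c("rs8032583", "rs927913", "rs4435168"),
  ensembl_gene_id = c("",
                      "ENSG00000151655;ENSG00000227546",
                      "ENSG00000229164;ENSG00000225227;ENSG00000211817"),
  stringsAsFactors = FALSE
)

# Split every entry on ";": a list with one character vector per row.
ids <- strsplit(snps$ensembl_gene_id, ";", fixed = TRUE)

# An empty entry yields character(0); use NA instead so the SNP is not lost.
ids[sapply(ids, length) == 0] <- list(NA_character_)

# Number of gene ids per original row.
n <- sapply(ids, length)

# Repeat each row once per gene id; every other column is replicated with it.
long <- snps[rep(seq_len(nrow(snps)), n), , drop = FALSE]
long$ensembl_gene_id <- unlist(ids)
rownames(long) <- NULL

print(long)
```

Because whole rows are indexed with rep(), any additional columns in the
data.frame are duplicated along with snp, and the result has one gene id
per row, ready for a merge() on ensembl_gene_id.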

I'd appreciate any kind of help very much!
Best regards from Munich,
Felix



