[R] splitting a dataframe in R based on multiple gene names in a specific column

Bogdan Tanasa tanasa at gmail.com
Wed Aug 23 01:57:19 CEST 2017


I would appreciate a suggestion on how to do the following:

I'm working with a data frame in R in which one column (Gene.refGene) can
hold multiple comma-separated gene names, e.g.:

> df.sample.gene[15:20,2:8]
     Chr     Start       End Ref Alt Func.refGene           Gene.refGene
284 chr2  16080996  16080996   C   T ncRNA_exonic                 GACAT3
448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191,LOC100499194
465 chr2 131279347 131279347   C   G ncRNA_exonic              LOC440910
525 chr2 223777758 223777758   T   A       exonic                  AP1S3
626 chr3  99794575  99794575   G   A       exonic                 COL8A1
643 chr3 132601066 132601066   A   G       exonic                  ACKR4

How could I obtain a data frame in which each row that has multiple gene
names (in the field Gene.refGene) is replicated, with only one gene name per
row? I.e., for the second row:

  448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191,LOC100499194

we would get in the final output (which contains all the rows):

  448 chr2 113979920 113979920   C   T ncRNA_exonic LINC01191
  448 chr2 113979920 113979920   C   T ncRNA_exonic LOC100499194
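
For reference, a minimal base R sketch of the expansion described above. It
assumes the gene names are separated by a plain comma with no surrounding
spaces; the toy data frame only copies three of the rows shown earlier, and
the name df.split is hypothetical.

```r
# Toy data frame with three of the rows from the post (illustration only).
df.sample.gene <- data.frame(
  Chr          = c("chr2", "chr2", "chr2"),
  Start        = c(16080996, 113979920, 131279347),
  End          = c(16080996, 113979920, 131279347),
  Ref          = c("C", "C", "C"),
  Alt          = c("T", "T", "G"),
  Func.refGene = c("ncRNA_exonic", "ncRNA_exonic", "ncRNA_exonic"),
  Gene.refGene = c("GACAT3", "LINC01191,LOC100499194", "LOC440910"),
  row.names    = c("284", "448", "465"),
  stringsAsFactors = FALSE
)

# Split each Gene.refGene entry on commas (as.character() guards against
# the column being a factor).
genes <- strsplit(as.character(df.sample.gene$Gene.refGene), ",", fixed = TRUE)

# Repeat each row index once per gene name found in that row.
idx <- rep(seq_len(nrow(df.sample.gene)), lengths(genes))

# Replicate the rows, then overwrite the gene column with one name per row.
df.split <- df.sample.gene[idx, ]
df.split$Gene.refGene <- unlist(genes)

print(df.split)
```

The replicated rows get suffixed row names (e.g. 448 and 448.1). If the tidyr
package is available, tidyr::separate_rows(df.sample.gene, Gene.refGene, sep = ",")
may be a shorter alternative.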

Thanks a lot!

-- bogdan
