[R] Fwd: Re: transpose and split dataframe

Matthew mccorm@ck @end|ng |rom mo|b|o@mgh@h@rv@rd@edu
Tue Apr 30 23:31:28 CEST 2019

Thanks for your reply. I was trying to simplify it a little, but must 
have got it wrong. Here is the real dataframe, TF2list:

'data.frame':    152 obs. of  2 variables:
  $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 
54 82 82 82 82 82 ...
  $ hits     : Factor w/ 97 levels 
__truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...

    And the first few lines resulting from dput(head(TF2list)):

structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
"AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
"AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
"AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...

This is another way of looking at the first 4 entries (Regulator is 
tab-separated from hits):



    So, the goal would be to

first: Transpose the existing dataframe so that the factor Regulator 
becomes a column name (column 1 name = AT1G69490, column2 name 
AT1G29860, etc.) and the hits associated with each Regulator become 
rows. Hits is a comma separated 'list' ( I do not not know if 
technically it is an R list.), so it would have to be comma 
'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950, 
col 1 row 2 - AT5G24410, etc); like this :


... I did not include all the rows)

I think it would be best to actually make the first entry a separate 
dataframe ( 1 column with name = AT1G69490 and number of rows depending 
on the number of hits), then make the second column (column name = 
AT1G29860, and number of rows depending on the number of hits) into a 
new dataframe and do a full join of of the two dataframes; continue by 
making the third column (column name = AT1G2986) into a dataframe and 
full join it with the previous; continue for the 152 observations so 
that then end result is a dataframe with 152 columns and number of rows 
depending on the entry with the greatest number of hits. The full joins 
I can do with dplyr, but getting up to that point seems rather difficult.

This would get me what my ultimate goal would be; each Regulator is a 
column name (152 columns) and a given row has either NA or the same hit.

    This seems very difficult to me, but I appreciate any attempt.


On 4/30/2019 4:34 PM, David L Carlson wrote:
>          External Email - Use Caution
> I think we need more information. Can you give us the structure of the data with str(YourDataFrame). Alternatively you could copy a small piece into your email message by copying and pasting the results of the following code:
> dput(head(YourDataFrame))
> The data frame you present could not be a data frame since you say "hits" is a factor with a variable number of elements. If each value of "hits" was a single character string, it would only have 2 factor levels not 6 and your efforts to parse the string would make more sense. Transposing to a data frame would only be possible if each column was padded with NAs to make them equal in length. Since your example tries use the name TF2list, it is possible that you do not have a data frame but a list and you have no factor levels, just character vectors.
> If you are not familiar with R, it may be helpful to tell us what your overall goal is rather than an intermediate step. Very likely R can easily handle what you want by doing things a different way.
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
> -----Original Message-----
> From: R-help<r-help-bounces using r-project.org>  On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 2:25 PM
> To: r-help (r-help using r-project.org)<r-help using r-project.org>
> Subject: [R] transpose and split dataframe
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
> Regulator    hits
> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980    AT2G85403,AT4G89223
>      In other words:
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits         : Factor w/ 6 levels
>     I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>     I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
> Matthew
> ______________________________________________
> R-help using r-project.org  mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

	[[alternative HTML version deleted]]

More information about the R-help mailing list