[R] data frame pointers?
dwinsemius at comcast.net
Thu Oct 24 02:39:16 CEST 2013
On Oct 23, 2013, at 5:24 PM, David Winsemius wrote:
> On Oct 23, 2013, at 4:36 PM, Jon BR wrote:
>> I've been running several programs in the unix shell, and it's time to
>> combine results from several different pipelines. I've been writing shell
>> scripts with heavy use of awk and grep to make big text files, but I'm
>> thinking it would be better to have all my data in one big structure in R
>> so that I can query whatever attributes I like, and print several
>> corresponding tables to separate files.
>> I haven't used R in years, so I was hoping somebody might be able to
>> suggest a solution or combination of functions that could help me get
>> Right now, I can import my data into a data frame that looks like this:
>> df <-
>>     case  gene issue
>> 1 case_1 gene1  nsyn
>> 2 case_1 gene1   amp
>> 3 case_2 gene1   del
>> 4 case_3 gene2   UTR
>> I'd like to cook up some combination of functions/scripting that can
>> convert a table like df to produce a list or a data frame/ matrix that
>> looks like df2:
>>       case_1   case_2 case_3
>> gene1 nsyn,amp del    0
>> gene2 0        0      UTR
>> I can build df2 manually, like this:
> Factors will be a hassle:
> df <- data.frame(case=c("case_1","case_1","case_2","case_3"),
>                  gene=c("gene1","gene1","gene1","gene2"),
>                  issue=c("nsyn","amp","del","UTR"),
>                  stringsAsFactors=FALSE)
Note also that stringsAsFactors can be set globally with options(), as well as per call in read.table or any of its cousins.
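As a small illustration of the per-call route, here is a sketch that reads the same four rows from inline text with read.table. The names `txt` and `df_in` are made up for this example, and in practice you would pass a file name rather than `text=`.

```r
# Hypothetical inline copy of the example data; a real run would read a file.
txt <- "case gene issue
case_1 gene1 nsyn
case_1 gene1 amp
case_2 gene1 del
case_3 gene2 UTR"

# stringsAsFactors=FALSE keeps the columns as character vectors, not factors.
df_in <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

str(df_in)
```

The same argument is accepted by read.csv and read.delim.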
> with( df, matrix( tapply(issue, list(gene, case), list),
>                   nrow=length(unique(gene)), ncol=length(unique(case)) ) )
>      [,1]        [,2]  [,3]
> [1,] Character,2 "del" NA
> [2,] NA          NA    "UTR"
>  "nsyn" "amp"
>          V1  V2  V3
> 1 nsyn, amp del  NA
> 2        NA  NA UTR
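If comma-joined strings are acceptable in place of list cells, a variation on the tapply call gets close to the requested df2 layout. This is only a sketch: it swaps `paste(..., collapse=",")` in for `list`, and replacing the empty cells with "0" is my reading of the df2 example rather than anything tapply does on its own. The name `m` is made up here.

```r
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# One string per gene/case cell; combinations with no rows come back as NA.
m <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Assumption: empty cells should read "0", as in the df2 example.
m[is.na(m)] <- "0"

m
```

Because each cell is a single string, the result is an ordinary character matrix with gene and case names as dimnames, so it can be passed straight to write.table.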
Since you are coming back to R after many years, you may not be familiar with data.table. It is particularly well suited to large text files. Its syntax, with arguments to "[", is quite different.
> library(data.table)
> dt <- data.table(df)
# To make a list in each category you would need to supply a "doubly `list`-ed" argument to "j".
> dt[ , list(list(issue)), by=c("gene", 'case') ]
    gene   case       V1
1: gene1 case_1 nsyn,amp
2: gene1 case_2      del
3: gene2 case_3      UTR
> dt[ , list(issue), by=c("gene", 'case') ]
    gene   case issue
1: gene1 case_1  nsyn
2: gene1 case_1   amp
3: gene1 case_2   del
4: gene2 case_3   UTR
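To get from the grouped result to the wide gene-by-case layout asked for, one possibility is to collapse each group to a single string and then reshape. This sketch assumes a data.table version that provides `dcast` for data.tables; the names `agg` and `wide`, and the choice of "0" as the fill value, are illustrative and taken from the df2 example.

```r
library(data.table)

dt <- data.table(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"))

# Collapse the issues within each gene/case group into one string.
agg <- dt[ , list(issue = paste(issue, collapse = ",")), by = c("gene", "case")]

# Reshape to one row per gene and one column per case.
# Assumption: missing gene/case combinations are shown as "0".
wide <- dcast(agg, gene ~ case, value.var = "issue", fill = "0")

wide
```

The wide result is still a data.table, so subsets of it can be written out with write.table in the usual way.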
>> but obviously do not want to do this by hand; I want R to generate df2 from
>> Any pointers/ideas would be most welcome!
>> [[alternative HTML version deleted]]
> R is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?
> David Winsemius
> Alameda, CA, USA