[R] reshaping column items into rows per unique ID

Eric Berger ericjberger at gmail.com
Sun Feb 25 18:56:37 CET 2018


Hi Allaisone,
I took a slightly different approach, but you might find it as useful as,
or more useful than, yours, or at least a start on the path to the
solution you need.

# Example data: one row per (customer, diet type) observation
df1 <- data.frame(CustId   = c(1,1,1,2,3,3,4,4,4),
                  DietType = c("a","c","b","f","a","j","c","c","f"),
                  stringsAsFactors = FALSE)
custs <- sort(unique(df1$CustId))
dtype <- sort(unique(df1$DietType))
nc    <- length(custs)
nd    <- length(dtype)

# Zero-filled result: one row per customer, one column per diet type
df2   <- as.data.frame(matrix(0, nrow = nc, ncol = nd + 1))
colnames(df2) <- c("CustId", dtype)
df2$CustId    <- custs

# Count each observation in its (customer, diet type) cell
for (i in seq_len(nrow(df1))) {
  iRow <- match(df1$CustId[i], df2$CustId)
  iCol <- match(df1$DietType[i], colnames(df2))
  df2[iRow, iCol] <- df2[iRow, iCol] + 1
}

> df2
#   CustId a b c f j
# 1      1 1 1 1 0 0
# 2      2 0 0 0 1 0
# 3      3 1 0 0 0 1
# 4      4 0 0 2 1 0
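
For what it's worth, the same counts can probably be had without the explicit loop. The following is only a sketch using base R's table(); the name df2_alt is made up for illustration, and the data is the same df1 as above.

```r
# Same example data as above
df1 <- data.frame(CustId   = c(1,1,1,2,3,3,4,4,4),
                  DietType = c("a","c","b","f","a","j","c","c","f"),
                  stringsAsFactors = FALSE)

# Cross-tabulate customers (rows) against diet types (columns)
counts <- table(df1$CustId, df1$DietType)

# Convert the contingency table to a wide data frame and
# move the customer ids from the row names into a column
wide    <- as.data.frame.matrix(counts)
df2_alt <- cbind(CustId = as.numeric(rownames(wide)), wide)
rownames(df2_alt) <- NULL

print(df2_alt)
```

This should give the same layout as df2, with integer counts instead of doubles.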

The data frame df2 has a column for CustId and one column for each
unique diet type.
Each row is a unique CustId, and each entry contains the number of
times the given diet type occurred for that customer.
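
If what is really wanted is the list of diet types per customer rather than counts, a ragged list avoids padding with NA altogether, which may matter at the sizes mentioned in this thread. This is a minimal sketch with base R's split(); the names diets_by_cust and diet_lines are hypothetical, and I have not tried it on data of that size.

```r
# Same example data as above
df1 <- data.frame(CustId   = c(1,1,1,2,3,3,4,4,4),
                  DietType = c("a","c","b","f","a","j","c","c","f"),
                  stringsAsFactors = FALSE)

# One list element per customer, holding that customer's diet types
# in their original order (no NA padding needed)
diets_by_cust <- split(df1$DietType, df1$CustId)

# Optional: collapse each customer's diet types into a single string
diet_lines <- vapply(diets_by_cust, paste, character(1), collapse = " ")

print(diets_by_cust)
print(diet_lines)
```

Each element of diets_by_cust has its own length, so nothing is stored for diet types a customer never had.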

I hope that helps,
Eric



On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

> I believe you need to spend time with an R tutorial or two: a data frame
> (presumably the "table" data structure you describe) can *not* contain
> "blanks" -- all columns must be the same length, which means NA's are
> filled in as needed.
>
> Also, 8e5 * 7e4 = 5.6e10 cells, which almost certainly will not fit into any
> local version of R (maybe it would in some server version -- others more
> knowledgeable should comment on this).
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaisone1 at hotmail.com>
> wrote:
>
> > Hi All
> >
> > I have a data frame which looks like this:
> >
> > CustomerID    DietType
> > 1             a
> > 1             c
> > 1             b
> > 2             f
> > 2             a
> > 3             j
> > 4             c
> > 4             c
> > 4             f
> >
> > And I would like to reshape this so I can see the list of DietTypes per
> > customer in rows instead of columns, like this:
> >
> > > MyDf
> > CustomerID    DietType    DietType    DietType
> > 1             a           c           b
> > 2             f           a
> > 3             j
> > 4             c           c           f
> >
> > I tried many times using the melt(), spread(), and dcast() functions but was
> > not able to produce the desired table. My best attempt was:
> >
> > # 1) Add a new column with unique values:
> > MyDf$newcol <- 1:9
> > # 2) Then reshape to wide format:
> > NewDf <- dcast(MyDf, CustomerID ~ newcol, value.var = "DietType")
> >
> > This produces the desired table but with many NA values, like this:
> >
> > CustomerID    1    2    3    4    5    6    7    8    9
> > 1             a    c    b    NA   NA   NA   NA   NA   NA
> > 2             NA   NA   NA   f    a    NA   NA   NA   NA
> > 3             NA   NA   NA   NA   NA   j    NA   NA   NA
> > 4             NA   NA   NA   NA   NA   NA   c    c    f
> >
> > As you can see, the letters indicating DietType move to the right each
> > time we move down a row, leaving many NA values. As my original file is
> > very large, I expect that the final output would contain around 800,000
> > columns and 70,000 rows. This is why my code works with small data but
> > does not work with my large file: it runs out of memory, even though I'm
> > using a large PC.
> >
> > What changes do I need to make to my code to produce the desired table,
> > where the DietTypes are listed in rows exactly like the second table
> > shown above?
> >
> > Regards
> > Allaisnoe
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
