[BioC] Cell DNA content data normalization and gating

Greg Finak gfinak at fhcrc.org
Wed Feb 23 20:20:57 CET 2011


Hi, Xian
I would suggest having a look at the flowClust package. It is used for clustering / gating flow data in one or multiple dimensions via mixture modelling and should be well suited for estimating proportions of cell populations in different phases of the cell cycle. flowClust outputs a model object that contains the proportions of each component in the model (fraction of total cells represented by that component), as well as the mean (location), standard deviation, and other model parameters. The proportions and means are probably all that you are looking for. If you need to match components across multiple samples, use the estimated component means. A very basic example is below. See the package vignette for further details. 

I don't usually work with the type of data you describe, so there are probably domain-specific subtleties I'm not familiar with. If you run into problems, please let me know; I'll be glad to tweak the package to make it more effective / useful for such an application. At the least, I'll update the vignette to provide further examples.

Hope this helps,

Greg.



#Simple example - flowClust in 1D fitting 3 components. 

require(flowClust)
require(flowViz)

#Some artificial data, Sample X, 3 components, real proportions are 33.3%, 16.7%, and 50%
X<-as.matrix(c(rnorm(1000,mean=10,sd=sqrt(2)),rnorm(500,mean=20,sd=sqrt(2)),rnorm(1500,mean=30,sd=sqrt(2))))

colnames(X)<-"A"
X<-round(X)
X<-flowFrame(X)

#Sample Y, 3 components, real proportions are 25%, 25%, and 50%, peaks are shifted slightly.
Y<-as.matrix(c(rnorm(750,mean=11,sd=sqrt(2)),rnorm(750,mean=23,sd=sqrt(2)),rnorm(1500,mean=29,sd=sqrt(2))))
colnames(Y)<-"A"
Y<-round(Y)
Y<-flowFrame(Y)

par(mfrow=c(1,2))
plot(X,breaks=256)
plot(Y,breaks=256)

#If we know the data has 3 components:
f1<-flowClust(X,K=3,varNames=c("A"))
f2<-flowClust(Y,K=3,varNames=c("A"))
#plot the result
par(mfrow=c(1,2))
hist(f1,data=X)
hist(f2,data=Y)

#The order of components may be different above. Use the estimated means to reorder them
f1@w[order(f1@mu)] #Proportions ordered by increasing mean for the component, model 1
f2@w[order(f2@mu)] #Proportions ordered by increasing mean for the component, model 2


#If you don't know the number of components, you would use the BIC to estimate the best fit:
f1<-flowClust(X,varNames="A",K=1:5) #Fit multiple numbers of clusters;
f2<-flowClust(Y,varNames="A",K=1:5)

par(mfrow=c(1,2))
plot(1:5,BIC(f1),type="o",xlab="K",ylab="BIC");
plot(1:5,BIC(f2),type="o",xlab="K",ylab="BIC");

which.max(BIC(f1)) #The maximum should be at 3.
which.max(BIC(f2)) #The maximum should be at 3.

f1<-f1[[which.max(BIC(f1))]] #Extract the best fitting model
f2<-f2[[which.max(BIC(f2))]] #Extract the best fitting model

f1@w[order(f1@mu)] #Proportions ordered by increasing mean for the component, model 1
f2@w[order(f2@mu)] #Proportions ordered by increasing mean for the component, model 2
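
To compare samples side by side, the ordered proportions can be collected into a small table. The following is only a rough sketch in base R: phase_table is a hypothetical helper (it is not part of flowClust), and it assumes a three-component fit in which increasing DNA content corresponds to G1, S, and G2/M. The demo inputs are made-up numbers, not flowClust output; with the fits above you would pass the w and mu slots of f1 or f2 instead.

```r
# Hypothetical helper (not part of flowClust): order the mixture proportions
# by component mean and attach cell-cycle phase labels.
# Assumption: components sorted by increasing mean correspond to G1, S, G2/M.
phase_table <- function(w, mu, phases = c("G1", "S", "G2/M")) {
  w  <- as.numeric(w)   # the mu slot is a K x 1 matrix in 1D; flatten both
  mu <- as.numeric(mu)
  stopifnot(length(w) == length(mu), length(w) == length(phases))
  o <- order(mu)        # component indices sorted by increasing mean
  data.frame(phase = phases, mean = mu[o], proportion = w[o])
}

# Illustrative inputs only (made-up values, deliberately out of order).
w_demo  <- c(0.50, 0.33, 0.17)
mu_demo <- c(30, 10, 20)
tab <- phase_table(w_demo, mu_demo)
print(tab)
```

Calling the helper once per sample and comparing the "mean" column also gives a quick check on how far the peaks are shifted between samples.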



 
On 2011-02-23, at 5:20 AM, Xian Zhang wrote:

> Dear Bioconductor users,
> 
> We have a univariate readout (DNA content) to study cell cycle
> subpopulations. The data looks like this, with around 3000 cells per sample.
> 
> 
>                cell1  cell2 cell3 ...
> sample1    28     26    30
> sample2    25     27    15
> sample3    30     40    45
> ...
> 
> 
> Based on which, one should be able to calculate fractions of cell cycle
> subpopulations (G1, S, G2+M). However, the data needs to be first normalized
> (scaling and peak alignment etc), before gating the cells into
> subpopulations.
> 
> The flowCore and related packages offer similar functions, but seem to be an
> overkill for a univariate readout. I wonder if there are other
> methods/packages available.
> 
> Thanks a lot in advance!
> 
> Xian

Greg Finak, PhD
Post-doctoral Research Associate
PS Statistics, Vaccine and Infectious Disease Division.
Fred Hutchinson Cancer Research Center
Seattle, WA
(206)667-3116
gfinak at fhcrc.org


