[BioC] BHC appears to be broken

Dan Tenenbaum dtenenba at fhcrc.org
Sat Apr 27 19:41:08 CEST 2013


On Fri, Apr 26, 2013 at 3:00 PM, Joseph Viviano <vivianoj at yorku.ca> wrote:
> Hello, my apologies for the sloppy post.
>
> You can find a sample dataset here: https://www.dropbox.com/sh/p1od9e4vx8ky66a/igt2OkNDbQ
>
> And the code I ran was essentially:
>
> data       <- read.csv("subsample.csv",header=FALSE)
> itemLabels <- t(read.csv("labels.csv", header=FALSE)) #read in and transpose
> timePoints <- 1:24                                    #number of timepoints
> BHC_OUT    <- bhc(data,itemLabels,timePoints,"time-course",verbose=TRUE,numThreads=8)
>
> This is where it completely locks up. Also, note that I get the same result with multiple permutations of the bhc command, and that this occurs on multiple versions of R for me (including the latest releases).
>

Thanks. It does appear to use increasing amounts of CPU and memory.
I'm cc'ing the BHC maintainer.
Dan


> I should note that I have demeaned and variance-normalized all time series before passing them to bhc, if that makes a difference.
>
> Cheers, Joseph
>
> On Wed, Apr 24, 2013 at 3:56 PM, Joseph Viviano <vivianoj at yorku.ca> wrote:
>
>> Hello all,
>>
>> I am having a great deal of trouble getting BHC to run on non-trivial
>> datasets. I am using the following commands:
>>
>> data           <- read.csv("data.csv")
>
> Can you share this dataset, or at least enough of it to reproduce the problem?
>
>> itemLabels <- names(data)
>> timePoints <- 1:24 # for the time-course case
>>
>> nDataItems <- nrow(data) # this equals 152000, approximately
>> nFeatures  <- ncol(data)   # this equals 24
>>
>> BHC_OUT <- bhc(data,itemLabels,timePoints"time-course",verbose=TRUE)
>
> This line produces a syntax error.
>
> In order to help you we need a fully reproducible example. Also,
> please send the output of the sessionInfo() command.
>
> Dan
>
>
>> ---
>>
>> This causes R to lock up immediately on Windows 7, Linux Mint 13, and
>> OS X 10.6.8. The input data are variance-normalized time series exported
>> from MATLAB.
>>
>> Here is a sample time series from the .csv:
>>
>> -1.7858,-0.26742,0.37038,-0.87986,-0.55435,-0.89642,-1.2815,-0.62659,-0.98028,-1.0542,-1.0058,0.51103,0.90252,2.5272,-0.3048,0.81275,0.22414,0.15235,-0.20437,0.2545,0.95103,1.4214,0.82618,0.77179
>>
>> Any help would be greatly appreciated.
>>
>> Cheers, Joseph
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


