[R] Segmentation Fault with large dataframes and packages using rJava

Sebastian Salentin sebastian.salentin at biotec.tu-dresden.de
Thu May 26 11:49:09 CEST 2016


Dear all,

I have been trying to perform machine learning/feature selection tasks 
in R using various packages (e.g. mlr and FSelector).
However, when passing larger data frames to these functions, I get a 
segmentation fault (memory not mapped).

This first happened when using the mlr benchmark function with data 
frames on the order of 200 rows x 10,000 columns (all integer values), 
roughly as sketched below.
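
The sketch below only reproduces the shape of that setup; the object 
names and the RWeka/rJava-backed J48 learner are placeholders, not my 
exact code:

library(mlr)
# Placeholder data: 200 rows x 10,000 integer columns plus a factor target
df.10k <- data.frame(replicate(10000, sample(0:1, 200, rep = TRUE)))
df.10k$y <- factor(sample(0:1, 200, rep = TRUE))
task  <- makeClassifTask(data = df.10k, target = "y")
lrn   <- makeLearner("classif.J48")          # rJava-backed learner via RWeka
rdesc <- makeResampleDesc("CV", iters = 5)
res   <- benchmark(learners = lrn, tasks = task, resamplings = rdesc)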

I prepared a minimal working example where I get a segmentation fault 
when calculating the information gain with the FSelector package.

require("FSelector")
# Random dataframe 200 rows * 25,000 cols
large.df <- data.frame(replicate(25000,sample(0:1,200,rep=TRUE)))
weights <- information.gain(X24978~., large.df)
print(weights)
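
In case it matters: I have not changed rJava's default Java heap. As far 
as I know, a larger heap can only be requested before any rJava-based 
package is loaded, along these lines (the 8g value is just an example, 
not what I actually use):

# Must be set in a fresh R session before rJava/FSelector are attached;
# once the JVM is initialised, the option has no effect.
options(java.parameters = "-Xmx8g")
library(FSelector)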


I am using R version 3.3.0 64-bit on Ubuntu 14.04.4 LTS with FSelector 
v0.20 and rJava v0.9.8 on a 32-core Intel i7 machine with 250 GB RAM. 
Java is OpenJDK 1.7, 64-bit.

I would highly appreciate any hint on how to solve this problem.

Best
ssalentin

-- 
Sebastian Salentin, PhD student
Bioinformatics Group

Technische Universität Dresden
Biotechnology Center (BIOTEC)
Tatzberg 47/49
01307 Dresden, Germany


