[R] Efficient way to subset rows in R for dataset with 10^7 columns

Jack Arnestad j@ck@rne@t@d @ending from gm@il@com
Sat Apr 14 02:31:32 CEST 2018


I have a data.table with 100 rows and 10^7 columns.

When I run the following code:

    trainIndex <-
      caret::createDataPartition(
        df$status,
        p = .9,
        list = FALSE,
        times = 1
      )
    outerTrain <- df[trainIndex]
    outerTest  <- df[-trainIndex]

The row subsetting of df (the last two lines) takes over 20 minutes.
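For anyone who wants to try this, here is a scaled-down sketch of the same pattern. It is only an illustration under assumptions: the data are random stand-ins (numeric columns plus a two-level `status` factor), `n_cols` is cut to 1000 so it runs quickly, and the index is pulled out as a plain integer vector (`idx`), which is a choice made for this sketch rather than something in my real code. It needs the data.table and caret packages.

```r
library(data.table)

set.seed(1)
n_rows <- 100
n_cols <- 1000  # scaled down from 10^7 so the sketch runs quickly

# Hypothetical stand-in data: numeric predictors plus a two-level status column
df <- as.data.table(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
df[, status := factor(rep(c("a", "b"), length.out = n_rows))]

# Same partition call as in the real code
trainIndex <- caret::createDataPartition(
  df$status,
  p = .9,
  list = FALSE,
  times = 1
)

# With list = FALSE the result is a one-column matrix;
# take that column as a plain integer vector (an assumption of this sketch)
idx <- trainIndex[, 1]

# Time the two row subsets that are slow on the full-width table
timing <- system.time({
  outerTrain <- df[idx]
  outerTest  <- df[-idx]
})
print(timing)
```

Increasing `n_cols` step by step should show how the subsetting time grows with the number of columns.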

What is the most efficient way to subset the rows of a table this wide?

Thanks!




