Subset a data.table with random subsampling within by groups
subsetDT(DT, by, doSubset = TRUE, indices = FALSE)
DT: A data.table.
by: Character vector of column names to use for groups.
doSubset: Logical or numeric indicating the number of subsamples to use within each group.
indices: Logical. If TRUE, return only a vector of row indices. Defaults to FALSE, i.e., return the subsampled data.table.
library(data.table)
dt <- data.table(Lett = sample(LETTERS, replace = TRUE, size = 1000), Nums = 1:100)
dt1 <- subsetDT(dt, by = "Lett", doSubset = 3)
#> subsampling initial dataset for faster model estimation: using maximum of 3 samples per combination of ecoregionGroup and speciesCode. Change 'doSubset' to a different number if this is not enough
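
The per-group subsampling described above can be approximated with a common data.table idiom. The following is only a minimal sketch and not the implementation of subsetDT: the helper name subsampleByGroup and its arguments groupCols and maxPerGroup are hypothetical, and it assumes that a numeric doSubset acts as a cap on the number of rows kept per group, as the message in the example suggests.

```r
library(data.table)

# Hypothetical helper (not the package's code): keep at most `maxPerGroup`
# randomly chosen rows within each group defined by `groupCols`.
subsampleByGroup <- function(DT, groupCols, maxPerGroup, indices = FALSE) {
  # .I holds the row numbers of DT and .N the group size; within each group,
  # draw up to maxPerGroup row numbers without replacement.
  idx <- DT[, .I[sample.int(.N, min(.N, maxPerGroup))], by = groupCols]$V1
  # Mirror the documented `indices` switch: row numbers only, or the subset.
  if (indices) idx else DT[idx]
}

set.seed(1)
dt <- data.table(Lett = sample(LETTERS, size = 1000, replace = TRUE), Nums = 1:100)

rowIdx <- subsampleByGroup(dt, groupCols = "Lett", maxPerGroup = 3, indices = TRUE)
dtSub  <- subsampleByGroup(dt, groupCols = "Lett", maxPerGroup = 3)

# Largest group size after subsampling (at most 3 by construction)
dtSub[, .N, by = Lett][, max(N)]
```

Returning row indices rather than the subset can be convenient when the same rows need to be selected from several objects that share the row order of the original table.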