Subset a data.table with random subsampling within by groups
subsetDT(DT, by, doSubset = TRUE, indices = FALSE)
DT: A data.table.
by: Character vector of column names to use for groups.
doSubset: Logical or numeric indicating the number of subsamples to use. A numeric value is the maximum number of samples kept per group.
indices: Logical. If TRUE, return a vector of row indices only. Defaults to FALSE, i.e., return the subsampled data.table.
library(data.table)
dt <- data.table(Lett = sample(LETTERS, replace = TRUE, size = 1000), Nums = 1:100)
dt1 <- subsetDT(dt, by = "Lett", doSubset = 3)
#> subsampling initial dataset for faster model estimation: using maximum of 3 samples per combination of ecoregionGroup and speciesCode. Change 'doSubset' to a different number if this is not enough
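For readers who want to see the general idea, the following is a minimal sketch of random subsampling within groups using data.table's `.I` and `.N` symbols. It is not the source of `subsetDT`; the helper name `subsampleByGroup` and its arguments `groupCols` and `n` are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical helper (not part of any package): keep at most `n` rows per group.
subsampleByGroup <- function(DT, groupCols, n, indices = FALSE) {
  # .I holds the row numbers of DT within each group and .N is the group size.
  # Sampling positions with sample(.N, k) and then indexing .I avoids the
  # surprising behaviour of sample(x) when x has length 1.
  idx <- DT[, .I[sample(.N, min(.N, n))], by = groupCols]$V1
  if (indices) idx else DT[idx]
}

set.seed(42)
dt <- data.table(Lett = sample(LETTERS, size = 1000, replace = TRUE), Nums = 1:1000)

# Subsampled table with at most 3 rows per letter
sub <- subsampleByGroup(dt, groupCols = "Lett", n = 3)

# Row indices only, analogous in spirit to indices = TRUE
idx <- subsampleByGroup(dt, groupCols = "Lett", n = 3, indices = TRUE)
```

Returning indices rather than rows can be convenient when the same subsample has to be applied to several objects that share row order with the original table.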