Subset a data.table with random subsampling within by groups

subsetDT(DT, by, doSubset = TRUE, indices = FALSE)

Arguments

DT: A data.table
by: Character vector of column names to use for groups
doSubset: Logical or numeric. A number sets the maximum number of rows sampled within each by group; TRUE (the default) subsamples using the function's built-in maximum
indices: Logical. If TRUE, return only a vector of row indices. Defaults to FALSE, i.e., return the subsampled data.table

Examples

library(data.table)
dt <- data.table(Lett = sample(LETTERS, replace = TRUE, size = 1000), Nums = 1:100)
dt1 <- subsetDT(dt, by = "Lett", doSubset = 3)
#> subsampling initial dataset for faster model estimation: using maximum of 3 samples per combination of ecoregionGroup and speciesCode. Change 'doSubset' to a different number if this is not enough
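
For intuition about what subsampling within by groups means, the following is a minimal sketch written with plain data.table idioms. It is not the source of subsetDT; the helper name subsampleByGroup and its argument n are hypothetical and only illustrate the idea of keeping at most n randomly chosen rows per group, with an option to return row indices instead of the subsetted table.

```r
library(data.table)

# Hypothetical helper (an assumption for illustration, not the package's code):
# keep at most `n` randomly chosen rows within each group defined by `by`.
subsampleByGroup <- function(DT, by, n, indices = FALSE) {
  # .I holds the row numbers of the current group within DT; .N is the group size.
  # Sampling min(.N, n) positions keeps small groups whole.
  rows <- DT[, list(rowNum = .I[sample(.N, size = min(.N, n))]), by = by]$rowNum
  if (isTRUE(indices)) rows else DT[rows]
}

set.seed(1)
dt <- data.table(Lett = sample(LETTERS, size = 1000, replace = TRUE), Nums = 1:1000)

# Subsampled table: no group has more than 3 rows
sub <- subsampleByGroup(dt, by = "Lett", n = 3)
sub[, .N, by = Lett]

# Row indices only, which can be used to subset dt later
idx <- subsampleByGroup(dt, by = "Lett", n = 3, indices = TRUE)
dt[idx]
```

Returning indices rather than the table can be handy when the same rows need to be selected from several aligned objects.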