Subset a data.table with random subsampling within by groups

subsetDT(DT, by, doSubset = TRUE, indices = FALSE)

Arguments

DT

A data.table

by

Character vector of column names defining the groups within which rows are subsampled

doSubset

Logical or numeric. If numeric, the maximum number of rows randomly sampled within each by group

indices

Logical. If TRUE, return only a vector of row indices. Defaults to FALSE, i.e., return the subsampled data.table.

Examples

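library(data.table)
# 1000 rows with a random letter each; Nums (1:100) is recycled to fill all rows
dt <- data.table(Lett = sample(LETTERS, replace = TRUE, size = 1000), Nums = 1:100)
# keep at most 3 randomly chosen rows for each letter
dt1 <- subsetDT(dt, by = "Lett", doSubset = 3)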
#> subsampling initial dataset for faster model estimation: using maximum of 3 samples per combination of ecoregionGroup and speciesCode. Change 'doSubset' to a different number if this is not enough
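The idea of random subsampling within by groups can be expressed directly in data.table syntax. The following is a minimal sketch under stated assumptions, not the package's implementation: subsetByGroupSketch() and its arguments groupCols and n are hypothetical names, it only handles a numeric per-group cap (it does not reproduce the behaviour of doSubset = TRUE or print the message shown above), and the indices it returns refer to rows of the original DT.

```r
library(data.table)

# Hypothetical helper for illustration only; not the package's subsetDT().
# groupCols: character vector of grouping columns
# n: maximum number of rows to keep per group
# indices: if TRUE, return row numbers of DT instead of the subset
subsetByGroupSketch <- function(DT, groupCols, n, indices = FALSE) {
  # .I holds the row numbers of DT belonging to the current group;
  # sample up to n of them (all of them if the group is smaller than n)
  idx <- DT[, .I[sample.int(.N, min(.N, n))], by = groupCols]$V1
  if (isTRUE(indices)) return(idx)
  DT[idx]
}
```

Usage mirrors the example above, e.g. subsetByGroupSketch(dt, "Lett", 3) returns at most 3 rows per letter, and subsetByGroupSketch(dt, "Lett", 3, indices = TRUE) returns the matching row numbers, which can be passed back as dt[idx].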