R/maxBestimation.R
fitNLMModels.Rd
The function creates sensible parameter ranges for the A, k and p parameters
of the Chapman-Richards and Logistic growth curves and attempts to run a
forward step-AIC procedure to add covariates to the linear component of the
model (on the A parameter, i.e. the asymptote). The maximum number of covariates
to add is determined by maxNoCoefs.
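As a rough illustration of the two curve families, the sketch below uses common three-parameter forms of the Chapman-Richards and Logistic curves. These parameterisations are an assumption, not taken from the package source; they were chosen because they agree with the constraints described in this help page (p > 1 for CR, and B at age 0 equal to A / (1 + k) for Logistic). The function names and parameter values are hypothetical.

```r
# Assumed parameterisations (not confirmed by the package source):
#   Chapman-Richards: B = A * (1 - exp(-k * age))^p
#   Logistic:         B = A / (1 + k * exp(-p * age))
chapmanRichards <- function(age, A, k, p) {
  A * (1 - exp(-k * age))^p
}

logisticGrowth <- function(age, A, k, p) {
  A / (1 + k * exp(-p * age))
}

age <- seq(0, 200, by = 10)

# Hypothetical parameter values, for illustration only
crB  <- chapmanRichards(age, A = 10000, k = 0.05, p = 3)
logB <- logisticGrowth(age, A = 10000, k = 4999, p = 0.1)
```

In both forms A is the asymptote, which is the parameter the covariates act on through the linear component.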
fitNLMModels(
sp = NULL,
predictorVarsData,
sppVarsB,
predictorVars,
predictorVarsCombos = NULL,
maxNoCoefs = 4,
doFwdSelection = FALSE,
sampleSize = 3000,
Ntries = 2000,
maxCover = 1L,
models = c("CR", "Logistic"),
modelOutputsPrev = NULL,
randomStarts = FALSE,
lowerBounds = TRUE,
upperBounds = TRUE,
nbWorkers = 1L
)
species name -- only used for messaging.
a data.table of predictor variables including those in predictorVars and age, as well as pixelIndex.
Note that age should be in the original scale (e.g., not logged).
a data.table of species biomass (B) and pixelIndex.
character vector of predictor variables to be included
in the linear component of the model affecting the asymptote (these need to correspond to
names(predictorVarsData[[sp]])). The same predictors will be considered for all
species.
a list of sets of covariates in predictorVars to add to
the fitted models. If this list has several entries with sets of covariates,
each will be fitted as part of the model selection process.
how many covariates from predictorVars should be added to
the linear component of the model affecting the asymptote? Note that the
more covariates are added, the longer the model takes to fit, as all
combinations are attempted. For 2 or more covariates, only combinations with
"cover" are attempted.
should covariates be added one at a time to the
linear component of the model? If TRUE and predictorVarsCombos is supplied
(i.e. !is.null(predictorVarsCombos)), then each entry in predictorVarsCombos
is used as the set of covariates to test. Otherwise, predictorVarsCombos
will be created from combinations of predictorVars, with maxNoCoefs
determining the maximum number of covariates to add. If FALSE,
the full model is fitted.
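The way sets of covariates could be built from predictorVars can be sketched as below. This is a minimal illustration, not the package's own code: makeCombos is a hypothetical helper, the predictor names other than "cover" are made up, and it assumes that "cover" counts towards maxNoCoefs and that sets of 2 or more covariates are kept only when they include "cover".

```r
# Hypothetical predictor names; "cover" is the covariate named in the text
predictorVars <- c("cover", "MAT", "PPT", "CMI")
maxNoCoefs <- 3

# Hypothetical helper: all covariate sets of size 1..maxN
makeCombos <- function(vars, maxN) {
  maxN <- min(maxN, length(vars))
  combos <- lapply(seq_len(maxN), function(n) {
    combn(vars, n, simplify = FALSE)
  })
  combos <- unlist(combos, recursive = FALSE)
  # Assumption: sets of 2+ covariates are kept only if they include "cover"
  keep <- vapply(
    combos,
    function(x) length(x) == 1 || "cover" %in% x,
    logical(1)
  )
  combos[keep]
}

predictorVarsCombos <- makeCombos(predictorVars, maxNoCoefs)
length(predictorVarsCombos)
```

A list built this way has the same shape as the predictorVarsCombos argument (a list of character vectors), so a hand-written list of candidate sets could be passed in its place.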
how many data points should be randomly sampled to fit the model?
If NA, the full dataset will be used; note that using the full dataset may result in long
computation times. Biomass data will be binned into 10 regular bins before sampling a number of points
equal to sampleSize.
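One way to read the binned sampling described above is sketched below with simulated data. How the sample is split across bins is not stated here, so the even allocation per bin is an assumption, as are all object names.

```r
set.seed(1)

# Simulated biomass values, for illustration only
B <- rgamma(20000, shape = 2, scale = 1500)
sampleSize <- 3000

# 10 equal-width ("regular") bins over the observed biomass range
bins <- cut(B, breaks = 10)

# Assumption: spread the sample evenly across bins, keeping every point
# from bins that hold fewer points than their share
perBin <- ceiling(sampleSize / nlevels(bins))
idx <- unlist(lapply(split(seq_along(B), bins), function(i) {
  if (length(i) <= perBin) i else sample(i, perBin)
}))

length(idx)
```

With skewed biomass data the upper bins are sparse, so under this allocation the final sample can be smaller than sampleSize.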
how many times should the models be fit with new randomly
generated starting values? Only used if randomStarts == TRUE.
numeric. Value indicating maximum cover/dominance.
character vector of models to fit. Only Chapman-Richards ('CR') and 'Logistic' can be chosen at the moment.
previous outputs of fitNLMModels. The model will try
refitting and comparing AIC with the last results.
logical. Should random starting values of the A, k and p non-linear parameters be picked from a range of sensible values, or should all combinations of values within this range be used? If FALSE, the default, the starting values are spaced at regular intervals within an acceptable range for each parameter (20 values for A, 10 each for k and p) and all combinations are used (2000 starting values in total). Parameter ranges are estimated from the data following Fekedulegn et al. (1999) as follows:
the range of A starting values (B0 parameter in Fekedulegn et al. 1999) varies between \(ObsMaxB \times 0.3\) and \(ObsMaxB \times 0.9\), where \(ObsMaxB\) is the maximum observed B across the full dataset (not the data sampled for fitting).
k (CR model) and p (Logistic model; both are the B2 parameter in Fekedulegn et al. 1999) are estimated as a constant rate to get to \(ObsMaxB\), calculated as \(\frac{\frac{Bobs2 - Bobs1}{age2 - age1}}{ObsMaxB}\), where Bobs1/Bobs2 and age1/age2 are observed values at two points in time. We draw 100 samples of two age values, and the corresponding B, to calculate a sample of rates. After excluding rates <= 0, we take the minimum and maximum values as the range of the k (CR model) or p (Logistic model) parameters of the growth model.
the p parameter (CR model; related to the B3 parameter in Fekedulegn et al. 1999) should be > 1. Here we use a range of values between 1.1 and 80, which provided suitable fits using data from the Northwest Territories, Canada.
the k parameter (Logistic model; B1 parameter in Fekedulegn et al. 1999) is estimated from \(B_0 = \frac{ObsMaxB}{1 + k}\), using a small positive number for \(B_0\), e.g. 2. Here, we estimate k values for \(B_0\) values 1 to 5, and use the minimum and maximum to determine the range from which to draw starting values.
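The starting-value ranges described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the data are simulated, all object names are hypothetical, and the "100 samples of two age values" are interpreted as 100 random pairs of observations with different ages.

```r
set.seed(42)

# Simulated age/biomass data, for illustration only
age <- sample(1:200, 5000, replace = TRUE)
B <- 12000 * (1 - exp(-0.03 * age))^3 * exp(rnorm(5000, sd = 0.2))
obsMaxB <- max(B)

# A: 20 regularly spaced values between 30% and 90% of the observed maximum
Astart <- seq(obsMaxB * 0.3, obsMaxB * 0.9, length.out = 20)

# Rate sample from 100 random pairs of observations (assumed interpretation)
i1 <- sample(length(age), 100)
i2 <- sample(length(age), 100)
ok <- age[i1] != age[i2]            # avoid division by zero
rates <- ((B[i2[ok]] - B[i1[ok]]) / (age[i2[ok]] - age[i1[ok]])) / obsMaxB
rates <- rates[rates > 0]           # exclude rates <= 0
rateSeq <- seq(min(rates), max(rates), length.out = 10)

# CR model: k from the rate range, p between 1.1 and 80
pCR <- seq(1.1, 80, length.out = 10)
startsCR <- expand.grid(A = Astart, k = rateSeq, p = pCR)

# Logistic model: solve B_0 = obsMaxB / (1 + k) for k, with B_0 from 1 to 5;
# p comes from the rate range
kLogRange <- range(obsMaxB / (1:5) - 1)
kLog <- seq(kLogRange[1], kLogRange[2], length.out = 10)
startsLog <- expand.grid(A = Astart, k = kLog, p = rateSeq)

nrow(startsCR)
```

Each grid holds 20 x 10 x 10 = 2000 rows, matching the number of starting values quoted above for randomStarts = FALSE.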
logical, or a named vector of lower parameter boundaries. If FALSE, no lower
boundaries are applied. If TRUE, the intercept and cover coefficients of the linear model on the A
parameter, and the k and p parameters, are bounded
(intercept = observed maximum B * 0.5, cover = 0, k = 0.05 and p = 1).
Alternatively, pass a named vector of parameter boundaries.
logical, or a named vector of upper parameter boundaries. If FALSE, no upper
boundaries are applied. If TRUE, the intercept of the linear model on the A parameter
and the k parameter are bounded (intercept = observed maximum B * 1.5, k = 0.2).
Alternatively, pass a named vector of parameter boundaries.
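For reference, the default boundaries listed for lowerBounds and upperBounds can be written out as named vectors, alongside a custom alternative. This is a sketch: the observed maximum biomass and the custom values are hypothetical, and it assumes the expected vector names are the parameter names used in this help page (intercept, cover, k, p).

```r
obsMaxB <- 12000  # hypothetical observed maximum biomass

# Values described for lowerBounds = TRUE and upperBounds = TRUE
lowerDefault <- c(intercept = obsMaxB * 0.5, cover = 0, k = 0.05, p = 1)
upperDefault <- c(intercept = obsMaxB * 1.5, k = 0.2)

# Custom boundaries (hypothetical values) in the same named-vector form
lowerCustom <- c(intercept = obsMaxB * 0.4, cover = 0, k = 0.01, p = 1)
upperCustom <- c(intercept = obsMaxB * 2, k = 0.5)

# Sanity check: each lower bound sits below its matching upper bound
shared <- intersect(names(lowerCustom), names(upperCustom))
stopifnot(all(lowerCustom[shared] < upperCustom[shared]))
```

Checking that the bounds do not cross before fitting is cheap and avoids a failure that would otherwise only surface inside the optimiser.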
integer. If > 1, the number of workers to use in future.apply::future_apply, otherwise
no parallelisation is done.