Description of Model Equations
c_model_equations.Rmd
The Stochastic Population over Regional Components
(SPoRC
) model is a generalized integrated population model
written in R-TMB
(R bindings for Template Model Builder;
Kristensen et al., 2016) that supports age, sex, and
spatially-structured dynamics. Currently, SPoRC
assumes
that population dynamics operate following an annual time step, and in
sequential order:
- Recruitment and tag releases initially occur (tag releases only occur if tagging data are used),
- Markovian movement of individuals then follow (movement only occurs in the spatial model), and
- Total mortality, chronic tag loss (if applicable), and ageing processes occur.
These processes are modeled across four primary model partitions: region (), year (), age ( and , where is the plus group), and sex (). In single-region and/or single-sex models, these equations generally reduce by setting and/or . In general, the same equations are used for both simulation and estimation.
Process Equations
Population Initialization
In SPoRC
, three primary methods exist to initialize the
equilibrium population of the model. The first method derives the
equilibrium population using the following process:
where:
- are the equilibrium numbers-at-age,
- is a global recruitment parameter (either virgin or mean recruitment depending on the parameterization of stock recruitment dynamics),
- is the initial instantaneous total mortality rate,
- is the instantaneous natural mortality rate,
- is the mean fishing mortality rate from the first fleet ( denotes fishery fleet),
- is a user-defined value describing the proportion of the mean fishing mortality from the first fleet applied during the initialization stage,
- is the fishery selectivity-at-age for the first fleet,
- describes the recruitment sex-ratio,
- apportions the global recruitment parameter across regions (estimated using a multinomial logit transform to ensure proportions sum to one).
The plus group () of the initial population is then computed as:
However, this scalar geometric series solution assumes that the plus group accumulates in a closed system. Therefore, when movement dynamics are present, this solution does not correctly accumulate individuals into the plus group.
To address this, additional methods are provided to explicitly incorporate movement dynamics into the plus group calculation. In particular, the initial population can be derived by iterating the population to equilibrium. Here, the total instantaneous mortality in each region is defined similarly as in the first method. An exponential decay model is used to first initialize the age structure at the first iteration:
The initialized age structure is then iterated forward to equilibrium by applying recruitment, movement, and mortality and ageing processes in order (see Population Projection section for equations).
While the iterative method correctly accumulates the plus group when movement is present, it can be computationally inefficient due to the potentially large number of iterations required to propagate the initial population. Therefore, SPoRC enables users to compute the plus group using the matrix formulation of the geometric series, which correctly accounts for movement processes. Individuals are first initialized through recruitment processes, after which movement, mortality, and ageing processes are applied sequentially (up to the number of ages in the model). The population (or cohort) is then projected forward to the penultimate age (), which is used to derive the plus group. Specifically, the penultimate age is projected forward once more:
where is a first-order Markov matrix representing movement, is the Hadamard product (elementwise multiplication), and represents the culmination of processes applied to the penultimate age. A transition matrix ) is the constructed to represent the combined effects of survival and movement on the plus group:
The plus group solution incorporating movement is then given by:
When only a single region is modeled or no movement occurs (i.e., an identity matrix), the matrix formulation simplifies to the standard scalar geometric series solution.
Following the definition of equilibrium age structure, initial age deviations can be applied:
where are represents the numbers-at-age in the first model year except recruits (). These values can then be treated as a stochastic process by applying multiplicative lognormal deviations to the initial equilibrium age structure. Note that the index is introduced because users can determine whether initial age deviations are estimated up to the penultimate age class, or across all classes including the plus group.
Recruitment Processes
In the current iteration of SPoRC
, two stock recruitment
parameterizations can be specified. Recruitment can be specified to
arise about a mean parameter
():
where are annual, lognormally distributed recruitment deviations with a lognormal bias correction term (), with representing the bias correction ramp from Methot and Taylor (2011). Recruitment can also be specified to arise from a Beverton-Holt stock recruitment function to invoke density-dependent population dynamics, which follows the steepness parameterization (Mace and Doonan, 1988). Localized density-dependent recruitment is defined as:
while global density-dependent recruitment can be defined as:
where under this parameterization is the virgin unfished recruitment, (or ) is the steepness parameter representing the fraction of that would be produced when at 20% of (or ). The steepness parameter is constrained to be between values of 0.2 and 1 and are estimated in bounded logit space. is a derived variable that represents the unfished spawning stock biomass (i.e., derived by multiplying the unfished SSB per recruit by ). is the spawning stock biomass, which is the product of numbers-at-age, spawning weight-at-age, and maturity-at-age for females:
Note that
denotes the delay between spawning and when recruits enter the
population. Thus, if
,
SPoRC
utilizes
instead of
to compute deterministic recruitment.
Population Projection
Following recruitment processes, the population can then be projected forward. In the context of the spatial model, Markovian movement dynamics are first applied:
Here, is a first-order Markov matrix representing movement. In a single-region case, no movement is applied (i.e., is an implied identity matrix). For each year, age, and sex combination, the movement matrix specifies bulk-transfer coefficients, with parameters transformed through a multinomial logit to ensure that proportions within each origin region sum to one. Following movement processes, the population undergoes mortality and ageing:
denotes the annual total instantaneous mortality rate and is defined as the combination of natural mortality () and fishing mortality ():
The annual instantaneous fishing mortality rate is then defined as:
where is parameterized based on lognormal deviations ( about a mean fishing mortality parameter for a given fishery fleet ().
Observation Equations
Fishery Observation Model
The fishery observation model describes the expected catch-at-age, catch-at-length, catch (in units of biomass), and fishery indices. Expected catch-at-age () for a given fishery fleet is calculated using Baranov’s catch equation:
To track length-based dynamics, catch-at-length () can then be derived using the following equation:
where is a user-defined size-age transition matrix to convert ages to lengths. Expected catch () is then computed summation of the expected catch-at-age across ages and sexes, multiplied by their respective fishery weights-at-age ():
The expected fishery index () is computed as:
where is the catchability coefficient for a given fishery fleet.
Survey Observation Model
Likewise, the survey observation model describes the expected survey catch-at-age, survey catch-at-length, and survey indices. Expected survey catch-at-age () is calculated as follows:
where subscript denotes a given survey fleet and is the survey selectivity-at-age pattern. Expected survey catch-at-length () is given by:
Survey indices () can be computed as either abundance-based or biomass-based. Abundance-based survey indices are calculated as:
while biomass-based indices are computed as:
Here, is the weight-at-age for a given survey, represents the survey catchability coefficient in the aforementioned equations.
For survey catchability, environmental linkage can be specified:
where is the base survey catchability (i.e., intercept), is a matrix of covariates, is a vector of regression coefficients, are orthogonal polynomial coefficients along with its basis functions, and are the covariates for which a polynomial term is assumed for.
Tagging Observation Model
The tagging observation model tracks tag cohorts () by the combination of release region and release year () and follows a Brownie tag attrition framework. Tag cohorts are tracked for a pre-defined maximum duration (maximum tag liberty; ), after which calculations for the tag cohort are no longer computed. This is done to aid with computation time. Additionally, tag cohorts also have the option to undergo a fraction of mortality in the year of release, such that if a given cohort is released during the middle of the year, mortality processes are discounted. If mortality discounting is specified, movement does not occur in the year of release. In general, the process dynamics for the tagged cohort mimics those specified for the overall population. Immediately following release, tag cohorts are decremented by an initial tag-induced mortality rate:
where is the initial tag-induced mortality rate. If tagged cohorts are released at the beginning of a calendar year, Markovian movement occurs (movement does not occur otherwise):
Mortality and ageing of the tagged cohort then occurs:
where tagged cohorts follow an exponential mortality model, with accumulation of individuals in the plus-group. Total mortality for the tagged cohort () is specified as:
where is a parameter describing chronic tag loss (i.e., annual tag shedding). Similar to computations for catch-at-age, tag recaptures are calculated using a modified version of Baranov’s catch equation:
represents a tag reporting rate parameter that can vary across space and time, which is estimated in logit space such that it is constrained between .
Fishery and Survey Selectivity
In the following descriptions of selectivity, we omit subscripts for sexes and fleets for brevity, although note that the equations remain specific to those model partitions. Several approaches are available for parameterizing fishery and survey selectivity. Selectivity can be defined as either age- or length-based. For age-based selectivity, an age vector is applied directly with a chosen functional form. For length-based selectivity, a length vector is used to compute selectivity, which is then combined with a size–age transition matrix through a dot product to obtain selectivity-at-age.
Given the age-based nature of the model, selectivity-at-age is utilized for all subsequent calculations. Various functional forms can also be specified for selectivity processes. In the following we use the subscript to denote a generalized bin number.
Two forms of logistic selectivity can be specified. The first form is defined as:
where determines the slope/steepness of the logistic curve and is the bin-at-50% selection. The second form can be expressed as:
Here,
is also the bin-at-50% selection and
is the bin-at-95% selection. Beyond the specification of flat-topped
selectivity, SPoRC
also allows for dome-shaped selectivity.
In particular, a reparametrized gamma function can be specified:
In this parameterization, is a derived power parameter, is the shape parameter that describes the steepness of the descending limb, and describes the bin-at-maximum selection. Dome-shaped selectivity can also be expressed as a power function:
with being a power parameter that determines the descending limb of the curve (larger values are steeper). The last dome-shaped selectivity form that can be specified includes a 6 parameter (denoted as through ) double normal functional form with the following transformations applied:
represents the beginning of the plateau, describes the width of the plateau, and are parameters controlling the ascending and descending widths, and and control the selectivity at the first and last bins. The double normal function is then defined with a series of functions:
and are joined together as:
The double normal functional form is incredibly flexible and is able to reduce to both flat-topped and dome-shaped selectivity forms, depending on the values of the parameters.
In addition to the functional forms that can be specified to describe selectivity processes, several options exist to specify continuous time-varying processes. In particular, options to specify time-varying parametric selectivity and time-varying semi-parametric selectivity are available. To illustrate, if logistic selectivity is specified and parametric deviations are invoked, the following expression is used:
where the parameters of the logistic form are allowed to vary over time. In the context of semi-parametric selectivity, the following equation is used:
where deviations are placed about the parametric form and selectivity values are mean standardized to aid with interpretability. Importantly, note that if age-based selectivity is specified and continuous deviations are estimated, these deviations are placed on the ages themselves. In the case where length-based selectivity is specified, these deviations are placed on the length bins. Further details on how selectivity deviations arise can be found in the “Selectivity Process Error” section of this document.
Likelihoods
Currently, SPoRC
incorporates data likelihood components
for the following data sources:
- fishery catches,
- fishery indices,
- fishery age compositions,
- fishery length compositions,
- survey indices,
- survey age compositions,
- survey length compositions, and
- tagging data.
The total likelihood (objective function) is the sum of the individual likelihood contributions from these data sources along with priors and penalties, where the objective function is minimized using a non-linear optimization algorithm to estimate model parameters.
Observation Likelihoods
Fishery Catches
Fishery catches can be fit using a lognormal likelihood. The log-likelihood for observed catch, is defined as:
Here, is the likelihood weight, is the observed catch, is the predicted catch, and is the variance of catch on the log scale.
Fishery and Survey Indices
Fishery indices can also be fit assuming a lognormal likelihood. The log-likelihood for observed fishery indices is:
where controls the weight of fishery indices to the objective function, represents the observed fishery indices, and denotes the variance of the fishery index.
Likewise, survey indices can be fit assuming a lognormal likelihood given by:
is the likelihood weight applied to survey indices, are the observed survey indices, and indicates the variance of the survey index.
Fishery and Survey Compositions
Several options for fitting composition data are available in
SPoRC
. These include the multinomial, the
Dirichlet-multinomial, and the logistic-normal likelihoods. In the case
of the multinomial likelihood, the following expression is used:
where subscript is used to indicate a fishery or survey fleet and the subscript generically indicates a bin number. are likelihood weights applied to composition data (can be derived using iterative weighting methods), is the input sample size, denotes the expected composition proportions, and are the observed composition proportions.
If a Dirichlet-multinomial likelihood is assumed, the following parameterization (linear) is used:
Here, is the overdispersion parameter of the Dirichlet-multinomial that adjusts the input sample size. The effective sample size ( can then be derived as:
Therefore, if was large such that , then . By contrast, if was estimated at a small value such that , then approximates the ratio of and (Thorson et al., 2017).
A multivariate logistic-normal likelihood can also be assumed, which is given by:
Both and are dimensional vectors, while is a covariance matrix (see below for further details). and are derived via an additive logistic function:
where and are transformed proportions using the last bin as the reference category. Because the logarithm of zero is undefined, all untransformed proportions must be strictly positive. If any observed proportion is zero, both the observed and corresponding expected values are removed, and the remaining proportions are renormalized to ensure that they sum to one before applying the transformation. The covariance matrix of the logistic-normal likelihood can be specified in various ways. In the simplest case, the covariance matrix can be assumed to be independent and identically distributed (iid):
where is a identity matrix and is an estimated overdispersion parameter representing the variance. The simple iid case can be further extended to incorporate a one-dimensional lag-1 autoregressive structure:
Here, is a correlation matrix with a lag-1 autoregressive structure, where defines the correlation across bins. Lastly, if the model is specified to be sex-structured and sex-composition data are utilized, a two-dimensional autoregressive structure can be specified:
is a constant correlation matrix dimensioned by for sexes, with off-diagonal elements controlling the correlation of age/length categories across sexes, while is a lag-1 autoregressive correlation structure, where defines the correlation across age/length categories. denotes the Kronecker product. This structure implies that similar age or size categories across sexes are correlated, while ensuring that very different age or size categories remain uncorrelated.
Structuring Compositions and Ageing Error
Related to the use of composition data likelihoods, composition data can be structured differently depending on model assumptions and data constraints. In particular, three options are available to fit to composition data:
‘Aggregated’ compositions across regions and sexes,
‘Split’ compositions for each region and sex (i.e., no implicit information about sex-ratios), and
‘Joint’ compositions across sexes (i.e., implicit information is provided about sex-ratios).
The expected compositions (i.e., catch-at-age, catch-at-length, survey catch-at-age, survey catch-at-length) when specified as ‘aggregated’ are derived with the following:
where compositions are summed across regions and sexes and normalized to sum to one. Ageing error () can then be applied using standard matrix multiplication. Expected compositions that are specified as ‘Split’ by sexes and regions are computed as:
Here, expected compositions sum to one within a given region and sex combination and ageing error is similarly applied via matrix multiplication. In the case where expected compositions are specified as ‘Joint’, they are calculated as:
where the expected compositions sum to one jointly across bins and sexes, thus preserving implicit sex-ratio information. Ageing error is then applied by taking the Kronecker product of a identity matrix with the ageing error matrix, followed by matrix multiplication.
Tagging
SPoRC
currently allows for various tagging likelihoods,
ranging from the Poisson, Negative Binomial, multinomial, and
Dirichlet-multinomial likelihood. Additionally, SPoRC
also
allows for both release- and recapture-conditioned dynamics (McGarvey
and Feenstra, 2002). The Poisson tag likelihood is given by:
where are the observed tag recaptures and is the likelihood weight applied to tagging data. In the case where the Negative Binomial is invoked, the following expression is used:
Here, represents the estimated overdispersion parameter for tagging data.
Under release conditioned dynamics, both recaptured and non-recaptured states are fit to. Proportions of observed () and expected recaptured (individuals are given by:
denotes the total tags released for a given tag cohort (combination of release region and year). Non-recaptured states can then be written as:
where and are the observed and expected non-recaptured states, respectively. These states are then combined into a single vector of observed and expected values:
If a Multinomial likelihood is assumed for release conditioned dynamics, this is given by:
Here, the subscript is used to generically denote a given element. If a Dirichlet-multinomial with released-condition dynamics was assumed, the tagging likelihood would be written as:
The parameter in the Dirichlet-multinomial likelihood represents the overdispersion parameter for tagging data,
Under recapture-conditioned dynamics, tag shedding, tag induced mortality, and tag reporting rates are assumed to be spatially-invariant and do not need to be estimated, given that these terms cancel out in the denominator (McGarvey and Feenstra, 2002). Unlike release-conditioned dynamics, assuming recaptured-conditioned processes does not require fitting to non-recaptured states. Thus, the observed and expected recaptured proportions can be written as:
where recapture probabilities are normalized by the total number of recaptures in a given year ().
Parameter Priors and Process Error Penalties
Parameter Priors
Considering the complexity of integrated population models, several priors can be specified to help inform the estimation of parameters by providing additional knowledge. Beyond that, priors can also be specified to aid in model stabilization (i.e., regularizing priors). Priors can currently be specified for natural mortality, fishery and survey catchability, fishery and survey selectivity, steepness, recruitment proportions, movement rates, and tag reporting rates.
Natural Mortality
In the case of natural mortality, a lognormal prior is utilized:
where the variance of the prior is given by , and denotes the prior mean.
Fishery and Survey Catchability
For fishery and survey catchability, a lognormal prior can also be specified:
where here represents either a fishery or survey fleet, is the variance of the prior, and indicates the prior mean for catchability.
Fishery and Survey Selectivity
In general, selectivity priors can be utilized to serve as regularizing priors to facilitate stable parameter estimation (Monnahan, 2024). These priors are assumed to be lognormal and are applied to the selectivity parameters themselves:
where is a selectivity parameter for a given functional form specified, is the prior variance, and is the prior mean for the specific selectivity parameter.
Steepness
If a Beverton-Holt stock recruitment relationship is assumed, priors for steepness can be specified. Currently, a scaled beta prior (bounded between 0.2 and 1) can be invoked:
Here, and are parameters of the beta distribution, is the prior mean steepness for a given region (bounded between 0.2 and 1) while is the standard deviation for these priors.
Recruitment Proportions
Regional recruitment is derived by apportioning a global recruitment parameter using regional recruitment proportions (i.e., ). Here, is derived via a multinomial logit transformation and Dirichlet priors can be used to help constrain estimation:
are the estimated recruitment proportions across regions, and is the concentration parameter governing the spread of the Dirichlet distribution.
Movement
Likewise, priors on movement values can be assumed to arise from a Dirichlet process:
where , is the origin region, is the destination, and are the concentration parameters that control the Dirichlet distribution.
Tag Reporting Rates
Two types of priors can be specified for tag reporting rates. In particular, a symmetric beta distribution is applied:
Here, determines the scale of the tag reporting parameter and determines how strongly to penalize estimates when they approach the bounds of . Smaller values of result in larger penalties, and vice versa. Thus, estimates of tag reporting rates are penalized if they are either close to 0 or 1.
Tag reporting rate priors can also be specified as a standard beta distribution:
Here, and are the parameters describing the beta distribution, denotes the prior mean for tag reporting rates and is the prior standard deviation.
Process Error Penalties
In addition to priors, penalties are also utilized to aid in the estimation of process errors (either penalized likelihood or integrating random effects via Laplace Approximation are possible). Currently, process errors can be specified to arise for initial age deviations, recruitment, fishing mortality, fishery and survey selectivity, and movement.
Initial Age Deviations
To estimate non-equilibrium initial age deviations, multiplicative deviations can be specified:
where deviations arise from a normal distribution with a mean of 0 and variance of .
Recruitment Deviations
Annual recruitment deviations can also be specified, where multiplicative deviations are assumed:
and deviations are assumed to normally distributed, with a mean of 0 and variance of .
Fishing Mortality Deviations
Fishing mortality deviations similarly assumes multiplicative deviations:
Here, fishing mortality deviations are assumed to arise from a normal distribution, with mean 0 and a variance of .
Fishery and Survey Selectivity
A variety of process error parameterizations can be specified for fishery and survey selectivity. Across all parameterizations, multiplicative deviations are assumed. In the most basic case, iid deviations can be assumed to vary about a parameter on a given selectivity functional form:
where are selectivity deviations about a given parameter for region , year , parameter , sex , and fleet . Deviations are assumed to have a mean of 0 and a variance of , constrained by a normal distribution.
Extending the iid case, random walk selectivity deviations can also be specified about a given parameter, assuming a normal distribution:
where process error deviations for the first year are initialized with a large variance. Following the first year, process error deviations follow a random walk process with a mean conditional on the previous year’s value () and a variance of .
In addition to being constrained by a normal distribution, both iid and random walk cases have an optional additional smoothness penalty applied:
which serves to ensure selectivity values do not vary drastically from year-to-year.
Additionally, semi-parametric deviations can also be specified. In total, there are three options that can be utilized, two of which allow age, year, and cohort correlations, while one allows for only age or length and year correlations. In the case where age, year, and cohort correlations are specified (note that this is only possible when age-based selectivity is specified), marginal stationary variance and a conditional non-stationary variance can be invoked. The primary difference between the two is that the marginal variance version does not have a closed form solution for (see equations below) and needs to be iteratively solved, while the conditional variance version has a closed form solution. The following equations describe the conditional variance version:
where we vectorize the selectivity deviations across its year and age dimensions. These deviations are then assumed to arise from a multivariate normal distribution (or Gaussian Markov Random Field) with a covariance matrix () determined by:
Here, is an identity matrix and is a diagonal matrix that determines the variance of the multivariate normal process. is a square matrix representing the partial effect of on preceding ages and/or years, governed by partial correlation coefficients for ages, years, and cohorts. To demonstrate the formulation of , a simplified example is provided with rows representing ages and columns representing years . In this example, is a matrix, where both the rows and columns represent combinations of age and year. For instance, captures the correlation within the same year between age 1 in year 1, and age 2 in year 1, whereas constructs the correlation within the same cohort between age 1 in year 1, and age 2 in year 2:
where , , and are parameters describing the partial autocorrelation among years within a given age, among ages within a given year, and years within a cohort, respectively. The multivariate likelihood is then defined as:
If age or length and year correlations are specified (i.e., a two-dimensional autoregressive structure), a multivariate normal likelihood is similarly assumed (i.e., the equation described above), but the covariance structured of this process is defined as:
where and are correlation coefficients across years and bins, respectively. is a matrix representing a first-order autoregressive process across years and is a matrix representing a first-order autoregressive process across bins.
Moreover, when semi-parametric deviations are specified, additional optional penalties can be applied across bins and years to enforce curvature control:
Movement
Process error deviations can also be specified for movement parameters. Currently, only iid deviations are allowed, where deviations are placed upon fixed-effects movement parameters in logit space:
where are movement parameters in logit space for origin , destination , year , age , and sex , and are the corresponding fixed effects (the mean logits). Here, are movement deviations specified in logit space. Deviations then arise from a normal distribution with process variation constrained by :
Thus, process variation can be distinct for a given origin region, age, and sex. Because logits are parameterized with a reference category, only logits (and their deviations) are estimated per origin , ensuring the sum-to-one constraint on destination probabilities.
References
Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H., Bell, B., 2016. TMB: Automatic Differentiation and Laplace Approximation. J. Stat. Soft. 70. https://doi.org/10.18637/jss.v070.i05
Mace, P.M., Doonan, I.J., 1988. A Generalised Bioeconomic Simulation Model for Fish Population Dynamics. MAFFish, N.Z. Ministry of Agriculture and Fisheries.
McGarvey, R., Feenstra, J.E., 2002. Estimating rates of fish movement from tag recoveries: conditioning by recapture. Can. J. Fish. Aquat. Sci. 59, 1054–1064. https://doi.org/10.1139/f02-080
Methot, R.D., Taylor, I.G., 2011. Adjusting for bias due to variability of estimated recruitments in fishery assessment models. Can. J. Fish. Aquat. Sci. 68, 1744–1760. https://doi.org/10.1139/f2011-092
Monnahan, C.C., 2024. Toward good practices for Bayesian data-rich fisheries stock assessments using a modern statistical workflow. Fisheries Research 275, 107024. https://doi.org/10.1016/j.fishres.2024.107024
Thorson, J.T., Johnson, K.F., Methot, R.D., Taylor, I.G., 2017. Model-based estimates of effective sample size in stock assessment models using the Dirichlet-multinomial distribution. Fisheries Research 192, 84–93. https://doi.org/10.1016/j.fishres.2016.06.005