Title: | Generalised Additive Extreme Value Models |
---|---|
Description: | Methods for fitting various extreme value distributions with parameters of generalised additive model (GAM) form are provided. For details of distributions see Coles, S.G. (2001) <doi:10.1007/978-1-4471-3675-0>, GAMs see Wood, S.N. (2017) <doi:10.1201/9781315370279>, and the fitting approach see Wood, S.N., Pya, N. & Safken, B. (2016) <doi:10.1080/01621459.2016.1180986>. Details of how evgam works and various examples are given in Youngman, B.D. (2022) <doi:10.18637/jss.v103.i03>. |
Authors: | Ben Youngman |
Maintainer: | Ben Youngman <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-01-30 17:25:28 UTC |
Source: | https://github.com/byoungman/evgam |
Conversion between location, scale and shape parameters of the generalised extreme value distribution and corresponding parameters of blended generalised extreme value distribution in both directions.
bgev2gev( location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, simplify = FALSE ) gev2bgev( location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, simplify = FALSE )
bgev2gev( location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, simplify = FALSE ) gev2bgev( location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, simplify = FALSE )
location , scale , shape
|
location, scale and shape parameters. |
pa , pb , alpha , beta
|
Gumbel to GEV mixing parameters; see Details. |
simplify |
logical; should |
The blended generalised extreme value distribution has location
parameter ,
scale
and
shape
parameters; see, e.g.,
dbgev
. The generalised
extreme value distribution's location and scale parameters in its typical
form are given by
and
.
So
bgev2gev
gives the mapping .
The reverse mapping,
,
is given by
gev2bgev
.
A list
or array
if simplify = TRUE
.
Castro-Camilo, D., Huser, R., & Rue, H. (2022). Practical strategies for generalized extreme value-based regression models for extremes. Environmetrics, 33(6), e2742. doi:10.1002/env.2742
bgev2gev(2, 1, .1) gev2bgev(2, 1, .1)
bgev2gev(2, 1, .1) gev2bgev(2, 1, .1)
Scatter plot, with variable-based point colours
colplot( x, y, z, n = 20, z.lim = NULL, breaks = NULL, palette = heat.colors, rev = TRUE, pch = 21, add = FALSE, ..., legend = FALSE, n.legend = 6, legend.pretty = TRUE, legend.plot = TRUE, legend.x, legend.y = NULL, legend.horiz = FALSE, legend.bg = par("bg") )
colplot( x, y, z, n = 20, z.lim = NULL, breaks = NULL, palette = heat.colors, rev = TRUE, pch = 21, add = FALSE, ..., legend = FALSE, n.legend = 6, legend.pretty = TRUE, legend.plot = TRUE, legend.x, legend.y = NULL, legend.horiz = FALSE, legend.bg = par("bg") )
x |
a vector of x coordinates |
y |
a vector of y coordinates |
z |
a variable for defining colours |
n |
an integer giving the number of colour levels, supplied to pretty |
z.lim |
xxx |
breaks |
a vector or breaks for defining color intervals; defaults to |
palette |
a function for the color palette, or colors between |
rev |
logical: should the palette be reversed? Defaults to |
pch |
an integer giving the plotting character, supplied to plot |
add |
should this be added to an existing plot? Defaults to |
... |
other arguments passed to plot |
legend |
should a legend be added? Defaults to codeFALSE |
n.legend |
an integer giving the approximate number of legend entries; defaults to 6 |
legend.pretty |
logical: should the legend values produced by \[base]pretty? Othewrwise they are exact. Defaults to |
legend.plot |
passed to legend's |
legend.x |
passed to legend's |
legend.y |
passed to legend's |
legend.horiz |
passed to legend's |
legend.bg |
passed to legend's |
A plot
x <- runif(50) y <- runif(50) colplot(x, y, x * y) colplot(x, y, x * y, legend=TRUE, legend.x="bottomleft") colplot(x, y, x * y, legend=TRUE, legend.pretty=FALSE, n.legend=10, legend.x="bottomleft", legend.horiz=TRUE)
x <- runif(50) y <- runif(50) colplot(x, y, x * y) colplot(x, y, x * y, legend=TRUE, legend.x="bottomleft") colplot(x, y, x * y, legend=TRUE, legend.pretty=FALSE, n.legend=10, legend.x="bottomleft", legend.horiz=TRUE)
Conditional extreme value model negative log-likelihood
condexd0(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere) condexd12(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere) condexd34(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere)
condexd0(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere) condexd12(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere) condexd34(pars, X1, X2, X3, X4, ymat, xmat, wmat, dupid, dcate, nhere)
pars |
a list of vectors of coefficients for each bGEV parameter |
X1 |
a design matrix for (transformed) alpha |
X2 |
a design matrix for (transformed) beta |
X3 |
a design matrix for mu |
X4 |
a design matrix for (transformed) sigma |
ymat |
a matrix |
xmat |
a matrix |
wmat |
a matrix |
dupid |
a scalar or vector, identifying duplicates in Xs; -1 corresponds to no duplicates |
condexd0 a scalar, the negative log-likelihood
condexd12 a matrix, first then second derivatives w.r.t. parameters
condexd34 a matrix, third then fourth derivatives w.r.t. parameters
## to follow
## to follow
Three objects: 1) COprcp
, a 404,326-row
data frame with columns date
, prcp
and meta_row
;
2) COprcp_meta
, a 64-row data frame, with meta data for 64 stations.
3) COelev
, a list of elevation for the domain at 0.02 x 0.02
degree resolution. Precipitation amounts are only given for April
to October in the years 1990 - 2019. The domain has a longitude range
of [-106, -104] and a latitude range [37, 41]. These choices reflect
the analysis of Cooley et al. (2007).
data(COprcp) # loads all three objects
data(COprcp) # loads all three objects
A data frame with 2383452 rows and 8 variables
The variables are as follows:
date of observation
daily rainfall accumulation in mm
an identifier for the row in COprcp_meta; see ‘Examples’
longitude of station
latitude of station
elevation of station in metres
GHCDN identifier
Cooley, D., Nychka, D., & Naveau, P. (2007). Bayesian spatial modeling of extreme precipitation return levels. Journal of the American Statistical Association, 102(479), 824-840.
library(evgam) data(COprcp) brks <- pretty(COelev$z, 50) image(COelev, breaks=brks, col=rev(heat.colors(length(brks[-1])))) colplot(COprcp_meta$lon, COprcp_meta$lat, COprcp_meta$elev, breaks=brks, add=TRUE)
library(evgam) data(COprcp) brks <- pretty(COelev$z, 50) image(COelev, breaks=brks, col=rev(heat.colors(length(brks[-1])))) colplot(COprcp_meta$lon, COprcp_meta$lat, COprcp_meta$elev, breaks=brks, add=TRUE)
Users can supply code to fit an arbitrary distribution with evgam
using
family = "custom"
. See Details for more information and the example below,
which demonstrates fitting of the Gumbel distribution.
Users should supply a list to evgam
called custom.fns
, which
which comprises the following functions: d0
,
which evaluates the negative log-likelihood; coded120, which evaluates its
first and second derivatives; and, optionally, d340
, which evaluates its
third and fourth derivatives. The list may also contain q
, which evaluates
the custom distribution's quantile function, for use with predict, prob
= ...)
. The list may also contain unlink
, which gives functions to
reverse any link functions (unlink) used with linear predictors, so that
predict(..., type = "response")
works meaningfully, and unlink
may also have an attribute deriv
, which gives derivatives of unlink
functions, which allows predict(..., type = "response", std.err = TRUE)
to return meaningful values.
Trying to mimic the Gumbel example below for your chosen distribution is almost
certainly the easiest was to get family = "custom"
to do what you want.
Youngman, B. D. (2022). evgam: An R Package for Generalized Additive Extreme Value Modules. Journal of Statistical Software. To appear. doi:10.18637/jss.v103.i03
# Gumbel custom likelihood # negative log likelihood gumd0 <- function(pars, likdata) { # this is needed, and should be fine pars <- split(pars, likdata$idpars) # likdata$X should be set up by evgam mu <- likdata$X[[1]] %*% pars[[1]] lsigma <- likdata$X[[2]] %*% pars[[2]] y <- likdata$y y <- (y - mu) * exp(-lsigma) nllh <- sum(lsigma - y + exp(y)) return(nllh) } # first and second derivatives of neg. log lik. in order # d_mu, d_lsigma, d_{mu, mu}, d_{mu, lsigma}, d_{lsigma, lsigma} gumd12 <- function(pars, likdata) { # this is needed, and should be fine pars <- split(pars, likdata$idpars) # likdata$X should be set up by evgam mu <- likdata$X[[1]] %*% pars[[1]] lsigma <- likdata$X[[2]] %*% pars[[2]] y <- likdata$y out <- matrix(0, length(y), 5) ee1 <- exp(lsigma) ee2 <- y - mu ee3 <- ee2/ee1 ee4 <- exp(ee3) ee5 <- (ee3 + 1) * ee4 ee6 <- 1 - ee4 # first derivatives out[, 1] <- ee6/ee1 out[, 2] <- ee6 * ee2/ee1 + 1 # second derivatives out[, 3] <- ee4/ee1^2 out[, 4] <- -((1 - ee5)/ee1) out[, 5] <- (ee5 - 1) * ee2/ee1 return(out) } gum_fns <- list(d0 = gumd0, d120 = gumd12, d340 = NULL) unlink <- list(NULL, function(x) exp(x)) attr(unlink[[2]], "deriv") <- unlink[[2]] qgumbel <- function(p, location, scale) location - scale * log(-log(p)) gum_fns$q <- qgumbel gum_fns$unlink <- unlink data(COprcp) COprcp <- cbind(COprcp, COprcp_meta[COprcp$meta_row,]) COprcp$year <- format(COprcp$date, "%Y") COprcp_gev <- aggregate(prcp ~ year + meta_row, COprcp, max) COprcp_gev <- cbind(COprcp_gev, COprcp_meta[COprcp_gev$meta_row,]) inits <- sqrt(6 * var(COprcp_gev$prcp) / pi^2) inits <- c(mean(COprcp_gev$prcp) - inits[1] * 0.5772156649, log(inits[1])) fmla_gum <- list(location = prcp ~ s(lon, lat) + s(elev), logscale = ~ s(lon, lat)) m <- evgam(fmla_gum, data = COprcp_gev, family = 'custom', custom.fns = gum_fns, trace = 2, inits = inits) predict(m, COprcp_gev[1:10,]) predict(m, COprcp_gev[1:10,], type = 'response') predict(m, COprcp_gev[1:10,], prob = .99)
# Gumbel custom likelihood # negative log likelihood gumd0 <- function(pars, likdata) { # this is needed, and should be fine pars <- split(pars, likdata$idpars) # likdata$X should be set up by evgam mu <- likdata$X[[1]] %*% pars[[1]] lsigma <- likdata$X[[2]] %*% pars[[2]] y <- likdata$y y <- (y - mu) * exp(-lsigma) nllh <- sum(lsigma - y + exp(y)) return(nllh) } # first and second derivatives of neg. log lik. in order # d_mu, d_lsigma, d_{mu, mu}, d_{mu, lsigma}, d_{lsigma, lsigma} gumd12 <- function(pars, likdata) { # this is needed, and should be fine pars <- split(pars, likdata$idpars) # likdata$X should be set up by evgam mu <- likdata$X[[1]] %*% pars[[1]] lsigma <- likdata$X[[2]] %*% pars[[2]] y <- likdata$y out <- matrix(0, length(y), 5) ee1 <- exp(lsigma) ee2 <- y - mu ee3 <- ee2/ee1 ee4 <- exp(ee3) ee5 <- (ee3 + 1) * ee4 ee6 <- 1 - ee4 # first derivatives out[, 1] <- ee6/ee1 out[, 2] <- ee6 * ee2/ee1 + 1 # second derivatives out[, 3] <- ee4/ee1^2 out[, 4] <- -((1 - ee5)/ee1) out[, 5] <- (ee5 - 1) * ee2/ee1 return(out) } gum_fns <- list(d0 = gumd0, d120 = gumd12, d340 = NULL) unlink <- list(NULL, function(x) exp(x)) attr(unlink[[2]], "deriv") <- unlink[[2]] qgumbel <- function(p, location, scale) location - scale * log(-log(p)) gum_fns$q <- qgumbel gum_fns$unlink <- unlink data(COprcp) COprcp <- cbind(COprcp, COprcp_meta[COprcp$meta_row,]) COprcp$year <- format(COprcp$date, "%Y") COprcp_gev <- aggregate(prcp ~ year + meta_row, COprcp, max) COprcp_gev <- cbind(COprcp_gev, COprcp_meta[COprcp_gev$meta_row,]) inits <- sqrt(6 * var(COprcp_gev$prcp) / pi^2) inits <- c(mean(COprcp_gev$prcp) - inits[1] * 0.5772156649, log(inits[1])) fmla_gum <- list(location = prcp ~ s(lon, lat) + s(elev), logscale = ~ s(lon, lat)) m <- evgam(fmla_gum, data = COprcp_gev, family = 'custom', custom.fns = gum_fns, trace = 2, inits = inits) predict(m, COprcp_gev[1:10,]) predict(m, COprcp_gev[1:10,], type = 'response') predict(m, COprcp_gev[1:10,], prob = .99)
Density, distribution function, quantile function and random generation for
the blended generalised extreme value distribution with parameters
location
, scale
, shape
, pa
, pb
,
alpha
and beta
.
dbgev( x, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, log = FALSE ) pbgev( q, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, lower.tail = TRUE, log.p = FALSE ) qbgev( p, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, lower.tail = TRUE, log.p = FALSE ) rbgev(n, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5)
dbgev( x, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, log = FALSE ) pbgev( q, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, lower.tail = TRUE, log.p = FALSE ) qbgev( p, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5, lower.tail = TRUE, log.p = FALSE ) rbgev(n, location, scale, shape, pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5)
x , q
|
vector of quantiles. |
location , scale , shape
|
location, scale and shape parameters. |
pa , pb , alpha , beta
|
Gumbel to GEV mixing parameters; see Details. |
log , log.p
|
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are
|
p |
vector of probabilities. |
n |
number of observations. |
The blended generalised extreme value distribution with location
parameter ,
scale
parameter and
shape
parameter has cumulative distribution function
given by
where
with ,
,
alpha
and beta
parameters and
,
respectively,
with ,
with ,
, with
pa
and pb
parameters and
, respectively, and where
and
denotes the cumulative distribution
of the Beta(5, 5) distribution.
Default values for pa
, pb
, alpha
and beta
are taken from Castro-Camilo et al. (2022).
dbgev
gives the density,
pbgev
gives the distribution function,
qbgev
gives the quantile function, and
rbgev
generates random deviates.
Castro-Camilo, D., Huser, R., & Rue, H. (2022). Practical strategies for generalized extreme value-based regression models for extremes. Environmetrics, 33(6), e2742. doi:10.1002/env.2742
dbgev(3, 2, 1, .1)
dbgev(3, 2, 1, .1)
Convert a response vector in a data frame to a matrix
df2matdf(x, formula)
df2matdf(x, formula)
x |
a data frame |
formula |
a formula |
This function identifies repeated combinations of explanatory variables
in a mgcv
formula
and then creates a
matrix
response variable in which each row corresponds to one of
unique explanatory variable combinations and each column to one of
replicates with the combination. Here
is the maximum number
of replicates for an explanatory variable combination; rows of the
matrix
are padded with NA
s at the end where there are fewer than
replicates.
A data.frame
http://arma.sourceforge.net/docs.html#pinv
Bind a list a data frames
dfbind(x)
dfbind(x)
x |
a list of data frames |
A data frame
z <- list(data.frame(x=1, y=1), data.frame(x=2, y=2)) dfbind(z)
z <- list(data.frame(x=1, y=1), data.frame(x=2, y=2)) dfbind(z)
Function evgam
fits generalised additive extreme-value models. It allows
the fitting of various extreme-value models, including the generalised
extreme value and Pareto distributions. It can also perform quantile regression
via the asymmetric Laplace distribution.
evgam( formula, data, family = "gev", correctV = TRUE, rho0 = 0, inits = NULL, outer = "bfgs", control = NULL, removeData = FALSE, trace = 0, knots = NULL, maxdata = 1e+20, maxspline = 1e+20, compact = FALSE, gpd.args = list(), ald.args = list(), exi.args = list(), pp.args = list(), bgev.args = list(), sandwich.args = list(), egpd.args = list(), custom.fns = list(), aggregated.args = list(), sp = NULL, gamma = 1, sparse = FALSE, args = list() )
evgam( formula, data, family = "gev", correctV = TRUE, rho0 = 0, inits = NULL, outer = "bfgs", control = NULL, removeData = FALSE, trace = 0, knots = NULL, maxdata = 1e+20, maxspline = 1e+20, compact = FALSE, gpd.args = list(), ald.args = list(), exi.args = list(), pp.args = list(), bgev.args = list(), sandwich.args = list(), egpd.args = list(), custom.fns = list(), aggregated.args = list(), sp = NULL, gamma = 1, sparse = FALSE, args = list() )
formula |
a list of formulae for location, scale and shape parameters, as in gam |
data |
a data frame |
family |
a character string giving the type of family to be fitted; defaults to |
correctV |
logicial: should the variance-covariance matrix include smoothing parameter uncertainty? Defaults to |
rho0 |
a scalar or vector of initial log smoothing parameter values; a scalar will be repeated if there are multiple smoothing terms |
inits |
a vector or list giving initial values for constant basis coefficients; if a list, a grid is formed using expand.grid, and the ‘best’ used; defaults to |
outer |
a character string specifying the outer optimiser is full |
control |
a list of lists of control parameters to pass to inner and outer optimisers; defaults to |
removeData |
logical: should |
trace |
an integer specifying the amount of information supplied about fitting, with |
knots |
passed to s; defaults to |
maxdata |
an integer specifying the maximum number of |
maxspline |
an integer specifying the maximum number of |
compact |
logical: should duplicated |
gpd.args |
a list of arguments for |
ald.args |
a list of arguments for |
exi.args |
a list of arguments for |
pp.args |
a list of arguments for |
bgev.args |
a list of arguments for |
sandwich.args |
a list of arguments for sandwich adjustment; see Details |
egpd.args |
a list of arguments for extended GPD; see Details |
custom.fns |
a list of functions for a custom family; see Details |
sp |
a vector of fixed smoothing parameters |
gamma |
a total penalty adjustment, such that higher values (>1) give smoother overall fits; defaults to 1 (no adjustment) |
sparse |
logical: should matrices be coerced to be recognised as sparse? Defaults to |
args |
a list of arguments to supply to the likelihood. Default to none, which is |
See family.evgam
for details of distributions that can be fitted with
evgam
using family = "..."
and details of gpd.args
, ald.args
,
exi.args
, pp.args
, bgev.args
and egpd.args
and see
custom.family.evgam
for details of using family = "custom"
with custom.fns
.
Arguments for the sandwich adjustment are given by sandwich.args
. A character
string id
can be supplied to the list, which identifies the name of the
variable in data
such that independence will be assumed between its values. The
method
for the adjustment is supplied as "magnitude"
(default) or "curvature"
;
see Chandler & Bate (2007) for their definitions.
An object of class evgam
Chandler, R. E., & Bate, S. (2007). Inference for clustered data using the independence loglikelihood. Biometrika, 94(1), 167-183.
Wood, S. N., Pya, N., & Safken, B. (2016). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 111(516), 1548-1563.
Youngman, B. D. (2022). evgam: An R Package for Generalized Additive Extreme Value Models. Journal of Statistical Software. doi:10.18637/jss.v103.i03
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") data(COprcp) ## fit generalised Pareto distribution to excesses on 20mm COprcp <- cbind(COprcp, COprcp_meta[COprcp$meta_row,]) threshold <- 20 COprcp$excess <- COprcp$prcp - threshold COprcp_gpd <- subset(COprcp, excess > 0) fmla_gpd <- list(excess ~ s(lon, lat, k=12) + s(elev, k=5, bs="cr"), ~ 1) m_gpd <- evgam(fmla_gpd, data=COprcp_gpd, family="gpd") ## fit generalised extreme value distribution to annual maxima COprcp$year <- format(COprcp$date, "%Y") COprcp_gev <- aggregate(prcp ~ year + meta_row, COprcp, max) COprcp_gev <- cbind(COprcp_gev, COprcp_meta[COprcp_gev$meta_row,]) fmla_gev2 <- list(prcp ~ s(lon, lat, k=30) + s(elev, bs="cr"), ~ s(lon, lat, k=20), ~ 1) m_gev2 <- evgam(fmla_gev2, data=COprcp_gev, family="gev") summary(m_gev2) plot(m_gev2) predict(m_gev2, newdata=COprcp_meta, type="response") ## fit point process model using r-largest order statistics # we have `ny=30' years' data and use top 45 order statistics pp_args <- list(id="id", ny=30, r=45) m_pp <- evgam(fmla_gev2, COprcp, family="pp", pp.args=pp_args) ## estimate 0.98 quantile using asymmetric Laplace distribution fmla_ald <- prcp ~ s(lon, lat, k=15) + s(elev, bs="cr") m_ald <- evgam(fmla_ald, COprcp, family="ald", ald.args=list(tau=.98))
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") data(COprcp) ## fit generalised Pareto distribution to excesses on 20mm COprcp <- cbind(COprcp, COprcp_meta[COprcp$meta_row,]) threshold <- 20 COprcp$excess <- COprcp$prcp - threshold COprcp_gpd <- subset(COprcp, excess > 0) fmla_gpd <- list(excess ~ s(lon, lat, k=12) + s(elev, k=5, bs="cr"), ~ 1) m_gpd <- evgam(fmla_gpd, data=COprcp_gpd, family="gpd") ## fit generalised extreme value distribution to annual maxima COprcp$year <- format(COprcp$date, "%Y") COprcp_gev <- aggregate(prcp ~ year + meta_row, COprcp, max) COprcp_gev <- cbind(COprcp_gev, COprcp_meta[COprcp_gev$meta_row,]) fmla_gev2 <- list(prcp ~ s(lon, lat, k=30) + s(elev, bs="cr"), ~ s(lon, lat, k=20), ~ 1) m_gev2 <- evgam(fmla_gev2, data=COprcp_gev, family="gev") summary(m_gev2) plot(m_gev2) predict(m_gev2, newdata=COprcp_meta, type="response") ## fit point process model using r-largest order statistics # we have `ny=30' years' data and use top 45 order statistics pp_args <- list(id="id", ny=30, r=45) m_pp <- evgam(fmla_gev2, COprcp, family="pp", pp.args=pp_args) ## estimate 0.98 quantile using asymmetric Laplace distribution fmla_ald <- prcp ~ s(lon, lat, k=15) + s(elev, bs="cr") m_ald <- evgam(fmla_ald, COprcp, family="ald", ald.args=list(tau=.98))
Estimate extremal index using ‘intervals’ method
extremal(x, y = NULL)
extremal(x, y = NULL)
x |
a logical vector or list of logical vectors |
y |
an integer vector the same length as |
Intervals estimator of extremal index based on Ferro and Segers (2003)'s moment-based estimator.
If x
is supplied and y
is not, x
is assumed to identify consecutive threshold exceedances.
If x
is supplied as a list, each list element is assumed to comprise identifiers of consecutive exceedances.
If y
is supplied, x
must be a logical vector, and y
gives positions of x
in
its original with-missing-values vector: so y
identifies consecutive x
.
A scalar estimate of the extremal index
Ferro, C. A., & Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), 545-556.
n <- 1e2 x <- runif(n) extremal(x > .9) y <- sort(sample(n, n - 5)) x2 <- x[y] extremal(x2 > .9, y)
n <- 1e2 x <- runif(n) extremal(x > .9) y <- sort(sample(n, n - 5)) x2 <- x[y] extremal(x2 > .9, y)
Various distributions can be fitted with function evgam
using
family = "..."
, where options for ...
are given below. The default is
family = "gev"
.
The following families are currently available using evgam(..., family = "...")
.
"ald"
, the asymmetric Laplace distribution. This is primarily
intended for quantile regression, as in Yu & Moyeed (2001);
"gev"
(default), the generalised extreme value (GEV) distribution;
"exp"
, the exponential distribution;
"gpd"
, the generalised Pareto distribution;
"gauss"
, the Gaussian distribution.
"pp"
, the point process model for extremes. This is implemented
through -largest order statistics. See Details below.
"weibull"
, the Weibull distribution;
"exi"
, estimation if the extremal index. See Schlather & Tawn (2003) and Details below.
"egpd"
, the extended generalised Pareto distribution. See Naveau
et al. (2016) and Details below;
"bgev"
, the blended GEV distribution. See Castro-Camilo et al (2022) and Details
"condex"
, the conditional extreme value model. See Details
"custom"
, custom distributions. See custom.evgam
for an example of use.
Arguments for the asymmetric Laplace distribution are given by ald.args
. A
scalar tau
defines the quantile sought, which has no default. The scalar
C
specifies the curvature parameter of Oh et al. (2011).
Arguments for extremal index estimation are given by exi.args
. A character
string id
specifies the variable in data
over which an nexi
(default 2) running max. has been taken. The link
is specified as a character string,
which is one of "logistic"
, "probit"
, "cloglog"
; defaults to "logistic"
.
Arguments for the point process model are given by pp.args
. An integer r
specifies the number of order statistics from which the model will be estimated.
If r = -1
, all data
will be used. The character string id
specifies the variable
in data
over which the point process isn't integrated; e.g. if a map
of parameter estimates related to extremes over time is sought, integration isn't
over locations. The scalar nper
number of data per period of interest; scalar or
integer vector ny
specifies the number of periods; if length(ny) > 1
then names(ny)
must ne supplied and must match to every unique id
.
logical correctny
specifies whether ny
is
corrected to adjust proportionally for data missingness.
Arguments for the point process model are given by bgev.args
. Probabilities
pa
and pb
specify the lower and upper probabilities at the which
the Gumbel distribution blends into a GEV distribution. Then alpha
and
beta
specify the quantile and its range, respectively, used to parameterise
the GEV distribution. Defaults are pa = 0.05
and pb = 0.2
and
alpha = beta = 0.5
, as used in Castro-Camilo et al (2022).
Arguments for extended Pareto distribution are given by egpd.args
. An
integer, model
, specifies which model from Naveau et at. (2016) to fit.
The first two parameters of each are the GPD's log scale and shape parameters,
. Then, in the notation of Naveau et at. (2016) the
remaining parameters are
,
,
and
for models i, ii, iii
and iv, respectively, which are specified with
model = 1, 2, 3
or
4
, respectively.
See evgam
for examples.
Castro-Camilo, D., Huser, R., & Rue, H. (2022). Practical strategies for generalized extreme value-based regression models for extremes. Environmetrics, 33(6), e2742. doi:10.1002/env.2742
Naveau, P., Huser, R., Ribereau, P., and Hannart, A. (2016), Modeling jointly low, moderate, and heavy rainfall intensities without a threshold selection, Water Resources Research, 52, 2753-2769. doi:10.1002/2015WR018552
Oh, H. S., Lee, T. C., & Nychka, D. W. (2011). Fast nonparametric quantile regression with arbitrary smoothing methods. Journal of Computational and Graphical Statistics, 20(2), 510-526. doi:10.1198/jcgs.2010.10063
Schlather, M., & Tawn, J. A. (2003). A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika, 90(1), 139-156. doi:10.1093/biomet/90.1.139
Youngman, B. D. (2022). evgam: An R Package for Generalized Additive Extreme Value Models. Journal of Statistical Software. doi:10.18637/jss.v103.i03
Yu, K., & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4), 437-447.doi:10.1016/S0167-7152(01)00124-9
Daily maximum temperatures at Fort Collins, Colorado, US from 1st January 1970 to 31st December 2019
data(FCtmax)
data(FCtmax)
A data frame with 18156 rows and 2 variables
The variables are as follows:
date of observation
daily maximum temperature in degrees Celcius
library(evgam) data(FCtmax)
library(evgam) data(FCtmax)
Extract Model Fitted Values
## S3 method for class 'evgam' fitted(object, ...)
## S3 method for class 'evgam' fitted(object, ...)
object |
a fitted |
... |
not used |
Fitted values extracted from the object ‘object’.
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") fitted(m_gev)
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") fitted(m_gev)
The 'fremantle' data frame has 86 rows and 3 columns. The second column gives 86 annual maximimum sea levels recorded at Fremantle, Western Australia, within the period 1897 to 1989. The first column gives the corresponding years. The third column gives annual mean values of the Southern Oscillation Index (SOI), which is a proxy for meteorological volitility.
data(fremantle)
data(fremantle)
A data frame with 86 rows and 3 variables
The variables are as follows:
a numeric vector of years
a numeric vector of annual sea level maxima
A numeric vector of annual mean values of the Southern Oscillation Index
Coles, S. G. (2001) _An Introduction to Statistical Modelling of Extreme Values. London: Springer.
Eric Gilleland's ismev R package.
library(evgam) data(fremantle)
library(evgam) data(fremantle)
evgam
objectLog-likelihood, AIC and BIC from a fitted evgam
object
## S3 method for class 'evgam' logLik(object, ...)
## S3 method for class 'evgam' logLik(object, ...)
object |
a fitted |
... |
not used |
A scalar
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") logLik(m_gev) AIC(m_gev) BIC(m_gev)
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") logLik(m_gev) AIC(m_gev) BIC(m_gev)
Moore-Penrose pseudo-inverse of a matrix
#' Solve a sparse system of linear equations
#'
#' @param a a sparse square matrix
#' @param b a dense vector or matrix
#' @param lower.tri has just the lower triangle of a square sparse matrix been provided? Defaults to FALSE
#'
#' @details
#'
#' This function is merely a wrapper for Armadillo's spsolve function with its default settings.
#'
#' @return A dense vector or matrix of the same form as b
#'
#' @references
#'
#' http://arma.sourceforge.net/docs.html#spsolve
#'
#' @export
#'
spsolve <- function(a, b, lower.tri = FALSE)
if (!lower.tri)
armaspsolve(a, as.matrix(b))
else
armaspLsolve(a, as.matrix(b))
pinv(x, tol = -1) ginv.evgam(x, tol = sqrt(.Machine$double.eps))
pinv(x, tol = -1) ginv.evgam(x, tol = sqrt(.Machine$double.eps))
x |
a matrix |
tol |
a scalar |
This function is merely a wrapper for Armadillo's pinv function with its
default settings, which, in particular uses the divide-and-conquer
method. If tol
isn't provided Armadillo's default for pinv is used.
ginv.evgam
mimics ginv using Armadillo's pinv.
A matrix
http://arma.sourceforge.net/docs.html#pinv
evgam
objectPlot a fitted evgam
object
## S3 method for class 'evgam' plot(x, onepage = TRUE, which = NULL, main, ask = !onepage, ...)
## S3 method for class 'evgam' plot(x, onepage = TRUE, which = NULL, main, ask = !onepage, ...)
x |
a fitted |
onepage |
logical: should all plots be on one page, or on separate pages? Defaults to |
which |
a vector of integers identifying which smooths to plot. The default |
main |
a character string or vector of plot titles for each plot. If not supplied default titles are used |
ask |
logical: ask to show next plots if too many figures for current device? |
... |
extra arguments to pass to plot.gam |
Plots representing all one- or two-dimensional smooths
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") plot(m_gev)
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") plot(m_gev)
evgam
objectPredictions from a fitted evgam
object
## S3 method for class 'evgam' predict( object, newdata, type = "link", prob = NULL, se.fit = FALSE, marginal = TRUE, exi = FALSE, as.gev = FALSE, trace = 0, ... )
## S3 method for class 'evgam' predict( object, newdata, type = "link", prob = NULL, se.fit = FALSE, marginal = TRUE, exi = FALSE, as.gev = FALSE, trace = 0, ... )
object |
a fitted |
newdata |
a data frame |
type |
a character string giving the type of prediction sought; see Details. Defaults to |
prob |
a scalar or vector of probabilities for quantiles to be estimated if |
se.fit |
a logical: should estimated standard errors be returned? Defaults to |
marginal |
a logical: should uncertainty estimates integrate out smoothing parameter uncertainty? Defaults to |
exi |
a logical: if a dependent GEV is fitted should the independent parameters be returned? Defaults to |
as.gev |
a logical: should blended GEV parameters be converted to GEV parameters? Defaults to |
trace |
an integer where higher values give more output. -1 suppresses everything. Defaults to 0 |
... |
unused |
There are five options for type
: 1) "link"
distribution parameters
transformed to their model fitting scale; 2) "response"
as 1), but on their
original scale; 3) "lpmatrix" a list of design matrices; 4) "quantile"
estimates of distribution quantile(s); and 5) "qqplot" a quantile-quantile
plot.
A data frame or list of predictions, or a plot if type == "qqplot"
Youngman, B. D. (2022). evgam: An R Package for Generalized Additive Extreme Value Modules. Journal of Statistical Software. To appear. doi:10.18637/jss.v103.i03
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") # prediction of link GEV parameter for fremantle data predict(m_gev) # predictions for Year 1989 y1989 <- data.frame(Year = 1989) # link GEV parameter predictions predict(m_gev, y1989) # GEV parameter predictions predict(m_gev, y1989, type= "response") # 10-year return level predictions predict(m_gev, y1989, type= "quantile", prob = .9) # 10- and 100-year return level predictions predict(m_gev, y1989, type= "quantile", prob = c(.9, .99))
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") # prediction of link GEV parameter for fremantle data predict(m_gev) # predictions for Year 1989 y1989 <- data.frame(Year = 1989) # link GEV parameter predictions predict(m_gev, y1989) # GEV parameter predictions predict(m_gev, y1989, type= "response") # 10-year return level predictions predict(m_gev, y1989, type= "quantile", prob = .9) # 10- and 100-year return level predictions predict(m_gev, y1989, type= "quantile", prob = c(.9, .99))
evgam
objectPrint a fitted evgam
object
## S3 method for class 'evgam' print(x, ...)
## S3 method for class 'evgam' print(x, ...)
x |
a fitted |
... |
not used |
The call of the evgam
object
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") print(m_gev)
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") print(m_gev)
Quantile estimation of a composite extreme value distribution
qev( p, loc, scale, shape, m = 1, alpha = 1, theta = 1, family, tau = 0, bgev.args = list(pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5), start = NULL, method = "uniroot" )
qev( p, loc, scale, shape, m = 1, alpha = 1, theta = 1, family, tau = 0, bgev.args = list(pa = 0.05, pb = 0.2, alpha = 0.5, beta = 0.5), start = NULL, method = "uniroot" )
p |
a scalar giving the quantile of the distribution sought |
loc |
a scalar, vector or matrix giving the location parameter |
scale |
as above, but scale parameter |
shape |
as above, but shape parameter |
m |
a scalar giving the number of values per return period unit, e.g. 365 for daily data giving annual return levels |
alpha |
a scalar, vector or matrix of weights if within-block variables not identically distributed and of different frequencies |
theta |
a scalar, vector or matrix of extremal index values |
family |
a character string giving the family for which return levels sought |
tau |
a scalar, vector or matrix of values giving the threshold quantile for the GPD (i.e. 1 - probability of exceedance) |
bgev.args |
a list specifying parameters of the blended GEV distribution; see Details |
start |
a 2-vector giving starting values that bound the return level |
method |
a character string giving the numerical estimation procedure; defaults to |
If is the generalised extreme value, generalised Pareto or blended
GEV distribution,
qev
solves
for . So vectors are supplied as $n$-vectors and matrices
are supplied as
matrices.
For all distributions, location, scale and shape parameters are given by
loc
, scale
and shape
. The generalised Pareto
distribution, for and
, is parameterised as
,
where
,
and
are its location, scale and shape
parameters, respectively, and
corresponds to argument
tau
.
For the blended GEV distribution pa
, pb
, alpha
and
beta
specify additional parameters of the blended GEV distribution; see
family.evgam
for details.
Estimates either use function uniroot
or method = "newton"
uses
the Newton-Rhaphson method. The latter is often much quicker if matrices
are supplied, i.e. for .
A scalar or vector of estimates of p
qev(0.9, c(1, 2), c(1, 1.1), .1, family = "gev") qev(0.99, c(1, 2), c(1, 1.1), .1, family = "gpd", tau = 0.9) # an example representative on monthly estimates at two locations qev(0.9, matrix(c(1:12, 2:13), 12, 2), 1.1, .1, family = "gev") # a blended GEV example with default blended GEV specification qev(0.9, matrix(c(1:12, 2:13), 12, 2), 1.1, .1, family = "bgev")
qev(0.9, c(1, 2), c(1, 1.1), .1, family = "gev") qev(0.99, c(1, 2), c(1, 1.1), .1, family = "gpd", tau = 0.9) # an example representative on monthly estimates at two locations qev(0.9, matrix(c(1:12, 2:13), 12, 2), 1.1, .1, family = "gev") # a blended GEV example with default blended GEV specification qev(0.9, matrix(c(1:12, 2:13), 12, 2), 1.1, .1, family = "bgev")
Running -value maximum and data frame with variable swapped for running maximum
runmax(y, n) dfrunmax(data, cons, ynm, n = 2)
runmax(y, n) dfrunmax(data, cons, ynm, n = 2)
y |
a vector |
n |
an integer giving the number of observations to calculate running maxmimum over; defaults to 2 |
data |
a data frame |
cons |
a character string for the variable in |
ynm |
a character string for the variable in |
runmax
returns a vector of the same dimension as y
dfrunmax
returns a data frame with observations swapped for -observation running maximum
runmax(runif(10), 5)
runmax(runif(10), 5)
Generate a sequence of values between a range.
seq_between(x, length = NULL)
seq_between(x, length = NULL)
x |
a 2-vector |
length |
an integer |
A vector
seq_between(c(1, 9)) seq_between(range(runif(10)), 5)
seq_between(c(1, 9)) seq_between(range(runif(10)), 5)
evgam
objectSimulations from a fitted evgam
object
## S3 method for class 'evgam' simulate( object, nsim = 1000, seed = NULL, newdata, type = "link", probs = NULL, threshold = 0, marginal = TRUE, ... )
## S3 method for class 'evgam' simulate( object, nsim = 1000, seed = NULL, newdata, type = "link", probs = NULL, threshold = 0, marginal = TRUE, ... )
object |
a fitted |
nsim |
an integer giving the number of simulations |
seed |
an integer giving the seed for simulations |
newdata |
a data frame |
type |
a character string, as in |
probs |
a scalar or vector of probabilities for quantiles; defaults to NULL |
threshold |
a scalar, vector or matrix, which is added to each simulation if |
marginal |
a logical: should simulations integrate out smoothing parameter uncertainty? Defaults to TRUE |
... |
arguments to be passed to |
Simulations of parameters or quantiles
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") # simulations of link GEV parameters for fremantle data simulate(m_gev, nsim=5) # simulations for Year 1989 y1989 <- data.frame(Year = 1989) # link GEV parameter simulations simulate(m_gev, nsim=5, newdata = y1989) # GEV parameter simulations simulate(m_gev, nsim=5, newdata = y1989, type = "response") # 10-year return level simulations simulate(m_gev, nsim=5, newdata = y1989, type= "quantile", prob = .9) # 10- and 100-year return level simulations simulate(m_gev, nsim=5, newdata = y1989, type= "quantile", prob = c(.9, .99))
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") # simulations of link GEV parameters for fremantle data simulate(m_gev, nsim=5) # simulations for Year 1989 y1989 <- data.frame(Year = 1989) # link GEV parameter simulations simulate(m_gev, nsim=5, newdata = y1989) # GEV parameter simulations simulate(m_gev, nsim=5, newdata = y1989, type = "response") # 10-year return level simulations simulate(m_gev, nsim=5, newdata = y1989, type= "quantile", prob = .9) # 10- and 100-year return level simulations simulate(m_gev, nsim=5, newdata = y1989, type= "quantile", prob = c(.9, .99))
evgam
objectSummary method for a fitted evgam
object
## S3 method for class 'evgam' summary(object, ...) ## S3 method for class 'summary.evgam' print(x, ...)
## S3 method for class 'evgam' summary(object, ...) ## S3 method for class 'summary.evgam' print(x, ...)
object |
a fitted |
... |
not used |
x |
a |
The key part of summary.evgam is p-values for smooths.
The tests use code directly taken from mgcv 1.8-14
. This is
to avoid use of mgcv:::...
. Tests implement the method of
Wood (2013).
A summary.evgam
object
Wood, S. N., (2013) On p-values for smooth components of an extended generalized additive model, Biometrika 100(1) 221–228
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") summary(m_gev)
data(fremantle) fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1) m_gev <- evgam(fmla_gev, fremantle, family = "gev") summary(m_gev)