lfmix

“lfmix: Mixed frequency analysis of low flows using mixed distribution and mixed copula estimators”

library(mixdist)
#> Loading required package: lfstat
#> Loading required package: xts
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: lmom
#> Loading required package: lattice
#> Loading required package: FAdist
#library(lfstat)
#library(FAdist)

Introduction to package lfmix (i.e mixdist)

This package implements mixed distribution estimators for low-flow frequency analysis. These estimators are generalizations of the common probability estimator for annual minima series (AMS) for the case that low flows are generated by different processes, violating the homogeneity assumption of frequency analysis. In such cases, the improved estimators provide a consistent framework for estimating return periods in summer and winter, as well as combined probabilities of annual events.

Theoretical background

In the approach, the annual minima series $\begin{equation} M_n = min \{X_1, ..., X_n\} \end{equation}$ is regarded as the minima of the annual summer minima $M_{n,S}$ and winter minima $M_{n,W}$ : $\begin{equation} \label{equation_1} M_n = min \left\{M_{n,S}, M_{n,W}\right\}. \end{equation}$ In this notation, $X$ refers to the original variable (daily flow), and $M_n$ to the annual minima for $n$ years.

The first method cdfwei_Mixed implements the mixed distribution estimator for annual minima series of , assuming independent and seasonally separable types of low-flow events.

Let $F_S(z)$ and $F_W(z)$ denote the marginal extreme value distributions of the summer and winter minima series. Under independence, the cumulative distribution function (cdf) of the annual low flow is $\begin{equation} \label{equation_3} F_{mix}(z)=1-\left\{1-F_S(z)\right\}\left\{1-F_W(z)\right\}. \end{equation}$

In our implementation for minima series, a Weibull distribution is used to model the marginal distributions of summer and winter events.

The method pobs_mixed_FUN implements an analogous estimator for the empirical probabilities $\begin{equation} \label{equation_2} p_{mix} = 1 - (1-p_S)(1-p_W), \end{equation}$ where $p_S$ and $p_W$ are the respective probabilities of occurrence in the summer and winter seasons. These are estimated by a common empirical probability estimator.

The second method, cdfwei_MixedC, implements the mixed copula estimator of . It is a generalization of the mixed distribution estimator that allows for seasonal dependence.

Based on the marginal summer and winter distributions $F_S$ and $F_W$ , the probability of event $z$ occurring in any season is obtained by $\begin{equation} \label{equation_6} F_{mix,C}(z)=F_S(z)+F_W(z)-C[F_S(z),F_W(z)]. \end{equation}$

As with the mixed probability estimator, a Weibull distribution is used to model the marginal distributions of summer and winter events. The method implements a Gumbel–Hougaard copula by default as the copula model $C$ . Alternatively, an Independence copula can be used, which reduces the method to the mixed distribution estimator assuming independent events.

In the same way as for the theoretical estimator, the method pobs_mixed_FUN implements an analogous estimator for the empirical probabilities $\begin{equation} \label{equation_10} p_{mix,C}(z) =p_{S}(z)+p_{W}(z)-C_n\left[p_{S}(z),p_{W}(z)\right], \end{equation}$ where $p_{S}(z)$ and $p_{W}(z)$ denote the empirical probability of $z$ occurring in the summer and winter seasons, respectively. The $C_n(\cdot)$ is the empirical copula, which defines the empirical multivariate distribution.

Method `weiMixT`

The method weiMixT is a comprehensive function that implements all estimators, allowing for an in-depth analysis of the event probabilities within the sample. It also provides several flow characteristics related to the mixed distribution approaches, including seasonality indices, seasonal correlation, and the mixture rate of summer and winter events in the annual sample. Additionally, it offers an assessment of how the improved estimates of return periods (and probabilities) differ from the common annual estimator.

The general call of the method is as follows:

output <- weiMixT(x1=xa, sortx=FALSE, T=c(100, 50, 20))
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)

The output is a list with following main elements:

$sample

A data.frame containing event characteristics of annual low flow events with folowing colums: hyear is the hydrological year, x is the annual minimum flow. The p. (and T.) are event probabilities (and return periods) from various probability estimators:
Empirical probabilities (and return periods)
pobs are empirical probabilities using the classical Weibull estimator
pobs.mix empirical return periods from the mixed distribution approach
Theoretical probabilities (and return periods)
pA (TA)… classical anual minima series approach
pMix (TMix) … mixed distribution estimator from Laaha (2023a)
pMixC (TMixC) … mixed copula estimator from Laaha (2023b)

head(output$sample)
#>   hyear         x       pobs   pobs.mix        pA       TA        pMix
#> 1  1976 0.2208571 0.22222222 0.22222222 0.2294649 4.357965 0.224758013
#> 2  1977 0.2180000 0.19444444 0.19753086 0.2101373 4.758794 0.205059242
#> 3  1978 0.3075714 0.75000000 0.80864198 0.7369623 1.356922 0.788882343
#> 4  1979 0.1588571 0.02777778 0.02777778 0.0000000      Inf 0.002640444
#> 5  1980 0.3672857 0.86111111 0.94135802 0.9053666 1.104525 0.951602656
#> 6  1981 0.4204286 0.97222222 0.98996914 0.9673425 1.033760 0.991134583
#>         TMix       pMixC      TMixC
#> 1   4.449230 0.214684356   4.658001
#> 2   4.876640 0.196319195   5.093745
#> 3   1.267616 0.739387671   1.352470
#> 4 378.724227 0.002640444 378.724227
#> 5   1.050859 0.912000944   1.096490
#> 6   1.008945 0.970473554   1.030425

$To_events:

A data.frame containing the nominal and improved probabilities (p.) and return periods (T.) of of ´design´ events with specified return periods from various estimators. The design events are specified by argument T with default to 100, 50 and 20 year events. To (po) … nominal values corresponding to classical annual minimum estimator (AMS approach)
Tmix (pmix) … mixed distribution estimator from Laaha (2023a)
TmixC (pmixC) … mixed copula estimator from Laaha (2023b)
Ts (ps) … seasonal estimate from the marginal summer distribution)
Tw (pw) … seasonal estimate from the marginal winter distribution

output$To_events
#>    To   po     Tmix       pmix    TmixC      pmixC       Ts         ps       Tw
#> 1 100 0.01 37.77460 0.02647281 37.77460 0.02647281 37.77460 0.02647281      Inf
#> 2  50 0.02 30.75477 0.03251528 30.80299 0.03246438 31.38146 0.03186595 1490.983
#> 3  20 0.05 17.99798 0.05556180 18.23204 0.05484850 21.63549 0.04622036  102.102
#>             pw
#> 1 0.0000000000
#> 2 0.0006706984
#> 3 0.0097941256

Various other flow characteristics

Further elements are flow characteristics, including the circular seasonality index SI, the seasonality ratio SR, the Spearman rank correlation of summer and winter series Rho together with its significance value Rho_p-value, and the mixture_rate of summer and winter events in the sample are calculated.

Steps of analysis of `weiMixT`

The following section shows the subsequent analysis steps to produce the output. We first refer to the data before showing the calculation of flow characteristics. The second part shows the calculation of sample probabilities using the various estimators. The final part shows the calculation of return period (and probability) estimates of design events by various estimators.

1. Data

The calculation is based on a daily time series x1 that is stored as a object of class lfobj. For further details use help(lfobj) from package lfstat.

x1 <- xa
head(x1)
#>    day month year  flow hyear
#> 91  31     3 1976 0.622  1975
#> 92   1     4 1976 0.833  1976
#> 93   2     4 1976 0.958  1976
#> 94   3     4 1976 0.971  1976
#> 95   4     4 1976 0.989  1976
#> 96   5     4 1976 1.050  1976

2. Flow indices and sample probabilities

Circular seasonality index (SI)

SI <- seasindex(x1)
print(SI)
#> $theta
#> [1] 5.345775
#> 
#> $D
#> [1] 310.5444
#> 
#> $r
#> [1] 0.4851579

Seasonality ratio (SR)

SR <- seasratio(x1, breakdays=c("01/04", "01/11"))
SR <- as.numeric(SR)
print(SR)
#> [1] 1.093863

Separate the time series x1 into sum/win season

x1 <- xa
x1s <- x1[ x1$month >= 4 & x1$month <= 10 , ]
x1w <- x1[ x1$month < 4  | x1$month > 10 , ]

Prepare annual and seasonal series

AM0 <- MAM(x1,n=7, yearly = TRUE)
AM <- AM0$MAn
lmom <- samlmu(AM)
x.pelwei <- pelwei(lmom); x.pelwei
#>      zeta      beta     delta 
#> 0.1759369 0.1088168 1.5195907
x.p <- cdfwei(AM, para = x.pelwei)

AM0s <- MAM(x1s,n=7, yearly = TRUE)
#if (sortx==TRUE) {AM0s <- AM0s[order(AM0s$MAn),]}
AMs <- AM0s$MAn
lmoms <- samlmu(AMs)
x.pelwei.s <- pelwei(lmoms); x.pelwei.s
#>      zeta      beta     delta 
#> 0.1470125 0.1790483 2.1855520
x.ps <- cdfwei(AM, para = x.pelwei.s)

AM0w <- MAM(x1w,n=7, yearly = TRUE)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
#if (sortx==TRUE) {AM0w <- AM0w[order(AM0w$MAn),]}
AMw <- AM0w$MAn
lmomw <- samlmu(AMw)
x.pelwei.w <- pelwei(lmomw); x.pelwei.w
#>      zeta      beta     delta 
#> 0.1825804 0.1469084 1.6393752
x.pw <- cdfwei(AM, para = x.pelwei.w)

Correlation and mixture rate

AM0sw <- merge(AM0s, AM0w, by = "hyear", all = TRUE)
cor_spearman <- cor.test(AM0sw$MAn.x, AM0sw$MAn.y, method = "spearman")
print(cor_spearman)
#> 
#>  Spearman's rank correlation rho
#> 
#> data:  AM0sw$MAn.x and AM0sw$MAn.y
#> S = 5564, p-value = 0.2018
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>       rho 
#> 0.2207283
mixture_rate <- sum(AMs < AMw)/length(AM)
print(mixture_rate)
#> [1] 0.5714286

Empirical probabilites (Weibull plotting positions)

# Common Weibull prob estimators -> Same as pobs(AM) from package copula! 
.pobs <- (rank(AM))/(length(AM) + 1) # Weibull prob estimators -> Same as pobs(AM) from package copula!
print(.pobs)
#>  [1] 0.22222222 0.19444444 0.75000000 0.02777778 0.86111111 0.97222222
#>  [7] 0.41666667 0.05555556 0.08333333 0.11111111 0.52777778 0.50000000
#> [13] 0.72222222 0.27777778 0.63888889 0.36111111 0.61111111 0.83333333
#> [19] 0.33333333 0.16666667 0.38888889 0.25000000 0.88888889 0.44444444
#> [25] 0.69444444 0.47222222 0.94444444 0.13888889 0.80555556 0.66666667
#> [31] 0.58333333 0.77777778 0.55555556 0.30555556 0.91666667

# Empirical mixed prob 
.pobs.mix <- pobs_mixed_FUN(AM=AM, AMs=AMs, AMw=AMw, plot=FALSE) # $AM, $p.mix and $p.mix.c are returned
print(.pobs.mix)
#> $AM
#>  [1] 0.2208571 0.2180000 0.3075714 0.1588571 0.3672857 0.4204286 0.2460000
#>  [8] 0.1910000 0.1981429 0.2061429 0.2671429 0.2602857 0.2914286 0.2334286
#> [15] 0.2764286 0.2440000 0.2728571 0.3610000 0.2437143 0.2175714 0.2441429
#> [22] 0.2312857 0.3850000 0.2585714 0.2810000 0.2588571 0.4171429 0.2160000
#> [29] 0.3305714 0.2805714 0.2717143 0.3211429 0.2701429 0.2368571 0.3855714
#> 
#> $p.mix
#>  [1] 0.22222222 0.19753086 0.80864198 0.02777778 0.94135802 0.98996914
#>  [7] 0.40586420 0.05555556 0.12114198 0.13406636 0.50925926 0.48881173
#> [13] 0.75733025 0.31712963 0.62037037 0.37403549 0.58410494 0.91936728
#> [19] 0.36265432 0.17245370 0.38464506 0.29398148 0.96990741 0.44791667
#> [25] 0.69907407 0.46836420 0.97993827 0.14699074 0.87461420 0.68402778
#> [31] 0.54745370 0.83217593 0.52854938 0.33989198 0.97492284
#> 
#> $p.mix.C
#>  [1] 0.20753968 0.17976190 0.75357143 0.02777778 0.85674603 0.96388889
#>  [7] 0.40119048 0.05555556 0.09642857 0.11031746 0.54007937 0.51230159
#> [13] 0.72817460 0.29007937 0.64960317 0.35952381 0.62261905 0.83055556
#> [19] 0.34563492 0.15198413 0.37341270 0.26230159 0.88134921 0.45674603
#> [25] 0.70277778 0.48452381 0.93690476 0.12420635 0.80595238 0.67500000
#> [31] 0.59563492 0.78055556 0.56785714 0.31785714 0.90912698

Predicted mixed prob (Weibull cdf model)

# Mixed distribution approach
x.pred.mix <- cdfwei_Mixed(x=AM, y=AM, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), plot=FALSE)
print(x.pred.mix)
#>  [1] 0.224758013 0.205059242 0.788882343 0.002640444 0.951602656 0.991134583
#>  [7] 0.409105248 0.054201927 0.085925533 0.129396671 0.560617249 0.513075429
#> [13] 0.709684945 0.315543409 0.621511622 0.394232473 0.598611989 0.942245874
#> [19] 0.392104942 0.202145092 0.395296011 0.299733026 0.971383502 0.500902553
#> [25] 0.649796009 0.502938549 0.990049621 0.191558428 0.873560438 0.647195070
#> [31] 0.591142470 0.842686275 0.580763993 0.340993186 0.971883362

# Predicted mixed prob with copula (model)
x.pred.mixC <- cdfwei_MixedC(x=AM, y=AM, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), x1=x1, plot=FALSE)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
print(x.pred.mixC)
#>  [1] 0.214684356 0.196319195 0.739387671 0.002640444 0.912000944 0.970473554
#>  [7] 0.384996538 0.053535891 0.083986892 0.125277485 0.524406010 0.480588513
#> [13] 0.663400781 0.298827951 0.580793022 0.371320045 0.559543095 0.900526969
#> [19] 0.369363216 0.193598346 0.372298207 0.284222651 0.938203617 0.469386959
#> [25] 0.607137286 0.471260125 0.968326591 0.183704729 0.824468634 0.604709625
#> [31] 0.552624589 0.792768451 0.543021031 0.322305288 0.938915589

Build output object `$sample`

#obs <- data.frame(cbind(hyear=AM0$hyear, x=AM, pobs=.pobs, pobs.mix=.pobs.mix$p.mix, pobs.mix=.pobs.mixC$p.mix.c))
sample <- data.frame(hyear=AM0$hyear, x=AM, pobs=.pobs, pobs.mix=.pobs.mix$p.mix, pA=x.p, TA=1/x.p, pMix=x.pred.mix, TMix=1/x.pred.mix, "pMixC"=x.pred.mixC, "TMixC"=1/x.pred.mixC)

3. Return period (and probability) estimates of design events by various estimators

Here we show how the $To_events are calculated. As stated above, the aim is to produce a data.frame containing the nominal and improved probabilities (p.) and return periods (T.) of of ´design´ events with specified return periods from various estimators. The design events are specified by argument T with default to 100, 50 and 20 year events.

The data.frame will allow us to analyse the change in return period when using our improved estimators

Annual Series (no seasons, common probability distribution)

T=c(100,50, 20)
.f <- 1/T # Hier Probability 1/T
.y <- quawei(.f, para = x.pelwei) # Hier Quantilwerte von T => als Eingang in andere Prob-Berechnungen
.p <- cdfwei(.y, para = x.pelwei)
pred <- data.frame(yo=.y, po=.p, To=1/.p)
print(pred)
#>          yo   po  To
#> 1 0.1812090 0.01 100
#> 2 0.1842840 0.02  50
#> 3 0.1913476 0.05  20

Seasonal distributions

# -) summer
.ys <- quawei(.f, para = x.pelwei.s)
.ps <- cdfwei(.y, para = x.pelwei.s)
pred.s <- data.frame(y_of_To=.ys, p=.ps, T=1/.ps)
print(pred.s)
#>     y_of_To          p        T
#> 1 0.1688332 0.02647281 37.77460
#> 2 0.1770463 0.03186595 31.38146
#> 3 0.1930126 0.04622036 21.63549
# -) winter
.yw <- quawei(.f, para = x.pelwei.w)
.pw <- cdfwei(.y, para = x.pelwei.w)
pred.w <- data.frame(y_of_To=.yw, p=.pw, T=1/.pw)
print(pred.w)
#>     y_of_To            p        T
#> 1 0.1914602 0.0000000000      Inf
#> 2 0.1961750 0.0006706984 1490.983
#> 3 0.2065797 0.0097941256  102.102

Mixed distribution estimator

.pmix <- cdfwei_Mixed(x=.y, y=.y, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), plot=FALSE, pch="x")

# Ermittle Quantile der Mischverteilung mit p=.f (z.b. p=1/20=0.2)
# Dazu feine Diskretisierung
t.y1 <- par("usr")[3] ; t.y2 <- par("usr")[4] ### wie ohne plot berechnen???
t.y.qua <- seq(round(t.y1, 1), round(t.y2, 1), by=0.01)
t.pmix <- cdfwei_Mixed(x=t.y.qua, y=t.y.qua, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), plot=FALSE, pch="x")
# Nun Suche an der y-Stelle der Verteilung wo x=.pmix=.f ist
# ... d.h, größter Wert mit p <= .f
f.qmix <- vector()
for (i in 1:length(.f)){
  f.qmix[i] <- max(t.y.qua[t.pmix <= .f[i]]) ### better vectorize ...
}

pred.mix <- data.frame(y_of_To=f.qmix, p=.pmix, T=1/.pmix)
print(pred.mix)
#>   y_of_To          p        T
#> 1    0.16 0.02647281 37.77460
#> 2    0.17 0.03251528 30.75477
#> 3    0.18 0.05556180 17.99798

Mixed copula estimator

.pmixC <- cdfwei_MixedC(x=.y, y=.y, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), x1=x1, plot=FALSE)
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)

# Ermittle Quantile der Mischverteilung mit p=.f (z.b. p=1/20=0.2)
# Dazu feine Diskretisierung
t.y1 <- par("usr")[3] ; t.y2 <- par("usr")[4] ### wie ohne plot berechnen???
t.y.qua <- seq(round(t.y1, 1), round(t.y2, 1), by=0.01)
t.pmixC <- cdfwei_MixedC(x=t.y.qua, y=t.y.qua, para.x = pelwei(samlmu(AMs)), para.y = pelwei(samlmu(AMw)), x1=x1, plot=FALSE, pch="x")
#> Warning: Probably not enough observations to calculate annual minima for the hydrological years:
#> 2010 (58 obs)
# Nun Suche an der y-Stelle der Verteilung wo x=.pmix=.f ist
# ... d.h, gr??er Wert mit p <= .f
f.qmixC <- vector()
for (i in 1:length(.f)){
  f.qmixC[i] <- max(t.y.qua[t.pmixC <= .f[i]]) ### better vectorize ...
}

pred.mixC <- data.frame(y_of_To=f.qmixC, p=.pmixC, T=1/.pmixC)
print(pred.mixC)
#>   y_of_To          p        T
#> 1    0.16 0.02647281 37.77460
#> 2    0.17 0.03246438 30.80299
#> 3    0.19 0.05484850 18.23204

Finally build output object `$To_events`

To_events <- data.frame("To"=1/.p, "po"=.p, "Tmix"=1/.pmix , "pmix"=.pmix , "TmixC"=1/.pmixC , "pmixC"=.pmixC , "Ts"=1/.ps, "ps"=.ps, "Tw"=1/.pw, "pw"=.pw)
print(To_events)
#>    To   po     Tmix       pmix    TmixC      pmixC       Ts         ps       Tw
#> 1 100 0.01 37.77460 0.02647281 37.77460 0.02647281 37.77460 0.02647281      Inf
#> 2  50 0.02 30.75477 0.03251528 30.80299 0.03246438 31.38146 0.03186595 1490.983
#> 3  20 0.05 17.99798 0.05556180 18.23204 0.05484850 21.63549 0.04622036  102.102
#>             pw
#> 1 0.0000000000
#> 2 0.0006706984
#> 3 0.0097941256