License: arXiv.org perpetual non-exclusive license
arXiv:2604.02308v1 [math.NA] 02 Apr 2026

[1,3]\fnmThomas \surIzgin

[1]\orgdivDepartment of Mathematics, \orgnameUniversity of Kassel, \orgaddress\streetHeinrich-Plett-Str. 40, \postcode34132, \cityKassel, \countryGermany 2]\orgdivInstitute of Mathematics, \orgnameJohannes Gutenberg University Mainz, \orgaddress\streetStaudingerweg 9, \postcode55128, \cityMainz, \countryGermany 3]\orgdivDivision of Applied Mathematics, \orgaddress\orgnameBrown University, \cityProvidence, \stateRhode Island \postcode02906, \countryUSA

A Positivity-Preserving Relaxation Algorithm

izgin@mathematik.uni-kassel.de    \fnmHendrik \surRanocha hendrik.ranocha@uni-mainz.de    \fnmChi-Wang \surShu chi-wang_shu@brown.edu * [ [
Abstract

We combine Patankar-type methods with suitable relaxation procedures that are capable of ensuring correct dissipation or conservation of functionals such as entropy or energy while producing unconditionally positive and conservative approximations. To that end, we adapt the relaxation algorithm to enforce positivity by using either ideas from the dense output framework when a linear invariant must be preserved, or simply a geometric mean if the only constraint is positivity preservation. The latter merely requires the solution of a scalar nonlinear equation while former results in a coupled linear-nonlinear system of equations. We present sufficient conditions for the solvability of the respective equations. Several applications in the context of ordinary and partial differential equations are presented, and the theoretical findings are validated numerically.

keywords:
Positivity preservation, Relaxation methods, Entropy stability
pacs:
[

MSC Classification]65M06, 65M08, 65M20,65M22

1 Introduction

We consider initial-value problems (IVPs)

𝐮(t)=𝐟(𝐮(t)),𝐮(t0)\displaystyle\mathbf{u}^{\prime}(t)=\mathbf{f}(\mathbf{u}(t)),\quad\mathbf{u}(t_{0}) =𝐮0d,\displaystyle=\mathbf{u}^{0}\in\mathbb{R}^{d}, (1)

either as classical ordinary differential equation (ODE) model on its own, or more typically obtained after discretizing a partial differential equation (PDE) in space. We are interested in two types of structures of the IVP (1). First, many applications require positive solutions, i.e., 𝐮(t)>𝟎\mathbf{u}(t)>\mathbf{0} for all tt0t\geq t_{0} if 𝐮0>𝟎\mathbf{u}^{0}>\mathbf{0}, where inequalities are understood component-wise. This occurs, for example, when modeling chemical reactions, population dynamics, or the density of fluids. Second, many problems are equipped with additional functionals of interest, such as Lyapunov functionals, energy, or entropy. We say that the IVP (1) is dissipative with respect to a smooth functional η\eta, if η(𝐮)𝐟(𝐮)𝟎\eta^{\prime}(\bf u)\bf f(\bf u)\leq 0, i.e.,

ddtη(𝐮(t))0\frac{\mathrm{d}}{\mathrm{d}t}\eta(\mathbf{u}(t))\leq 0

for all solutions 𝐮(t)\mathbf{u}(t) of (1). Similarly, (1) is conservative with respect to η\eta, if η(𝐮)𝐟(𝐮)=𝟎\eta^{\prime}(\bf u)\bf f(\bf u)=0, i.e.,

ddtη(𝐮(t))=0.\frac{\mathrm{d}}{\mathrm{d}t}\eta(\mathbf{u}(t))=0.

In addition to the (typically nonlinear) functional η\eta, many problems also conserve additional linear invariants, such as mass or momentum, which we also want to preserve on the discrete level.

When discretizing (1) in time using a one-step method, we would like to preserve these properties, i.e., we would like to have an unconditionally positive method satisfying

𝐮0>𝟎𝐮n>𝟎for all n0.\mathbf{u}^{0}>\mathbf{0}\implies\mathbf{u}^{n}>\mathbf{0}\quad\text{for all }n\geq 0.

For many positive ODEs/PDEs, avoiding negative approximations is critical; such artifacts can lead to qualitatively incorrect solutions or the total failure of the numerical method [BBKS2007, sandu2001positive, STKB2005, SSPMPRK2]. Moreover, for dissipative problems, we would like to use a dissipative method that satisfies

η(𝐮n)η(𝐮n1)η(𝐮0).\eta(\mathbf{u}^{n})\leq\eta(\mathbf{u}^{n-1})\leq\dotsc\leq\eta(\mathbf{u}^{0}). (2)

Similarly, a conservative method applied to a conservative problem should satisfy

η(𝐮n)=η(𝐮n1)==η(𝐮0).\eta(\mathbf{u}^{n})=\eta(\mathbf{u}^{n-1})=\dotsc=\eta(\mathbf{u}^{0}). (3)

For convex η\eta, the implicit Euler method is a well-known example of an unconditionally positive and dissipative method. However, this analysis neglects possible positivity issues that can arise while solving the implicit equations as well as remaining errors of the nonlinear iterative solver.

Concerning positivity, the implicit Euler method is essentially the best method one can use in the class of general linear methods, since any unconditionally positive method can be at most first-order accurate [bolley1978conservation]. To address this challenge, several strategies have been proposed:

  1. 1.

    Clipping techniques, which forcibly set negative values to zero, either result in a mass-shifting optimization problem or otherwise compromise conservation of linear invariants, and, to date, lack a proof of stability [BIM2022].

  2. 2.

    Projection techniques [sandu2001positive, nusslein2021positivity] can be positive and conserve linear invariants, but they may result in step size constraints and/or reduced accuracy.

  3. 3.

    Fully implicit, nonlinear methods [HR2020, ricchiuto2011habilitation] can enforce positivity but require costly iterative solvers, which may fail to converge (to a positive solution), and thus, still produce nonphysical results.

  4. 4.

    Diagonally split Runge–Kutta (DSRK) methods [horvath_positivity_1998] can be unconditionally positive and with order higher than one. However, they are typically less accurate than the implicit Euler method in practice [macdonald2007].

  5. 5.

    Adaptive methods [STKB2005] use root-finding procedures and adapt the time step size. This can be effective, but the resulting schemes are only conditionally positive.

  6. 6.

    Strong stability preserving (SSP) methods [GKS2011] are positive if the explicit Euler method is positive under a certain time step restriction. However, only the implicit Euler method leads to unconditional positivity, and thus, all other SSP methods are only conditionally positive.

  7. 7.

    Patankar-type methods represent a family of explicit or linearly implicit yet nonlinear schemes, which are unconditionally positive and can preserve certain linear invariants [Patankar1980, BDM2003, MCD2020, KM18, AKM2020].

In this work, we focus on Patankar-type schemes. The main idea behind them is to modify an existing time-stepping method by introducing nonlinear weights in such a way that the resulting numerical scheme becomes unconditionally positive. The primary challenge lies in designing these weights so that the modified scheme preserves the accuracy of the original (baseline) method. This nonlinear modification is achieved using the so-called Patankar-trick [Patankar1980], which gives this family of methods its name. A notable example is the incorporation of modified Patankar (MP) weights into classical Runge–Kutta (RK) schemes, leading to the development of modified Patankar–Runge–Kutta (MPRK) methods [BDM2003, KM18, KM18Order3], which in addition to being unconditionally positive, are also conservative. Motivated by their strong numerical performance, the Patankar-trick has since been successfully extended to a variety of time integration frameworks, including SSP Runge–Kutta (SSPRK) methods [SSPMPRK2, SSPMPRK3], arbitrary high-order Deferred Correction (DeC) schemes [MPDeC], generalized BBKS methods [AKM2020], GeCo schemes [MCD2020], and linear multistep methods [IMPV2025]. The resulting modified schemes all belong to the broader Patankar-type family, which can themselves be recast as non-standard additive Runge–Kutta (NSARK) methods, see [NSARK, IzginThesis].

Concerning the preservation (conservation/dissipation) of functionals η\eta, several results are available for linear schemes such as RK methods applied to linear problems [tadmor2002semidiscrete, ranocha2018L2stability, sun2017stability, sun2019strong, achleitner2024necessary, tadmor2025stability, sun2022energy] and fully-implicit methods [lefloch2002fully, friedrich2019entropy, burrage1979stability, burrage1980nonlinear, dahlby2011preserving]. There are also positive results on dissipative schemes if the problem is sufficiently dissipative [higueras2005monotonicity, jungel2015entropy, jungel2017entropy]. In the general case including conservative problems, however, results are restrictive and include many negative results [ranocha2021strong, ranocha2020energy]. Similarly to positivity preservation, postprocessing/projection methods can be used to enforce the desired conservation/dissipation properties of time integration methods [Shampine1986, grimm2005geometric, calvo2006preservation, calvo2010projection, laburta2015numerical]. In this work, we focus on the relaxation approach [ketcheson2019relaxation, ranocha2020relaxation, ranocha2020general], which can be used to enforce conservation/dissipation of functionals while preserving all linear invariants. The basic idea of relaxation methods goes back to [sanzserna1982explicit] and [dekker1984stability, pp. 265–266].

Thus, there are several studies and methods devoted to either positivity preservation or the preservation of functionals such as entropy, but to the best of our knowledge, there is no high-order method that can guarantee both properties simultaneously. The main contribution of this work is to design a modified relaxation algorithm capable of simultaneously preserving positivity and conservation/dissipation of functionals. To that end, we first equip unconditionally positivity-preserving NSARK methods with suitable estimates for dissipative entropies by applying the relaxation framework from [ranocha2020general]. While relaxation can be rendered positivity-preserving for dissipative problems with minor adjustments (see Remark 3), entropy-conservative problems require more sophisticated treatment. Leveraging dense output formulae for MPRK methods [izgin2024], we propose a modified relaxation step that ensures unconditional positivity. Furthermore, we introduce a bootstrapping technique to achieve arbitrarily high-order accuracy in time for MPRK schemes.

The remainder of the paper is structured as follows. We recall the relaxation technique from [ranocha2020general] in Section 2.1. In Section 2.2 we give a brief introduction to NSARK methods. After that, we explain in Section 3.1 how to apply the relaxation algorithm for NSARK schemes and entropy dissipative problems. The main result is given for the entropy-conservative case, see Section 3.2, where we equip different families of MP schemes with a positivity-preserving relaxation algorithm and present a bootstrapping technique to obtain arbitrary high order (in time) for MPRK schemes. Finally, we present several examples of ordinary and partial differential equations and validate our findings for second- and third-order MP schemes.

2 Preliminaries

In this section, we briefly review relaxation methods to preserve functionals η\eta and non-standard additive Runge–Kutta (NSARK) methods, which includes Patankar-type methods as a special case.

2.1 Classical Relaxation

One way to guarantee dissipation (2) or conservation (3) of functionals η\eta is the relaxation procedure explained in [ranocha2020general]. We are given a numerical one-step method of order p2p\geq 2 generating approximations 𝐮n\mathbf{u}^{n} to 𝐮(tn)\mathbf{u}(t_{n}) with a time step size of Δt\Delta t. We then have to repeat the following steps, starting with n=0n=0.

  1. 1.

    Define the quantities (told,𝐮old,ηold)(tn,𝐮n,η(𝐮n))(t_{\text{old}},\mathbf{u}_{\text{old}},\eta_{\text{old}})\coloneqq(t_{n},\mathbf{u}^{n},\eta(\mathbf{u}^{n})) as well as (tnew,𝐮new)(tn+1,𝐮n+1)(t_{\text{new}},\mathbf{u}_{\text{new}})\coloneqq(t_{n+1},\mathbf{u}^{n+1}).

  2. 2.
    • For dissipative problems (1) compute a suitable estimate

      ηnew=η(𝐮new)+𝒪(Δtp+1),Δt0.\eta_{\text{new}}=\eta(\mathbf{u}_{\text{new}})+\mathcal{O}(\Delta t^{p+1}),\quad\Delta t\to 0.
    • For conservative problems we can simply set ηnewηold\eta_{\text{new}}\coloneqq\eta_{\text{old}}, since we arrive at ηnew=η(𝐮old)=η(𝐮(tn+1))=η(𝐮new)+𝒪(Δtp+1)\eta_{\text{new}}=\eta(\mathbf{u}_{\text{old}})=\eta(\mathbf{u}(t_{n+1}))=\eta(\mathbf{u}_{\text{new}})+\mathcal{O}(\Delta t^{p+1}) by means of an induction over nn.

  3. 3.

    Solve the system

    (tγn𝐮γnη(𝐮γn))=(told𝐮oldηold)+γ(tnewtold𝐮new𝐮oldηnewηold)\begin{pmatrix}t_{\gamma}^{n}\\ \mathbf{u}_{\gamma}^{n}\\ \eta(\mathbf{u}_{\gamma}^{n})\end{pmatrix}=\begin{pmatrix}[r]t_{\text{old}}\\ \mathbf{u}_{\text{old}}\\ \eta_{\text{old}}\end{pmatrix}+\gamma\begin{pmatrix}[r]t_{\text{new}}-t_{\text{old}}\\ \mathbf{u}_{\text{new}}-\mathbf{u}_{\text{old}}\\ \eta_{\text{new}}-\eta_{\text{old}}\end{pmatrix} (4)

    by inserting 𝐮γn\mathbf{u}_{\gamma}^{n} into the last equation and solving for γ1\gamma\approx 1, and then computing tγnt_{\gamma}^{n} and 𝐮γn\mathbf{u}_{\gamma}^{n} according to the remaining equations.

  4. 4.

    Proceed with the numerical scheme using tγnt_{\gamma}^{n} and 𝐮γn\mathbf{u}_{\gamma}^{n} instead of tn+1t_{n+1} and 𝐮n+1\mathbf{u}^{n+1}.

For dissipative problems, the “suitable estimate ηnew\eta_{\text{new}} must guarantee the discrete dissipativity (2) for the approximations from the relaxation procedure. We will introduce such a suitable estimate for NSARK methods that are based on ARK methods with a non-negative extended Butcher tableau in Section 3.1. For now, let us proceed by revisiting the main results from [ranocha2020general], assuming we have such an ηnew\eta_{\text{new}} at hand.

Theorem 1 ([ranocha2020general, Theorem 2.13, Theorem 2.14]).

Consider the relaxation procedure (4) with a numerical method of order pp and Δt>0\Delta t>0 sufficiently small. If

η is convex and η′′(𝐮old)(𝐟(𝐮old),𝐟(𝐮old))0 or \eta\text{ is convex and }\eta^{\prime\prime}(\mathbf{u}_{\text{old}})(\mathbf{f}(\mathbf{u}_{\text{old}}),\mathbf{f}(\mathbf{u}_{\text{old}}))\neq 0\quad\text{ or } (5a)
η(𝐮new)𝐮new𝐮old𝐮new𝐮old=c(𝐮old)Δt+𝒪(Δt2) with c0,\eta^{\prime}(\mathbf{u}_{\text{new}})\frac{\mathbf{u}_{\text{new}}-\mathbf{u}_{\text{old}}}{\|\mathbf{u}_{\text{new}}-\mathbf{u}_{\text{old}}\|}=c(\mathbf{u}_{\text{old}})\Delta t+\mathcal{O}(\Delta t^{2})\text{ with }c\neq 0, (5b)

then there exists a unique γ=1+𝒪(Δtp1)\gamma=1+\mathcal{O}(\Delta t^{p-1}) that satisfies (4). Additionally, the relaxation method is of order pp, that is 𝐮γn=𝐮(tγn)+𝒪(Δtp+1)\mathbf{u}_{\gamma}^{n}=\mathbf{u}(t_{\gamma}^{n})+\mathcal{O}(\Delta t^{p+1}). In particular, there exist γ1,γ2>0\gamma_{1},\gamma_{2}>0 such that

r(γ)η(𝐮old+γ(𝐮new𝐮old))(ηold+γ(ηnewηold))r(\gamma)\coloneqq\eta(\mathbf{u}_{\text{old}}+\gamma(\mathbf{u}_{\text{new}}-\mathbf{u}_{\text{old}}))-\left(\eta_{\text{old}}+\gamma(\eta_{\text{new}}-\eta_{\text{old}})\right) (6)

satisfies r(γ1)r(γ2)<0r(\gamma_{1})r(\gamma_{2})<0 for Δt>0\Delta t>0 small enough.

This theorem is the theoretical basis for the existence and uniqueness of the solution of the relaxation procedure (4). Unfortunately, the theorem does not give bounds on Δt\Delta t for the existence of the solution, so that computations may be rejected due to Δt\Delta t being too large.

Remark 1 (Issue with positivity).

The main issue of positivity-preservation with the above relaxation algorithm is that the update

𝐮γn=𝐮n+γ(𝐮n+1𝐮n)=γ𝐮n+1+(1γ)𝐮n\mathbf{u}^{n}_{\gamma}=\mathbf{u}^{n}+\gamma(\mathbf{u}^{n+1}-\mathbf{u}^{n})=\gamma\mathbf{u}^{n+1}+(1-\gamma)\mathbf{u}^{n}

is not necessarily positivity-preserving for γ>1\gamma>1, even if the baseline method is positive.

Nevertheless, to overcome this issue, we propose to use unconditionally positive111That is, 𝐮n>𝟎\mathbf{u}^{n}>\mathbf{0} component-wise implies 𝐮n+1>𝟎\mathbf{u}^{n+1}>\mathbf{0} for all Δt>0\Delta t>0. time integrators. In the upcoming sections, we introduce the methods of interest for this work, all of which may be recast as so-called non-standard additive Runge–Kutta schemes.

2.2 Non-standard Additive Runge–Kutta Methods

Non-standard additive Runge–Kutta methods (NSARK) methods are applied to an IVP (1), where the right-hand side is split into a sum, that is

𝐮(t)=𝐟(𝐮(t))=ν=1N𝐟[ν](𝐮(t)),𝐮(t0)\displaystyle\mathbf{u}^{\prime}(t)=\mathbf{f}(\mathbf{u}(t))=\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}\mathbf{f}^{[\nu]}(\mathbf{u}(t)),\quad\mathbf{u}(t_{0}) =𝐮0d.\displaystyle=\mathbf{u}^{0}\in\mathbb{R}^{d}. (7)

Already for traditional additive Runge–Kutta (ARK) methods, including Implicit-Explicit (IMEX) Runge–Kutta (RK) methods [Crouzeix1980, ARS1997], the main idea is to apply very different RK schemes determined by 𝐀[ν]=(aij[ν])i,j=1,,s\mathbf{A}^{[\nu]}=(a_{ij}^{[\nu]})_{i,j=1,\dotsc,s}, 𝐛[ν]=(b1[ν],,bs[ν])\mathbf{b}^{[\nu]}=(b_{1}^{[\nu]},\dotsc,b_{s}^{[\nu]}), 𝐜[ν]=(c1[ν],,cs[ν])T\mathbf{c}^{[\nu]}=(c_{1}^{[\nu]},\dotsc,c_{s}^{[\nu]})^{T} to the different addends 𝐟[ν]\mathbf{f}^{[\nu]}. For internal consistency, we require that the different RK schemes actually do not differ in the abscissa, i.e.

ci=ci[ν]=j=1saij[ν]c_{i}=c_{i}^{[\nu]}=\sum_{j=1}^{s}a_{ij}^{[\nu]} (8)

for i=1,,si=1,\dotsc,s and ν=1,,N\nu=1,\dotsc,N, see [SG2015]. However, for autonomous IVPs (7), this has no effect on the resulting ARK method, which in this case reads

𝐮(i)\displaystyle\mathbf{u}^{(i)} =𝐮n+Δtj=1sν=1Naij[ν]𝐟[ν](𝐮(j)),i=1,,s,\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}a^{[\nu]}_{ij}\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),\quad i=1,\dotsc,s, (9)
𝐮n+1\displaystyle\mathbf{u}^{n+1} =𝐮n+Δtj=1sν=1Nbj[ν]𝐟[ν](𝐮(j)),\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}b^{[\nu]}_{j}\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),

and the corresponding extended Butcher tableau is given by

𝐜𝐀[1]𝐀[2]𝐀[N]𝐛[1]𝐛[2]𝐛[N]\begin{array}[]{c|c|c|c|c}\mathbf{c}&\mathbf{A}^{[1]}&\mathbf{A}^{[2]}&\cdots&\mathbf{A}^{[N]}\\ \hline\cr&\mathbf{b}^{[1]}&\mathbf{b}^{[2]}&\cdots&\mathbf{b}^{[N]}\end{array}

with 𝐜=(c1,,cs)T\mathbf{c}=(c_{1},\dotsc,c_{s})^{T}.

NSARK methods now differ from ARK schemes (9) in that their extended Butcher tableau is allowed to also depend on the step size and the solution. In particular, NSARK methods applied to (7) are of the form

𝐮(i)\displaystyle\mathbf{u}^{(i)} =𝐮n+Δtj=1sν=1Naij[ν](𝐔n,tn,Δt)𝐟[ν](𝐮(j)),i=1,,s,\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}a^{[\nu]}_{ij}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),\quad i=1,\dotsc,s, (10)
𝐮n+1\displaystyle\mathbf{u}^{n+1} =𝐮n+Δtj=1sν=1Nbj[ν](𝐔n,tn,Δt)𝐟[ν](𝐮(j)),\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}b^{[\nu]}_{j}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),

where 𝐔n=(𝐮n\dabar@\dabar@\dabar@𝐮(1)\dabar@\dabar@\dabar@\dabar@\dabar@\dabar@𝐮(s)\dabar@\dabar@\dabar@𝐮n+1)d×s+2\mathbf{U}^{n}=(\mathbf{u}^{n}\rotatebox[origin={c}]{90.0}{$\dabar@\dabar@\dabar@$}\mathbf{u}^{(1)}\rotatebox[origin={c}]{90.0}{$\dabar@\dabar@\dabar@$}\dotsc\rotatebox[origin={c}]{90.0}{$\dabar@\dabar@\dabar@$}\mathbf{u}^{(s)}\rotatebox[origin={c}]{90.0}{$\dabar@\dabar@\dabar@$}\mathbf{u}^{n+1})\in\mathbb{R}^{d\times s+2}.

In the case of gBBKS [AKM2020], Geometric Conservative (GeCo) [MCD2020], both of which may be interpreted as NSRK schemes, as well as modified Patankar–Runge–Kutta (MPRK) [KM18, KM18Order3] methods, the same RK scheme is used for the treatment of the different addends in (7) and only the solution-dependent terms vary. For MP strong-stability-preserving RK (MPSSPRK) schemes, the situation is different, see Section 2.2.3. In this work we focus on modified Patankar (MP) schemes in the entropy-conservative case and leave gBBKS and GeCo methods for future works.

2.2.1 Production-Destruction-Rest Systems

The application of modified Patankar (MP) schemes is restricted to production-destruction-rest (PDRS) systems

uk(t)=rkP(𝐮(t))rkD(𝐮(t))+ν=1d(pkν(𝐮(t))dkν(𝐮(t))),k=1,,du_{k}^{\prime}(t)=r^{P}_{k}(\mathbf{u}(t))-r^{D}_{k}(\mathbf{u}(t))+\sum_{\nu=1}^{d}(p_{k\nu}(\mathbf{u}(t))-d_{k\nu}(\mathbf{u}(t))),\quad k=1,\dotsc,d

with pkν=dνkp_{k\nu}=d_{\nu k} and rkP,rkD,pkν,dkν0r^{P}_{k},r^{D}_{k},p_{k\nu},d_{k\nu}\geq 0 on >0d\mathbb{R}^{d}_{>0}. We note that this is only a formal restriction since every autonomous system with real-valued right-hand sides can be rewritten as such a PDRS [IR2023]. Now, one can recover the function 𝐟[ν]\mathbf{f}^{[\nu]} in (7) and specify the solution-dependent Butcher coefficients. Indeed, according to [IzginThesis, Remark 2.25], a PDRS can be written in terms of (7) by using the convention pkk=dkk=0p_{kk}=d_{kk}=0 and choosing N=d+1N=d+1 as well as

𝐟[N](𝐮(t))\displaystyle\mathbf{f}^{[N]}(\mathbf{u}(t)) =(r1P(𝐮(t)),,rdP(𝐮(t)))T\displaystyle=(r_{1}^{P}(\mathbf{u}(t)),\dotsc,r_{d}^{P}(\mathbf{u}(t)))^{T} (11)
fk[ν](𝐮(t))\displaystyle f^{[\nu]}_{k}(\mathbf{u}(t)) ={pkν(𝐮(t)),kν,(rkD(𝐮(t))+μ=1ddkμ(𝐮(t))),k=ν\displaystyle=

for k,ν=1,,dk,\nu=1,\dotsc,d.

2.2.2 Modified Patankar–Runge–Kutta Schemes

With (11), every MPRK scheme [BDM2003, KM18, KM18Order3] that is based on a single explicit RK method with a non-negative Butcher array can be expressed in terms of an NSARK scheme using

aij[ν](𝐔n,tn,Δt)\displaystyle a^{[\nu]}_{ij}(\mathbf{U}^{n},t_{n},\Delta t) =aijγν[i](𝐔n,tn,Δt),\displaystyle=a_{ij}\gamma_{\nu}^{[i]}(\mathbf{U}^{n},t_{n},\Delta t), (12)
bj[ν](𝐔n,tn,Δt)\displaystyle b^{[\nu]}_{j}(\mathbf{U}^{n},t_{n},\Delta t) =bjδν(𝐔n,tn,Δt),\displaystyle=b_{j}\delta_{\nu}(\mathbf{U}^{n},t_{n},\Delta t),

where

γν[i](𝐔n,tn,Δt)\displaystyle\gamma_{\nu}^{[i]}(\mathbf{U}^{n},t_{n},\Delta t) ={uν(i)πν(i)(𝐮n,𝐮(1),,𝐮(i1)),ν<N1,ν=N and\displaystyle=\,\,\text{ and } (13)
δν(𝐔n,tn,Δt)\displaystyle\delta_{\nu}(\mathbf{U}^{n},t_{n},\Delta t) ={uνn+1σν(𝐮n,𝐮(1),,𝐮(s)),ν<N1,ν=N\displaystyle=

are the so-called non-standard weights (NSWs). Here, πν(i)\pi_{\nu}^{(i)} and σν\sigma_{\nu} denote the so-called Patankar-weight denominators (PWDs) and can be chosen for the particular MPRK method to ensure stability and accuracy, see [KM18, IzginThesis] for more insights. If the Butcher array contains negative entries, more care is needed when defining the MPRK method, see e.g. [MPDeC].

Example 1 (Second-order Family).

The second-order family of MPRK schemes, denoted by MPRK22(α\alpha), is given by

uk(1)=\displaystyle u_{k}^{(1)}= ukn,\displaystyle\,u_{k}^{n},
uk(2)=\displaystyle u_{k}^{(2)}= ukn+αΔt(rkP(𝐮(1))+ν=1dpkν(𝐮(1))uν(2)uνn(rkD(𝐮(1))+ν=1ddkν(𝐮(1)))uk(2)ukn),\displaystyle\,u_{k}^{n}+\alpha\Delta t\left(r^{P}_{k}(\mathbf{u}^{(1)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(1)})\frac{u_{\nu}^{(2)}}{u_{\nu}^{n}}-\left(r^{D}_{k}(\mathbf{u}^{(1)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(1)})\right)\frac{u_{k}^{(2)}}{u_{k}^{n}}\right),
ukn+1=\displaystyle u_{k}^{n+1}= ukn+Δtj=12bj(rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+1(uν(2))1α(uνn)11α\displaystyle\,u_{k}^{n}+\Delta t\sum_{j=1}^{2}b_{j}\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+1}}{(u_{\nu}^{(2)})^{\frac{1}{\alpha}}(u_{\nu}^{n})^{1-\frac{1}{\alpha}}}\right.
(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+1(uk(2))1α(ukn)11α)\displaystyle\hphantom{u_{k}^{n}+\Delta t\sum_{j=1}^{2}b_{j}}-\left.\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+1}}{(u_{k}^{(2)})^{\frac{1}{\alpha}}(u_{k}^{n})^{1-\frac{1}{\alpha}}}\right)

with k=1,,dk=1,\dotsc,d, α12\alpha\geq\frac{1}{2} and b2=12αb_{2}=\tfrac{1}{2\alpha} as well as b1=1b2b_{1}=1-b_{2}. In terms of the previous notation, we are using the Butcher array

0αα112α12α\displaystyle\begin{array}[]{c|cc}0&&\\ \alpha&\alpha&\\ \hline\cr&1-\frac{1}{2\alpha}&\frac{1}{2\alpha}\end{array}

and the PWDs

πk(2)=ukn,σk=(uk(2))1α(ukn)11α.\pi_{k}^{(2)}=u_{k}^{n},\quad\sigma_{k}=(u_{k}^{(2)})^{\frac{1}{\alpha}}(u_{k}^{n})^{1-\frac{1}{\alpha}}.

In this work, we will focus on α=1\alpha=1 as suggested by [IssuesMPRK].

Example 2 (Third-order Family).

There are two third-order families of MPRK schemes, see [KM18Order3]. One of them is based on the Butcher array

0ααβ3αβ(1α)β2α(23α)β(βα)α(23α)1+23(α+β)6αβ3β26α(βα)23α6β(βα)\displaystyle\begin{array}[]{c|ccc}0&&&\\ \alpha&\alpha&&\\ \beta&\frac{3\alpha\beta(1-\alpha)-\beta^{2}}{\alpha(2-3\alpha)}&\frac{\beta(\beta-\alpha)}{\alpha(2-3\alpha)}&\\ \hline\cr&1+\frac{2-3(\alpha+\beta)}{6\alpha\beta}&\frac{3\beta-2}{6\alpha(\beta-\alpha)}&\frac{2-3\alpha}{6\beta(\beta-\alpha)}\end{array} (14)

see [KM18Order3] for more details on the domain of α\alpha and β\beta. The PWDs are given by

πν(2)=\displaystyle\pi_{\nu}^{(2)}= uνn,πν(3)=(uν(2))1p(uνn)11p,\displaystyle u_{\nu}^{n},\qquad\pi_{\nu}^{(3)}=\,(u_{\nu}^{(2)})^{\frac{1}{p}}(u_{\nu}^{n})^{1-\frac{1}{p}}, (15)
σk=\displaystyle\sigma_{k}= ukn+Δtj=12βj(rkP(𝐮(j))+ν=1dpkν(𝐮(j))σν(uν(2))1a21(uνn)11a21\displaystyle u_{k}^{n}+\Delta t\sum_{j=1}^{2}\beta_{j}\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{\sigma_{\nu}}{(u_{\nu}^{(2)})^{\frac{1}{a_{21}}}(u_{\nu}^{n})^{1-\frac{1}{a_{21}}}}\right.
(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))σk(uk(2))1a21(ukn)11a21)\displaystyle\hskip 56.9055pt-\left.\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{\sigma_{k}}{(u_{k}^{(2)})^{\frac{1}{a_{21}}}(u_{k}^{n})^{1-\frac{1}{a_{21}}}}\right)

for ν,k=1,,d\nu,k=1,\dotsc,d, where β1=1β2\beta_{1}=1-\beta_{2}, β2=12a21\beta_{2}=\frac{1}{2a_{21}}, and p=3a21(a31+a32)b3p=3a_{21}\left(a_{31}+a_{32}\right)b_{3}. Note that 𝛔\bm{\sigma} requires the solution of another linear system, which is why this family is denoted by MPRK43I(α,β\alpha,\beta). In this work we focus on α=0.5\alpha=0.5 and β=0.75\beta=0.75.

In any case we point out that the schemes are implicit due to the numerators in (13). Indeed, they are linearly implicit as the PWDs πν(i)\pi_{\nu}^{(i)} and σν\sigma_{\nu} are required to be independent of the numerator [BDM2003, IzginThesis]. Consequently, an MPRK scheme can be written in matrix-vector notation as follows.

𝐌(i)𝐮(i)\displaystyle\mathbf{M}^{(i)}\mathbf{u}^{(i)} =𝐮n+Δtj=1i1aij𝐫P(𝐮(j)),i=1,,s,\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{i-1}a_{ij}\mathbf{r}^{P}(\mathbf{u}^{(j)}),\quad i=1,\dotsc,s, (16)
𝐌𝐮n+1\displaystyle\mathbf{M}\mathbf{u}^{n+1} =𝐮n+Δtj=1sbj𝐫P(𝐮(j)),\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}b_{j}\mathbf{r}^{P}(\mathbf{u}^{(j)}),

where 𝐫P=(r1P,,rdP)T\mathbf{r}^{P}=(r_{1}^{P},\dotsc,r_{d}^{P})^{T} and 𝐌(i)=(mkν(i))1k,νd\mathbf{M}^{(i)}=(m^{(i)}_{k\nu})_{1\leq k,\nu\leq d} with

mkk(i)\displaystyle m^{(i)}_{kk} =1+Δtj=1i1aij(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))1πk(i),\displaystyle=1+\Delta t\sum_{j=1}^{i-1}a_{ij}\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{1}{\pi_{k}^{(i)}}, (17)
mkν(i)\displaystyle m^{(i)}_{k\nu} =Δtj=1i1aijpkν(𝐮(j))1πν(i),kν\displaystyle=-\Delta t\sum_{j=1}^{i-1}a_{ij}p_{k\nu}(\mathbf{u}^{(j)})\frac{1}{\pi_{\nu}^{(i)}},\quad k\neq\nu

as well as, using 𝐌=(mkν)1k,νd\mathbf{M}=(m_{k\nu})_{1\leq k,\nu\leq d},

mkk\displaystyle m_{kk} =1+Δtj=1sbj(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))1σk,\displaystyle=1+\Delta t\sum_{j=1}^{s}b_{j}\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{1}{\sigma_{k}}, (18)
mkν\displaystyle m_{k\nu} =Δtj=1sbjpkν(𝐮(j))1σν,kν.\displaystyle=-\Delta t\sum_{j=1}^{s}b_{j}p_{k\nu}(\mathbf{u}^{(j)})\frac{1}{\sigma_{\nu}},\quad k\neq\nu.

2.2.3 MP Strong-Stability-Preserving-Runge–Kutta Schemes

Although there exist second- and third-order MPSSPRK schemes, see [SSPMPRK2, SSPMPRK3], we focus for simplicity on the second-order method and the conservative PDS case. The generalization to non-conservative PDRS is straightforward but complicates the formulae. Also, the consideration of third-order MPSSPRK schemes will be left out for future work. The two-parameter family of second-order MPSSPRK schemes from [SSPMPRK2] is given by

𝐮(1)=\displaystyle\mathbf{u}^{(1)}= 𝐮n,\displaystyle\mathbf{u}^{n}, (19)
ui(2)=\displaystyle u_{i}^{(2)}= uin+βΔt(j=1dpij(𝐮n)uj(2)ujnj=1ddij(𝐮n)ui(2)uin),\displaystyle u_{i}^{n}+\beta\Delta t\left(\sum_{j=1}^{d}p_{ij}(\mathbf{u}^{n})\frac{u_{j}^{(2)}}{u_{j}^{n}}-\sum_{j=1}^{d}d_{ij}(\mathbf{u}^{n})\frac{u_{i}^{(2)}}{u_{i}^{n}}\right),
uin+1=\displaystyle u_{i}^{n+1}={} (1α)uin+αui(2)+Δt(j=1d(β20pij(𝐮n)+β21pij(𝐮(2)))ujn+1(ujn)1s(uj(2))s\displaystyle(1-\alpha)u_{i}^{n}+\alpha u_{i}^{(2)}+\Delta t\Biggl(\sum_{j=1}^{d}\left(\beta_{20}p_{ij}(\mathbf{u}^{n})+\beta_{21}p_{ij}(\mathbf{u}^{(2)})\right)\frac{u_{j}^{n+1}}{(u_{j}^{n})^{1-s}(u_{j}^{(2)})^{s}}
j=1d(β20dij(𝐮n)+β21dij(𝐮(2)))uin+1(uin)1s(ui(2))s),\displaystyle-\sum_{j=1}^{d}\left(\beta_{20}d_{ij}(\mathbf{u}^{n})+\beta_{21}d_{ij}(\mathbf{u}^{(2)})\right)\frac{u_{i}^{n+1}}{(u_{i}^{n})^{1-s}(u_{i}^{(2)})^{s}}\Biggr),

where β20=112βαβ\beta_{20}=1-\frac{1}{2\beta}-\alpha\beta, β21=12β\beta_{21}=\frac{1}{2\beta} and s=1αβ+αβ2β(1αβ)s=\frac{1-\alpha\beta+\alpha\beta^{2}}{\beta(1-\alpha\beta)}. There, the free parameters α\alpha and β\beta are subject to

0α1,β>0,αβ+12β1.0\leq\alpha\leq 1,\quad\beta>0,\quad\alpha\beta+\frac{1}{2\beta}\leq 1. (20)

We refer to the above scheme as MPSSPRK2(α,β\alpha,\beta). For numerical experiments we use α=12\alpha=\frac{1}{2} and β=1\beta=1 [SSPMPRK2].

Substituting the second stage into the update, we can collect production and destruction terms. Hence, in the notation of (11), the solution-dependent coefficients for the conservative PDS case are

a21[ν](𝐔n,tn,Δt)\displaystyle a_{21}^{[\nu]}(\mathbf{U}^{n},t_{n},\Delta t) =βuν(2)uνn\displaystyle=\beta\frac{u_{\nu}^{(2)}}{u^{n}_{\nu}} (21)
b1[ν](𝐔n,tn,Δt)\displaystyle b_{1}^{[\nu]}(\mathbf{U}^{n},t_{n},\Delta t) =αβuν(2)uνn+β20uνn+1σν,b2[ν](𝐔n,tn,Δt)=β21uνn+1σν,\displaystyle=\alpha\beta\frac{u_{\nu}^{(2)}}{u^{n}_{\nu}}+\beta_{20}\frac{u_{\nu}^{n+1}}{\sigma_{\nu}},\quad b_{2}^{[\nu]}(\mathbf{U}^{n},t_{n},\Delta t)=\beta_{21}\frac{u_{\nu}^{n+1}}{\sigma_{\nu}},

where σν=(uνn)1s(uν(2))s\sigma_{\nu}=(u_{\nu}^{n})^{1-s}(u_{\nu}^{(2)})^{s}.

3 Positivity-Preserving Relaxation Technique

In what follows, we adapt the classical relaxation algorithm from Section 2.1 such that it becomes positivity-preserving.

3.1 Entropy Dissipative Case

First of all, we present a suitable estimate ηnew\eta_{\text{new}} for a general NSARK scheme. In order to minimize the computational effort, we propose to re-use the computed stage values of the NSARK scheme satisfying (12), that is

𝐮(i)\displaystyle\mathbf{u}^{(i)} =𝐮n+Δtj=1sν=1Naijγν[i](𝐔n,tn,Δt)𝐟[ν](𝐮(j)),i=1,,s,\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}a_{ij}\gamma^{[i]}_{\nu}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),\quad i=1,\dotsc,s, (22)
𝐮n+1\displaystyle\mathbf{u}^{n+1} =𝐮n+Δtj=1sν=1Nbjδν(𝐔n,tn,Δt)𝐟[ν](𝐮(j)),\displaystyle=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}b_{j}\delta_{\nu}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}),
ηnew\displaystyle\eta_{\text{new}} =η(𝐮n)+Δtj=1sbj(η𝐟)(𝐮(j)),\displaystyle=\eta(\mathbf{u}^{n})+\Delta t\sum_{j=1}^{s}b_{j}(\eta^{\prime}\mathbf{f})(\mathbf{u}^{(j)}),

which can be interpreted as computing the numerical approximation of the augmented system

ddt(𝐮(t)η(𝐮(t)))=ν=1N(𝐟[ν](𝐮(t))0)𝐟^[ν](𝐮(t))+(𝟎(η𝐟)(𝐮(t)))𝐟^[N+1](𝐮(t))\frac{\mathrm{d}}{\mathrm{d}t}\begin{pmatrix}\mathbf{u}(t)\\ \eta(\mathbf{u}(t))\end{pmatrix}=\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{N}\underbrace{\begin{pmatrix}\mathbf{f}^{[\nu]}(\mathbf{u}(t))\\ 0\end{pmatrix}}_{\eqqcolon\hat{\mathbf{f}}^{[\nu]}(\mathbf{u}(t))}+\underbrace{\begin{pmatrix}\mathbf{0}\\ (\eta^{\prime}\mathbf{f})(\mathbf{u}(t))\end{pmatrix}}_{\eqqcolon\hat{\mathbf{f}}^{[N+1]}(\mathbf{u}(t))}

using an NSARK method with the extended Butcher tableau

𝐜𝚪1(𝐔n,tn,Δt)𝐀𝚪2(𝐔n,tn,Δt)𝐀𝚪N(𝐔n,tn,Δt)𝐀𝐀δ1(𝐔n,tn,Δt)𝐛δ2(𝐔n,tn,Δt)𝐛δN(𝐔n,tn,Δt)𝐛𝐛,\begin{array}[]{c|c|c|c|c|c}\mathbf{c}&\bm{\Gamma}_{1}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}&\bm{\Gamma}_{2}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}&\cdots&\bm{\Gamma}_{N}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}&\mathbf{A}\\ \hline\cr&\delta_{1}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}&\delta_{2}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}&\cdots&\delta_{N}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}&\mathbf{b}\end{array}, (23)

where 𝚪νdiag(γν[1],,γν[s])\bm{\Gamma}_{\nu}\coloneqq\operatorname{diag}(\gamma^{[1]}_{\nu},\dotsc,\gamma^{[s]}_{\nu}). Assuming that the two corresponding base methods described by the Butcher tableaux

𝐜𝚪1(𝐔n,tn,Δt)𝐀𝚪2(𝐔n,tn,Δt)𝐀𝚪N(𝐔n,tn,Δt)𝐀δ1(𝐔n,tn,Δt)𝐛δ2(𝐔n,tn,Δt)𝐛δN(𝐔n,tn,Δt)𝐛 and 𝐜𝐀𝐛\begin{array}[]{c|c|c|c|c}\mathbf{c}&\bm{\Gamma}_{1}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}&\bm{\Gamma}_{2}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}&\cdots&\bm{\Gamma}_{N}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{A}\\ \hline\cr&\delta_{1}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}&\delta_{2}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}&\cdots&\delta_{N}(\mathbf{U}^{n},t_{n},\Delta t)\mathbf{b}\end{array}\quad\text{ and }\quad\begin{array}[]{c|c}\mathbf{c}&\mathbf{A}\\ \hline\cr&\mathbf{b}\end{array}

both are of pp-th order for some p{2,3,4}p\in\{2,3,4\}, it can be seen from [NSARK, Theorem 18,Lemma 25,Lemma 26] that the overall scheme (22) is of order pp, since the respective order conditions are decoupled with respect to the columns of the tableau (23).

Remark 2.

The NSWs from MPSSPRK schemes, see (21), also satisfy (12) after multiplying and dividing by the respective Butcher coefficient.

As a result, we indeed obtain that ηnew=η(𝐮new)+𝒪(Δtp+1)\eta_{\text{new}}=\eta(\mathbf{u}_{\text{new}})+\mathcal{O}(\Delta t^{p+1}), and additionally,

ηnewη(𝐮n)\eta_{\text{new}}\leq\eta(\mathbf{u}^{n}) (24)

whenever bj0b_{j}\geq 0 for j=1,,sj=1,\dotsc,s. If bj<0b_{j}<0 for some j=1,,sj=1,\dotsc,s, one can still use Gauß  quadrature, as suggested in [ranocha2020general, Page 866] together with the unconditionally positive dense output formulae derived in [izgin2024] to obtain the approximations needed for the quadrature formula.

In view of Remark 1, the relaxation technique is in danger of not being positivity-preserving for γ>1\gamma>1. The following corollary gives a work around for dissipative problems as we will discuss in the upcoming Remark 3.

Corollary 1 ([ranocha2020general, Pages 882-883]).

If η\eta is convex with η′′(𝐮old)(𝐟(𝐮old),𝐟(𝐮old))0\eta^{\prime\prime}(\mathbf{u}_{\text{old}})(\mathbf{f}(\mathbf{u}_{\text{old}}),\mathbf{f}(\mathbf{u}_{\text{old}}))\neq 0 then rr from (6) is convex and satisfies r(0)=0r(0)=0, r(0)<0r^{\prime}(0)<0 and r(γ)>0r^{\prime}(\gamma)>0 for all γ1\gamma\geq 1 and Δt>0\Delta t>0 small enough.

Remark 3 (Positivity-preserving relaxation for convex η\eta).

Suppose that γ>1\gamma^{*}>1 is the solution to (4), i.e. r(γ)=0r(\gamma^{*})=0, so that the positivity of the relaxation update 𝐮γn\mathbf{u}_{\gamma}^{n} is not guaranteed any longer. Because of Corollary 1 we know that r(γ)r(γ)=0r(\gamma)\leq r(\gamma^{*})=0 for all γ[1,γ]\gamma\in[1,\gamma^{*}]. In particular r(1)0r(1)\leq 0 follows, i. e., for γ=1\gamma=1 we obtain from (6) the relation

η(𝐮γn)=η(𝐮new)ηnew.\eta(\mathbf{u}_{\gamma}^{n})=\eta(\mathbf{u}_{\text{new}})\leq\eta_{\text{new}}.

This means, that due to (24) only more dissipation will be introduced by using

γ=min{γ,1}(0,1],\gamma=\min\{\gamma^{*},1\}\in(0,1],

where γ\gamma^{*} is the solution to (4). But with that choice, 𝐮γn\mathbf{u}_{\gamma}^{n} is again a convex combination of positive data, and hence, positivity preservation is guaranteed for dissipative problems with a convex η\eta.

3.2 Entropy-Conservative Case

As 𝐮γn\mathbf{u}_{\gamma}^{n} in (4) is not guaranteed to be positivity-preserving for γ>1\gamma>1, see Remark 1, we propose to replace the update formula by a positivity-preserving variant. To indicate this difference in our notation we will write 𝐮n+γ\mathbf{u}^{n+\gamma} for a positivity-preserving approximation rather than 𝐮γn\mathbf{u}_{\gamma}^{n}.

3.2.1 Explicit Positivity-Preserving Procedure

If we are only interested in preserving positivity, a single nonlinear invariant but no further linear invariants, we may apply a Patankar–Runge–Kutta method to guarantee the positivity of the update and combine it with the geometric mean

𝐮n+γ=(𝐮n+1)γ(𝐮n)1γ.\mathbf{u}^{n+\gamma}=(\mathbf{u}^{n+1})^{\gamma}(\mathbf{u}^{n})^{1-\gamma}. (25)

In logarithmic variables, this reduces to

ln(𝐮n+γ)=ln(𝐮n)+γ(ln(𝐮n+1)ln(𝐮n)),\ln(\mathbf{u}^{n+\gamma})=\ln(\mathbf{u}^{n})+\gamma(\ln(\mathbf{u}^{n+1})-\ln(\mathbf{u}^{n})),

where we can find a solution to the relaxation problem in logarithmic variables according to the classical theory. Now, if η\eta is convex and non-decreasing in each argument the composition ηexp\eta\circ\exp is also convex [boyd2004convex, Section 2.3.4], and hence, also

η(𝐮n+γ)=ηold\eta(\mathbf{u}^{n+\gamma})=\eta_{\text{old}}

possesses a positive solution γ=1+𝒪(Δtp1)\gamma=1+\mathcal{O}(\Delta t^{p-1}).

3.2.2 Implicit Positivity-Preserving Procedure for Conservative PDS

One possible candidate for computing 𝐮n+γ\mathbf{u}^{n+\gamma} is to use dense output formulae recently developed in [izgin2024], which we briefly recall in the upcoming section.

Positivity-Preserving Dense Output

We first focus on Runge–Kutta methods and the MP variant, but the ideas can be carried out for MPSSPRK schemes in a straightforward manner as we will see.

The main idea is to replace bjb_{j}\in\mathbb{R} by a function b¯j:[0,1]\bar{b}_{j}\colon[0,1]\to\mathbb{R} such that

ukn+γ=ukn+Δtj=1sb¯j(γ)ν=1d(rkP(𝐮(j))rkD(𝐮(j))+pkν(𝐮(j))dkν(𝐮(j)))u^{n+\gamma}_{k}=u^{n}_{k}+\Delta t\sum_{j=1}^{s}\bar{b}_{j}(\gamma)\sum_{\nu=1}^{d}\left(r_{k}^{P}(\mathbf{u}^{(j)})-r_{k}^{D}(\mathbf{u}^{(j)})+p_{k\nu}(\mathbf{u}^{(j)})-d_{k\nu}(\mathbf{u}^{(j)})\right)

approximates uk(tn+γΔt)u_{k}(t^{n}+\gamma\Delta t). We impose

b¯j(0)=0andb¯j(1)=bj\bar{b}_{j}(0)=0\quad\text{and}\quad\bar{b}_{j}(1)=b_{j}

to recover

𝐮n+γ={𝐮n,γ=0,𝐮n+1,γ=1.\mathbf{u}^{n+\gamma}=\begin{cases}\mathbf{u}^{n},&\gamma=0,\\ \mathbf{u}^{n+1},&\gamma=1.\end{cases}
Example 3 (Second-order dense output for MPRK22(α\alpha)).

Using b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j} and

ukn+γ=ukn+Δtj=1sb¯j(γ)(rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+1σν(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+1σk)u^{n+\gamma}_{k}=u_{k}^{n}+\Delta t\sum_{j=1}^{s}\bar{b}_{j}(\gamma)\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+1}}{\sigma_{k}}\right)

yields a positivity-preserving dense output. Indeed, we find 𝐮n+γ=(1γ)𝐮n+γ𝐮n+1\mathbf{u}^{n+\gamma}=(1-\gamma)\mathbf{u}^{n}+\gamma\mathbf{u}^{n+1} in this case, which coincides with the relaxation update. For γ>1\gamma>1 this is not necessarily positivity-preserving. For our purposes, we want to ensure positivity even for γ>1\gamma>1, which can be done using

ukn+γ=ukn+Δtj=1sb¯j(γ)(rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+γσ¯ν(γ)(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+γσ¯k(γ))u^{n+\gamma}_{k}=u_{k}^{n}+\Delta t\sum_{j=1}^{s}\bar{b}_{j}(\gamma)\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+\gamma}}{\bar{\sigma}_{k}(\gamma)}\right) (26)

together with

σ¯k(γ)=σk=(uk(2))1α(ukn)11α\bar{\sigma}_{k}(\gamma)=\sigma_{k}=(u_{k}^{(2)})^{\frac{1}{\alpha}}(u_{k}^{n})^{1-\frac{1}{\alpha}} (27)

or

σ¯k(γ)=(uk(2))γα(ukn)1γα.\bar{\sigma}_{k}(\gamma)=(u_{k}^{(2)})^{\frac{\gamma}{\alpha}}(u_{k}^{n})^{1-\frac{\gamma}{\alpha}}. (28)
Example 4 (Higher order positive dense output for MPRK).

In general, one may use b¯j(γ)\bar{b}_{j}(\gamma) from the classical dense output formula paired with the update (26). Then, the only quantity to define is 𝛔¯(γ)\bar{\bm{\sigma}}(\gamma). According to [izgin2024], it is sufficient to use a lower order dense output MPRK scheme for the computation of 𝛔¯(γ)\bar{\bm{\sigma}}(\gamma). For instance, we note that third-order MPRK schemes are equipped with

b¯1(γ)=γ(1b1)γ2,b¯j(γ)=γ2bj,j=2,,s\bar{b}_{1}(\gamma)=\gamma-(1-b_{1})\gamma^{2},\quad\bar{b}_{j}(\gamma)=\gamma^{2}b_{j},\quad j=2,\dotsc,s

for the dense output. However, we will discuss a different approach in this work, and thus, omit to also recall 𝛔¯\bar{\bm{\sigma}} from [izgin2024].

Example 5 (Second-order dense output for MPSSPRK).

Let us incorporate the γ\gamma-dependency in (21), which gives

b1[ν](𝐔n,tn,Δt,γ)\displaystyle b_{1}^{[\nu]}(\mathbf{U}^{n},t_{n},\Delta t,\gamma) =γ(αβuν(2)uνn+β20uνn+γσ¯ν(γ)),b2[ν](𝐔n,tn,Δt,γ)=γβ21uνn+γσ¯ν(γ),\displaystyle=\gamma\left(\alpha\beta\frac{u_{\nu}^{(2)}}{u^{n}_{\nu}}+\beta_{20}\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}\right),\quad b_{2}^{[\nu]}(\mathbf{U}^{n},t_{n},\Delta t,\gamma)=\gamma\beta_{21}\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)},

where we restrict to the choice σ¯ν(γ)=(uνn)1γs(uν(2))γs\bar{\sigma}_{\nu}(\gamma)=(u_{\nu}^{n})^{1-\gamma s}(u_{\nu}^{(2)})^{\gamma s}. Then

𝐮n+γ=𝐮n+Δtj=1sν=1dbj(𝐔n,tn,Δt,γ)𝐟[ν](𝐮(j)).\mathbf{u}^{n+\gamma}=\mathbf{u}^{n}+\Delta t\sum_{j=1}^{s}\sum_{\begin{subarray}{c}\nu=1\end{subarray}}^{d}b_{j}(\mathbf{U}^{n},t_{n},\Delta t,\gamma)\mathbf{f}^{[\nu]}(\mathbf{u}^{(j)}).
Remark 4 (Use of dense output for relaxation).

As we will show, we can use the dense output from Example 3 for the relaxation algorithm. However, proving the existence of a solution to the relaxation equation for (27) is more complex than for (28), which is due to the respective truncation errors. Moreover, as illustrated in Example 4, the bootstrapping for higher order positive dense output involves higher degree polynomials for b¯j\bar{b}_{j}, which may not be positive for γ>1\gamma>1. This is crucial since the solvability for the linear systems (16) relies on positive Butcher coefficients. While we could implement a trick [MPDeC] to overcome this issue, we rather focus on a different bootstrapping approach to keep the overall algorithm simple.

Preparatory Results for MPRK22(α\alpha)

We proceed to develop a relaxation technique using (26)-(27), which is more complicated than using (28) but on the other hand motivates us to derive more general results. To that end, we note that the scheme with (26)-(27) can be written in matrix-vector notation as

𝐮(1)\displaystyle\mathbf{u}^{(1)} =𝐮n\displaystyle=\mathbf{u}^{n} (29)
𝐌(2)(𝐮n)𝐮(2)\displaystyle\mathbf{M}^{(2)}(\mathbf{u}^{n})\mathbf{u}^{(2)} =𝐮n+αΔt𝐫P(𝐮n)\displaystyle=\mathbf{u}^{n}+\alpha\Delta t\mathbf{r}^{P}(\mathbf{u}^{n})
𝐌γ(𝐮n)𝐮n+γ\displaystyle\mathbf{M}_{\gamma}(\mathbf{u}^{n})\mathbf{u}^{n+\gamma} =𝐮n+γΔtj=1sbj𝐫P(𝐮(j)),γ>0,\displaystyle=\mathbf{u}^{n}+\gamma\Delta t\sum_{j=1}^{s}b_{j}\mathbf{r}^{P}(\mathbf{u}^{(j)}),\quad\gamma>0,

where 𝐌(2)\mathbf{M}^{(2)} can be obtained from (17) and

𝐌γ=γ(𝐌𝐈)+𝐈\mathbf{M}_{\gamma}=\gamma(\mathbf{M}-\mathbf{I})+\mathbf{I} (30)

with 𝐌\mathbf{M} from (18).

Finally, the relaxation step (4) for entropy-conservative problems is now updated to

(tγn𝐌γ(𝐮old)𝐮n+γη(𝐮n+γ))=(told𝐮oldηold)+γ(tnewtoldΔtj=12bj𝐫P(𝐮(j))0),\begin{pmatrix}t_{\gamma}^{n}\\ \mathbf{M}_{\gamma}(\mathbf{u}_{\text{old}})\mathbf{u}^{n+\gamma}\\ \eta(\mathbf{u}^{n+\gamma})\end{pmatrix}=\begin{pmatrix}[r]t_{\text{old}}\\ \mathbf{u}_{\text{old}}\\ \eta_{\text{old}}\end{pmatrix}+\gamma\begin{pmatrix}[r]t_{\text{new}}-t_{\text{old}}\\ \Delta t\sum_{j=1}^{2}b_{j}\mathbf{r}^{P}(\mathbf{u}^{(j)})\\ 0\end{pmatrix}, (31)

resulting in a coupled linear-nonlinear system for the simultaneous computation of γ\gamma and 𝐮n+γ\mathbf{u}^{n+\gamma}. Note that if such a γ>0\gamma>0 exists, the relaxation method for MPRK22(α\alpha) naturally is of the correct order for all γ\gamma as we are using an appropriate dense output formula.

Since we allow for a truncation error of 𝒪(Δt3)\mathcal{O}(\Delta t^{3}) for the second-order MPRK22(α(\alpha) scheme, it is beneficial to prove the following

Lemma 1.

If b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j} and σ¯k(γ)=σk\bar{\sigma}_{k}(\gamma)=\sigma_{k}, then the MPRK22(α\alpha) scheme (26) satisfies

𝐮n+γ=𝐮n+1+(γ1)Δt𝐝γn+𝒪(Δt3),\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)\Delta t\mathbf{d}^{n}_{\gamma}+\mathcal{O}(\Delta t^{3}), (32)

where

dγ,kn\displaystyle d^{n}_{\gamma,k} =ukn+1uknΔt+Δtγj=1sbj(ν=1dpkν(𝐮(j))fν(𝐮n)uνn(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))fk(𝐮n)ukn)\displaystyle=\frac{u_{k}^{n+1}-u_{k}^{n}}{\Delta t}+\Delta t\gamma\sum_{j=1}^{s}b_{j}\left(\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{f_{\nu}(\mathbf{u}^{n})}{u_{\nu}^{n}}-\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{f_{k}(\mathbf{u}^{n})}{u_{k}^{n}}\right) (33)
=fk(𝐮n)+𝒪(Δt).\displaystyle=f_{k}(\mathbf{u}^{n})+\mathcal{O}(\Delta t).
Proof.

Utilizing [KM2019, Lemma 2, Lemma 3], we observe

uνn+γσ¯ν(γ)=uνn+γ(uν(2))1α(uνn)11α=1+(γ1)Δtfν(𝐮n)uνn+𝒪(Δt2)=uνn+1σν+(γ1)Δtfν(𝐮n)uνn+𝒪(Δt2)\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=\frac{u_{\nu}^{n+\gamma}}{(u_{\nu}^{(2)})^{\frac{1}{\alpha}}(u_{\nu}^{n})^{1-\frac{1}{\alpha}}}=1+(\gamma-1)\Delta t\frac{f_{\nu}(\mathbf{u}^{n})}{u_{\nu}^{n}}+\mathcal{O}(\Delta t^{2})=\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}+(\gamma-1)\Delta t\frac{f_{\nu}(\mathbf{u}^{n})}{u_{\nu}^{n}}+\mathcal{O}(\Delta t^{2}) (34)

as uνn+1σν=1+𝒪(Δt2)\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}=1+\mathcal{O}(\Delta t^{2}) [NSARK]. Substituting this into (26) we receive

ukn+γ\displaystyle u^{n+\gamma}_{k} =ukn+γΔtj=1sbj(rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+γσ¯ν(γ)(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+γσ¯k(γ))\displaystyle=u_{k}^{n}+\gamma\Delta t\sum_{j=1}^{s}b_{j}\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+\gamma}}{\bar{\sigma}_{k}(\gamma)}\right)
=ukn+γ(ukn+1ukn)\displaystyle=u_{k}^{n}+\gamma(u_{k}^{n+1}-u_{k}^{n})
+γ(γ1)Δt2j=1sbj(ν=1dpkν(𝐮(j))fν(𝐮n)uνn(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))fk(𝐮n)ukn)+𝒪(Δt3)\displaystyle+\gamma(\gamma-1)\Delta t^{2}\sum_{j=1}^{s}b_{j}\left(\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{f_{\nu}(\mathbf{u}^{n})}{u_{\nu}^{n}}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{f_{k}(\mathbf{u}^{n})}{u_{k}^{n}}\right)+\mathcal{O}(\Delta t^{3})
=ukn+1+(γ1)(ukn+1ukn)\displaystyle=u_{k}^{n+1}+(\gamma-1)(u_{k}^{n+1}-u_{k}^{n})
+γ(γ1)Δt2j=1sbj(ν=1dpkν(𝐮(j))fν(𝐮n)uνn(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))fk(𝐮n)ukn)+𝒪(Δt3)\displaystyle+\gamma(\gamma-1)\Delta t^{2}\sum_{j=1}^{s}b_{j}\left(\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{f_{\nu}(\mathbf{u}^{n})}{u_{\nu}^{n}}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{f_{k}(\mathbf{u}^{n})}{u_{k}^{n}}\right)+\mathcal{O}(\Delta t^{3})
=ukn+1+(γ1)Δtdγ,kn+𝒪(Δt3).\displaystyle=u_{k}^{n+1}+(\gamma-1)\Delta td^{n}_{\gamma,k}+\mathcal{O}(\Delta t^{3}).

Remark 5 (Influence of 𝝈¯\bar{\bm{\sigma}} and application to MPSSPRK).

In the situation of Lemma 1, if we use (28) rather than (27), then instead of (34) we obtain

uνn+γσ¯ν(γ)=1+𝒪(Δt2)=uνn+1σν+𝒪(Δt2)\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=1+\mathcal{O}(\Delta t^{2})=\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}+\mathcal{O}(\Delta t^{2})

using the same technique, and finally

𝐮n+γ=𝐮n+1+(γ1)Δt𝐮n+1𝐮nΔt𝐝n+𝒪(Δt3).\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)\Delta t\underbrace{\frac{\mathbf{u}^{n+1}-\mathbf{u}^{n}}{\Delta t}}_{\eqqcolon\mathbf{d}^{n}}+\mathcal{O}(\Delta t^{3}). (35)

Here 𝐝γn=𝐝n\mathbf{d}^{n}_{\gamma}=\mathbf{d}^{n} is independent of γ\gamma. We note that the PWDs for MPSSPRK2 are similar to the MPRK case, see Section 2.2.3. Indeed, one can show that (35) also holds for the second-order MPSSPRK family.

While the scheme using (26)-(27) is the motivation for the general formulation of our main result in the upcoming section, the scheme (26),(28) will be the basis for the bootstrapping algorithm to obtain higher order, see Section 3.2.2.

Main Result for Entropy Conservation and Positivity Preservation

We assume that the method can be written in the form

𝐮n+γ=𝐮n+1+(γ1)Δt𝐝γn(Δt)+𝒪(Δtp+1)\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)\Delta t\mathbf{d}^{n}_{\gamma}(\Delta t)+\mathcal{O}(\Delta t^{p+1})

with p2p\geq 2 being the order of the baseline method and some suitable 𝐝γn(Δt)\mathbf{d}^{n}_{\gamma}(\Delta t) depending on the method. Now, since 𝐝\mathbf{d} generally also depends on γ\gamma, we need the preparatory

Lemma 2.

Let h:U×Vh\colon U\times V\to\mathbb{R},

h(λ,γ)λ(γ1)Δt𝐝γn(Δt)2h(\lambda,\gamma)\coloneqq\lambda-(\gamma-1)\Delta t\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2} (36)

where U×VU\times V is an open neighborhood of (0,1)(0,1).

If 𝐝γn(Δt)\mathbf{d}_{\gamma}^{n}(\Delta t) is a 𝒞1\mathcal{C}^{1} map on VV w.r.t. γ\gamma with 𝐝1n(Δt)20\lVert\mathbf{d}_{1}^{n}(\Delta t)\rVert_{2}\neq 0 for Δt\Delta t small enough, then there exist a neighborhood U~\widetilde{U} of 0 and a continuous function γ~:U~\widetilde{\gamma}\colon\widetilde{U}\to\mathbb{R} such that h(λ,γ~(λ))=0h(\lambda,\widetilde{\gamma}(\lambda))=0 for all λU~\lambda\in\widetilde{U} and Δt>0\Delta t>0 small enough.

Proof.

We have h(0,1)=0h(0,1)=0 and

γh(λ,γ)=Δt𝐝γn(Δt)2(γ1)Δt(γ𝐝γn(Δt))T𝐝γn(Δt)𝐝γn(Δt)2.\partial_{\gamma}h(\lambda,\gamma)=-\Delta t\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2}-(\gamma-1)\Delta t\frac{(\partial_{\gamma}\mathbf{d}_{\gamma}^{n}(\Delta t))^{T}\mathbf{d}_{\gamma}^{n}(\Delta t)}{\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2}}.

Thus, γh(0,1)=Δt𝐝1n(Δt)20\partial_{\gamma}h(0,1)=-\Delta t\lVert\mathbf{d}_{1}^{n}(\Delta t)\rVert_{2}\neq 0 for Δt\Delta t small enough. The assertion then follows from the implicit function theorem. ∎

With that, we are positioned to prove the main theorem for the relaxation technique.

Theorem 2.

In the situation of Lemma 2, let

𝐮n+γ=𝐮n+1+(γ1)Δt𝐝γn(Δt)+𝒪(Δtp+1)\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)\Delta t\mathbf{d}^{n}_{\gamma}(\Delta t)+\mathcal{O}(\Delta t^{p+1})

with p2p\geq 2 being the order of the baseline method and suppose η𝒞2\eta\in\mathcal{C}^{2} with

η(𝐮n+1)𝐝γ~(Δtμ)n(Δt)𝐝γ~(Δtμ)n(Δt)2=c(𝐮n,μ)Δt+𝒪(Δt2)\eta^{\prime}(\mathbf{u}^{n+1})\cdot\frac{\mathbf{d}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t)}{\lVert\mathbf{d}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t)\rVert_{2}}=c(\mathbf{u}^{n},\mu)\Delta t+\mathcal{O}(\Delta t^{2})

for |μ|\mathinner{\!\left\lvert\mu\right\rvert} small enough and

limμ0c(𝐮n,μ)c~(𝐮n)0.\lim_{\mu\to 0}c(\mathbf{u}^{n},\mu)\eqqcolon\widetilde{c}(\mathbf{u}^{n})\neq 0.

Then the equation

η(𝐮n+γ)ηold=0\eta(\mathbf{u}^{n+\gamma})-\eta_{\text{old}}=0

possesses a positive solution γ\gamma. Furthermore, if 𝐝γn(Δt)2=𝒪(Δtq)\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2}=\mathcal{O}(\Delta t^{q}), then there exists a unique positive solution satisfying γ=1+𝒪(Δtp1q)\gamma=1+\mathcal{O}(\Delta t^{p-1-q}).

Proof.

We set

𝐰γn(Δt)𝐝γn(Δt)𝐝γn(Δt)2\mathbf{w}_{\gamma}^{n}(\Delta t)\coloneqq\frac{\mathbf{d}_{\gamma}^{n}(\Delta t)}{\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2}}

and follow the proof of [CLM2010, Theorem 2] by analyzing the function

z(Δt,μ)Δt2(η(𝐮n+γ~(Δtμ))ηold),Δt0.z(\Delta t,\mu)\coloneqq\Delta t^{-2}\left(\eta\left(\mathbf{u}^{n+\widetilde{\gamma}(\Delta t\mu)}\right)-\eta_{\text{old}}\right),\quad\Delta t\neq 0. (37)

The idea is to deduce that

z(Δt,μ)=μc(𝐮n,μ)+μ22(𝐰γ~(Δtμ)n(Δt))THη(𝐮n+1)𝐰γ~(Δtμ)n(Δt)+𝒪(Δt),z(\Delta t,\mu)=\mu c(\mathbf{u}^{n},\mu)+\frac{\mu^{2}}{2}(\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t))^{T}H_{\eta}(\mathbf{u}^{n+1})\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t)+\mathcal{O}(\Delta t), (38)

where HηH_{\eta} denotes the Hessian. Then, since γ~(0)=1\widetilde{\gamma}(0)=1, we have

z(0,μ)limΔt0z(Δt,μ)=μc~(𝐮n)+μ22(𝐰1n(0))THη(𝐮n)𝐰1n(0)andμz(0,0)=c~(𝐮n)0.z(0,\mu)\coloneqq\lim_{\Delta t\to 0}z(\Delta t,\mu)=\mu\widetilde{c}(\mathbf{u}^{n})+\frac{\mu^{2}}{2}(\mathbf{w}^{n}_{1}(0))^{T}H_{\eta}(\mathbf{u}^{n})\mathbf{w}^{n}_{1}(0)\quad\text{and}\quad\partial_{\mu}z(0,0)=\widetilde{c}(\mathbf{u}^{n})\neq 0.

According to the proof of [CLM2010, Theorem 2], there exist Δt>0\Delta t^{*}>0 and a unique function μ:[0,Δt]\mu\mathrel{\mathop{\ordinarycolon}}[0,\Delta t^{*}]\to\mathbb{R} such that z(Δt,μ(Δt))=0z(\Delta t,\mu(\Delta t))=0 for all 0ΔtΔt0\leq\Delta t\leq\Delta t^{*}. Indeed, because of

z(Δt,0)=Δt2(η(𝐮n+1)ηold)z(\Delta t,0)=\Delta t^{-2}\left(\eta(\mathbf{u}^{n+1})-\eta_{\text{old}}\right)

it can be deduced along the same lines that μ=𝒪(Δtp1)\mu=\mathcal{O}(\Delta t^{p-1}).

To prove (38) we first note that

𝐮n+γ=𝐮n+1+λ𝐰γn(Δt)+𝒪(Δtp+1),\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+\lambda\mathbf{w}_{\gamma}^{n}(\Delta t)+\mathcal{O}(\Delta t^{p+1}), (39)

where

λ(γ1)Δt𝐝γn(Δt)2.\lambda\coloneqq(\gamma-1)\Delta t\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2}. (40)

Next, as h(λ,γ)=0h(\lambda,\gamma)=0 we can use Lemma 2 to solve (40) for γ\gamma and plug it into (39) resulting in a function of λ\lambda only, i.e.

𝐮n+γ~(λ)=𝐮n+1+λ𝐰γ~(λ)n(Δt)+𝒪(Δtp+1).\mathbf{u}^{n+\widetilde{\gamma}(\lambda)}=\mathbf{u}^{n+1}+\lambda\mathbf{w}_{\widetilde{\gamma}(\lambda)}^{n}(\Delta t)+\mathcal{O}(\Delta t^{p+1}).

In the following, we denote by 𝐮(t)\mathbf{u}(t) the exact local solution at tt satisfying 𝐮(tn)=𝐮n\mathbf{u}(t_{n})=\mathbf{u}^{n} and recall that we are considering an entropy-conservative problem. Hence, with λΔtμ\lambda\eqqcolon\Delta t\mu and the assumptions on η\eta^{\prime} we receive

z(Δt,μ)\displaystyle z(\Delta t,\mu) =Δt2(η(𝐮n+γ~(Δtμ))η(𝐮(tn+Δt))))\displaystyle=\Delta t^{-2}\left(\eta(\mathbf{u}^{n+\widetilde{\gamma}(\Delta t\mu)})-\eta(\mathbf{u}(t_{n}+\Delta t)))\right)
=Δt2(η(𝐮n+1+Δtμ𝐰γ~(Δtμ)n(Δt))η(𝐮(tn+Δt))))+𝒪(Δtp1)\displaystyle=\Delta t^{-2}\left(\eta(\mathbf{u}^{n+1}+\Delta t\mu\mathbf{w}_{\widetilde{\gamma}(\Delta t\mu)}^{n}(\Delta t))-\eta(\mathbf{u}(t_{n}+\Delta t)))\right)+\mathcal{O}(\Delta t^{p-1})
=η(𝐮n+1)η(𝐮(tn+Δt))+Δtμη(𝐮n+1)𝐰γ~(Δtμ)n(Δt)Δt2\displaystyle=\frac{\eta(\mathbf{u}^{n+1})-\eta(\mathbf{u}(t_{n}+\Delta t))+\Delta t\mu\eta^{\prime}(\mathbf{u}^{n+1})\mathbf{w}_{\widetilde{\gamma}(\Delta t\mu)}^{n}(\Delta t)}{\Delta t^{2}}
+μ22(𝐰γ~(Δtμ)n(Δt))THη(𝐮n+1)𝐰γ~(Δtμ)n(Δt)+𝒪(Δt)\displaystyle\hphantom{==}+\frac{\mu^{2}}{2}(\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t))^{T}H_{\eta}(\mathbf{u}^{n+1})\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t)+\mathcal{O}(\Delta t)
=μc(𝐮n,μ)+μ22(𝐰γ~(Δtμ)n(Δt))THη(𝐮n+1)𝐰γ~(Δtμ)n(Δt)+𝒪(Δt).\displaystyle=\mu c(\mathbf{u}^{n},\mu)+\frac{\mu^{2}}{2}(\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t))^{T}H_{\eta}(\mathbf{u}^{n+1})\mathbf{w}^{n}_{\widetilde{\gamma}(\Delta t\mu)}(\Delta t)+\mathcal{O}(\Delta t).

Finally, since 𝒪(Δtp1)=μ=λΔt=(γ1)𝐝γn(Δt)2\mathcal{O}(\Delta t^{p-1})=\mu=\frac{\lambda}{\Delta t}=(\gamma-1)\lVert\mathbf{d}_{\gamma}^{n}(\Delta t)\rVert_{2} we deduce that γ=1+𝒪(Δtp1q)\gamma=1+\mathcal{O}(\Delta t^{p-1-q}). ∎

Now if we are given such a solution γ=1+𝒪(Δtp1)\gamma=1+\mathcal{O}(\Delta t^{p-1}) we can deduce that the relaxation update is of order pp as the following lemma shows.

Lemma 3.

If 𝐮n+γ=𝐮n+1+(γ1)(𝐮n+1𝐮n)+𝒪(Δtp+1)\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)(\mathbf{u}^{n+1}-\mathbf{u}^{n})+\mathcal{O}(\Delta t^{p+1}) with a pp-th order baseline method and γ=1+𝒪(Δtp1)\gamma=1+\mathcal{O}(\Delta t^{p-1}), then

𝐮n+γ=𝐮(tn+γΔt)+𝒪(Δtp+1).\mathbf{u}^{n+\gamma}=\mathbf{u}(t_{n}+\gamma\Delta t)+\mathcal{O}(\Delta t^{p+1}).
Proof.

This is just Lemma [ranocha2020general, Lemma 2.7], where the relaxation method is perturbed by an additive error of 𝒪(Δtp+1)\mathcal{O}(\Delta t^{p+1}). ∎

Bootstrapping Algorithm for Positivity-Preserving Relaxation

The main idea for generalizing the relaxation technique to higher order is to use the observation from Remark 5, where

𝐮n+γ=𝐮n+1+(γ1)Δt𝐮n+1𝐮nΔt𝐝n+𝒪(Δtp+1).\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)\Delta t\underbrace{\frac{\mathbf{u}^{n+1}-\mathbf{u}^{n}}{\Delta t}}_{\eqqcolon\mathbf{d}^{n}}+\mathcal{O}(\Delta t^{p+1}).

Since 𝐝n\mathbf{d}^{n} now is independent of γ\gamma, there is no need for Lemma 2 as we have an explicit expression for γ~\widetilde{\gamma}, i.e. γ~(λ)=λ+Δt𝐝n2Δt𝐝n2\widetilde{\gamma}(\lambda)=\frac{\lambda+\Delta t\lVert\mathbf{d}^{n}\rVert_{2}}{\Delta t\lVert\mathbf{d}^{n}\rVert_{2}} and we can apply Theorem 2 giving us γ=1+𝒪(Δtp1)\gamma=1+\mathcal{O}(\Delta t^{p-1}). Also, Remark 4 motivates us to start a bootstrapping algorithm using the functions b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j} also for higher order. This seems to contradict our results from the theory on dense output, but in combination with relaxation, the issue can be resolved, see Remark 6 below.

Now in view of the following lemma, we can bootstrap the relaxation technique to higher orders.

Lemma 4.

Consider a scheme of the form (26) with b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j} and

𝝈¯(γ)=𝐮(tn+Δt)+(γ1)(𝐮(tn+Δt)𝐮(tn))+𝒪(Δtp)\displaystyle\bar{\bm{\sigma}}(\gamma)=\mathbf{u}(t_{n}+\Delta t)+(\gamma-1)(\mathbf{u}(t_{n}+\Delta t)-\mathbf{u}(t_{n}))+\mathcal{O}(\Delta t^{p}) (41)

for all γ\gamma in a neighborhood VV of 11. Then

uνn+γσ¯ν(γ)=1+𝒪(Δtp)=uνn+1σν+𝒪(Δtp),\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=1+\mathcal{O}(\Delta t^{p})=\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}+\mathcal{O}(\Delta t^{p}),

and in particular,

𝐮n+γ=𝐮n+1+(γ1)(𝐮n+1𝐮n)+𝒪(Δtp+1)\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)(\mathbf{u}^{n+1}-\mathbf{u}^{n})+\mathcal{O}(\Delta t^{p+1})

for all γV\gamma\in V.

Remark 6.

Before we prove this lemma, we want to stress two things. First, using (41) with b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j} does not result in a higher-order dense output formula, but only guarantees to obtain the desired order at the root γ\gamma of η(𝐮n+γ)=ηold\eta(\mathbf{u}^{n+\gamma})=\eta_{\text{old}}, see Theorem 2 and Lemma 3.

Secondly, the bootstrapping process consists of using 𝐮n+γ\mathbf{u}^{n+\gamma} from the scheme of order p1p-1 as the new 𝛔¯(γ)\bar{\bm{\sigma}}(\gamma) resulting in a new method of order pp. We can start the bootstrapping process by using the second-order MPRK22(α\alpha) scheme as a baseline method together with (28). Note that this naturally results in nested functions that depend on γ\gamma, which should be kept in mind when implementing the Newton iteration.

Proof of Lemma 4.

We prove this claim by induction over q=1,,pq=1,\dotsc,p and exploit [IzginThesis, Lemma 4.6,Lemma 4.8] to justify the implication

𝐮n+γ=𝝈¯(γ)+𝒪(Δtq)uνn+γσ¯ν(γ)=1+𝒪(Δtq),ν=1,,N.\mathbf{u}^{n+\gamma}=\bar{\bm{\sigma}}(\gamma)+\mathcal{O}(\Delta t^{q})\Longrightarrow\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=1+\mathcal{O}(\Delta t^{q}),\quad\nu=1,\dotsc,N.

For q=1q=1 we find 𝐮n+γ=𝐮n+𝒪(Δt)\mathbf{u}^{n+\gamma}=\mathbf{u}^{n}+\mathcal{O}(\Delta t) and 𝝈¯(γ)=𝐮n+𝒪(Δt)\bar{\bm{\sigma}}(\gamma)=\mathbf{u}^{n}+\mathcal{O}(\Delta t) since uνn+γσ¯ν(γ)=𝒪(1)\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=\mathcal{O}(1) due to [IzginThesis, Lemma 4.6].

Now suppose that (41) holds with 𝒪(Δtq)\mathcal{O}(\Delta t^{q}) and some q2q\geq 2. By the induction hypothesis we have

uνn+γσ¯ν(γ)=1+𝒪(Δtq1)=uνn+1σν+𝒪(Δtq1)\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=1+\mathcal{O}(\Delta t^{q-1})=\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}+\mathcal{O}(\Delta t^{q-1})

where we used uνn+1σν=1+𝒪(Δtp)\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}=1+\mathcal{O}(\Delta t^{p}) and pqp\geq q for the last equality, see e.g. [NSARK, Lemma 16]. Substituting this into (26), we see

𝐮n+γ=𝐮n+1+(γ1)(𝐮n+1𝐮n)+𝒪(Δtq).\mathbf{u}^{n+\gamma}=\mathbf{u}^{n+1}+(\gamma-1)(\mathbf{u}^{n+1}-\mathbf{u}^{n})+\mathcal{O}(\Delta t^{q}).

Finally, since (41) holds with 𝒪(Δtq)\mathcal{O}(\Delta t^{q}), we deduce

uνn+γσ¯ν(γ)=1+𝒪(Δtq)=uνn+1σν+𝒪(Δtq).\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}=1+\mathcal{O}(\Delta t^{q})=\frac{u_{\nu}^{n+1}}{\sigma_{\nu}}+\mathcal{O}(\Delta t^{q}).

Example 6 (Third-Order Relaxation for Conservative Problems using MPRK).

Looking at the third-order MPRK family from Example 2, the relaxation scheme is fully defined by setting

σ¯k(γ)=\displaystyle\bar{\sigma}_{k}(\gamma)= ukn+γΔtj=12βj(rkP(𝐮(j))+ν=1dpkν(𝐮(j))σ¯ν(γ)(uν(2))γa21(uνn)1γa21\displaystyle u_{k}^{n}+\gamma\Delta t\sum_{j=1}^{2}\beta_{j}\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{\bar{\sigma}_{\nu}(\gamma)}{(u_{\nu}^{(2)})^{\frac{\gamma}{a_{21}}}(u_{\nu}^{n})^{1-\frac{\gamma}{a_{21}}}}\right. (42)
(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))σ¯k(γ)(uk(2))γa21(ukn)1γa21)\displaystyle\hskip 56.9055pt-\left.\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{\bar{\sigma}_{k}(\gamma)}{(u_{k}^{(2)})^{\frac{\gamma}{a_{21}}}(u_{k}^{n})^{1-\frac{\gamma}{a_{21}}}}\right)

for k=1,,dk=1,\dotsc,d, β1=1β2\beta_{1}=1-\beta_{2}, and β2=12a21\beta_{2}=\frac{1}{2a_{21}}.

Applying Newton’s Method

As an illustrative example, we focus on (26) with b¯j(γ)=γbj\bar{b}_{j}(\gamma)=\gamma b_{j}, i.e.

ukn+γ=ukn+Δtγj=1sbj(rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+γσ¯ν(γ)(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+γσ¯k(γ)),\begin{aligned} u^{n+\gamma}_{k}=u_{k}^{n}&+\Delta t\gamma\sum_{j=1}^{s}b_{j}\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+\gamma}}{\bar{\sigma}_{k}(\gamma)}\right)\\ \end{aligned}, (43)

and that

(𝐌γ)kk\displaystyle(\mathbf{M}_{\gamma})_{kk} =1+γΔtj=1sbj(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))1σ¯k(γ),\displaystyle=1+\gamma\Delta t\sum_{j=1}^{s}b_{j}\left(r_{k}^{D}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{1}{\bar{\sigma}_{k}(\gamma)}, (44)
(𝐌γ)kν\displaystyle(\mathbf{M}_{\gamma})_{k\nu} =γΔtj=1sbjpkν(𝐮(j))1σ¯ν(γ),kν.\displaystyle=-\gamma\Delta t\sum_{j=1}^{s}b_{j}p_{k\nu}(\mathbf{u}^{(j)})\frac{1}{\bar{\sigma}_{\nu}(\gamma)},\quad k\neq\nu.

in (31).

Starting with (43) we deduce

ddγukn+γ=Δtj=1sb¯j(γ)\displaystyle\frac{\mathrm{d}}{\mathrm{d}\gamma}u^{n+\gamma}_{k}=\Delta t\sum_{j=1}^{s}\bar{b}_{j}^{\prime}(\gamma) (rkP(𝐮(j))+ν=1dpkν(𝐮(j))uνn+γσ¯ν(γ)(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))ukn+γσ¯k(γ))\displaystyle\left(r_{k}^{P}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\frac{u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\frac{u_{k}^{n+\gamma}}{\bar{\sigma}_{k}(\gamma)}\right)
+Δtj=1sb¯j(γ\displaystyle+\Delta t\sum_{j=1}^{s}\bar{b}_{j}(\gamma )(ν=1dpkν(𝐮(j))(ddγuνn+γσ¯ν(γ)uνn+γσ¯ν(γ)(σ¯ν(γ))2)\displaystyle)\left(\sum_{\nu=1}^{d}p_{k\nu}(\mathbf{u}^{(j)})\left(\frac{\frac{\mathrm{d}}{\mathrm{d}\gamma}u_{\nu}^{n+\gamma}}{\bar{\sigma}_{\nu}(\gamma)}-\frac{u_{\nu}^{n+\gamma}\bar{\sigma}_{\nu}^{\prime}(\gamma)}{(\bar{\sigma}_{\nu}(\gamma))^{2}}\right)\right.
(rkD(𝐮(j))+ν=1ddkν(𝐮(j)))(ddγukn+γσ¯k(γ)ukn+γσ¯k(γ)(σ¯k(γ))2)).\displaystyle\left.-\left(r^{D}_{k}(\mathbf{u}^{(j)})+\sum_{\nu=1}^{d}d_{k\nu}(\mathbf{u}^{(j)})\right)\left(\frac{\frac{\mathrm{d}}{\mathrm{d}\gamma}u_{k}^{n+\gamma}}{\bar{\sigma}_{k}(\gamma)}-\frac{u_{k}^{n+\gamma}\bar{\sigma}_{k}^{\prime}(\gamma)}{(\bar{\sigma}_{k}(\gamma))^{2}}\right)\right).

To rewrite this in matrix-vector notation, we denote by \oslash and \odot the component-wise division and multiplication (Hadamard division and product), respectively. Then, using

𝐯(γ)𝐮n+γ𝝈¯(γ)(𝝈¯(γ)),\mathbf{v}(\gamma)\coloneqq\mathbf{u}^{n+\gamma}\odot\bar{\bm{\sigma}}^{\prime}(\gamma)\oslash(\bar{\bm{\sigma}}(\gamma)),

we end up with

𝐌γ(𝐮n)ddγ𝐮n+γ=\displaystyle\mathbf{M}_{\gamma}(\mathbf{u}^{n})\frac{\mathrm{d}}{\mathrm{d}\gamma}\mathbf{u}^{n+\gamma}= 1γ(𝐮n+γ𝐮n)+(𝐌γ(𝐮n)𝐈)𝐯(γ).\displaystyle\frac{1}{\gamma}(\mathbf{u}^{n+\gamma}-\mathbf{u}^{n})+(\mathbf{M}_{\gamma}(\mathbf{u}^{n})-\mathbf{I})\mathbf{v}(\gamma). (45)

Note that if 𝝈¯(γ)=𝝈\bar{\bm{\sigma}}(\gamma)=\bm{\sigma}, then 𝐯(γ)=𝟎\mathbf{v}(\gamma)=\mathbf{0}. Also note that the derivative of 𝝈¯\bar{\bm{\sigma}} from (42) itself satisfies an analogue equation to (45) as it represents an MPRK22(α\alpha) relaxation method of order 22.

Also, the system for MPSSPRK22 is the same; one only has to plug in the expressions 𝐌γ\mathbf{M}_{\gamma} and 𝝈¯\bar{\bm{\sigma}} for that particular scheme.

4 Numerical Experiments

In this section we apply our new relaxation algorithm to dissipative and conservative problems to validate our theoretical findings and to experimentally test the constraints on Δt\Delta t for solving the system (31). We note that we use Newton’s method for the computation of γ\gamma, if not stated otherwise. Also, we use (27) as default for MPRK22(α(\alpha). Also, we may use a PID controller with parameters β1=0.7\beta_{1}=0.7, β2=0.4\beta_{2}=0.4, and β3=0\beta_{3}=0, see [IR2023] for more details. The resulting method is denoted by MPRK22adap. We note that our implementation of the relaxation algorithm, which can be found in our reproducibility repository [IRS2026repository], is also adaptive in the sense that successful relaxation steps increase the Δt\Delta t by 1% while unsuccessful searches for γ\gamma result in a 10% decrease of the time step size. We refer to the repository [IRS2026repository] for the implemented abortion criteria.

4.1 Lotka-Volterra System

The classical Lotka-Volterra system

u1(t)\displaystyle u_{1}^{\prime}(t) =2u1(t)u1(t)u2(t),d\displaystyle=2u_{1}(t)-u_{1}(t)u_{2}(t),d
u2(t)\displaystyle u_{2}^{\prime}(t) =u1(t)u2(t)u2(t),𝐮(0)=(2,2)T\displaystyle=u_{1}(t)u_{2}(t)-u_{2}(t),\quad\mathbf{u}(0)=(2,2)^{T}

can be written as a non-conservative PDRS with

r1P=2u1,p21=u1u2=d12,r2D=u2.r_{1}^{P}=2u_{1},\quad p_{21}=u_{1}u_{2}=d_{12},\quad r_{2}^{D}=u_{2}.

The entropy

η(𝐮)=log(u1)u1+2log(u2)u2\eta(\mathbf{u})=\log(u_{1})-u_{1}+2\log(u_{2})-u_{2}

is conserved. Since the Lotka-Volterra system has periodic orbits, we expect improved numerical results using relaxation to conserve the entropy [cano1997error, cano1998error, calvo2011error].

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 1: Numerical solution of Lotka Volterra problem using MPRK22(1) (top) and Δt=1\Delta t=1. The error is errn=max(|u1nu1ref,n|,|u2nu2ref,n|)\text{err}^{n}=\max(\lvert u_{1}^{n}-u_{1}^{\text{ref},n}\rvert,\lvert u_{2}^{n}-u_{2}^{\text{ref},n}\rvert). Left: without relaxation. Right: with relaxation.

Although there are only positivity constraints, η\eta is not non-decreasing for all 𝐮>𝟎\mathbf{u}>\mathbf{0} in this example, which is why we use the default relaxation algorithm. As expected, we observe that relaxation improves the error growth of the base method from quadratic to linear, see Figure 1.

4.2 Stratospheric Reaction Problem

The stratospheric reaction problem [sandu2001positive] is a stiff system of ODEs 𝐰(t)=𝐟(t,𝐰(t))\mathbf{w}^{\prime}(t)=\mathbf{f}(t,\mathbf{w}(t)) describing the interaction of the constituents 𝐰=(w1,,w6)=(O1D,O,O3,O2,NO,NO2)\mathbf{w}=(w_{1},\dotsc,w_{6})=(O^{1D},O,O_{3},O_{2},NO,NO_{2}). This non-conservative PDS possesses two linear invariants determined by the vectors 𝐧~1=(1,1,3,2,1,2)T\widetilde{\mathbf{n}}_{1}=(1,1,3,2,1,2)^{T} and 𝐧~2=(0,0,0,0,1,1)T\widetilde{\mathbf{n}}_{2}=(0,0,0,0,1,1)^{T}. In order to apply MPRK schemes to this problem, we scale the corresponding differential equations writing

diag(𝐧~1)𝐰(t)=diag(𝐧1~)𝐟(t,diag(𝐧~1)1diag(𝐧~1)𝐰(t)).\operatorname{diag}(\widetilde{\mathbf{n}}_{1})\mathbf{w}^{\prime}(t)=\operatorname{diag}(\widetilde{\mathbf{n}_{1}})\mathbf{f}(t,\operatorname{diag}(\widetilde{\mathbf{n}}_{1})^{-1}\operatorname{diag}(\widetilde{\mathbf{n}}_{1})\mathbf{w}(t)).

Hence, introducing 𝐮(t)=diag(𝐧~1)𝐰(t)\mathbf{u}(t)=\operatorname{diag}(\widetilde{\mathbf{n}}_{1})\mathbf{w}(t), the two linear invariants of the differential equations 𝐮(t)=𝐟(t,diag(𝐧~1)1𝐮(t))\mathbf{u}^{\prime}(t)=\mathbf{f}(t,\operatorname{diag}(\widetilde{\mathbf{n}}_{1})^{-1}\mathbf{u}(t)) are 𝐧1=(1,1,1,1,1,1)T\mathbf{n}_{1}=(1,1,1,1,1,1)^{T} and 𝐧2=(0,0,0,0,1,12)T\mathbf{n}_{2}=(0,0,0,0,1,\tfrac{1}{2})^{T}. Moreover, the scaled system takes the form

u1\displaystyle u_{1}^{\prime} =13r5(t,𝐮)(r6(𝐮)+13r7(𝐮)),\displaystyle=\tfrac{1}{3}r_{5}(t,\mathbf{u})-(r_{6}(\mathbf{u})+\tfrac{1}{3}r_{7}(\mathbf{u})), (46)
u2\displaystyle u_{2}^{\prime} =r1(t,𝐮)+13r3(t,𝐮)+r6(𝐮)+12r10(t,𝐮)(12r2(𝐮)+13r4(𝐮)+12r9(𝐮)+r11(𝐮)),\displaystyle=r_{1}(t,\mathbf{u})+\tfrac{1}{3}r_{3}(t,\mathbf{u})+r_{6}(\mathbf{u})+\tfrac{1}{2}r_{10}(t,\mathbf{u})-(\tfrac{1}{2}r_{2}(\mathbf{u})+\tfrac{1}{3}r_{4}(\mathbf{u})+\tfrac{1}{2}r_{9}(\mathbf{u})+r_{11}(\mathbf{u})),
u3\displaystyle u_{3}^{\prime} =32r2(𝐮)(r3(t,𝐮)+r4(𝐮)+r5(t,𝐮)+r7(𝐮)+r8(𝐮)),\displaystyle=\tfrac{3}{2}r_{2}(\mathbf{u})-(r_{3}(t,\mathbf{u})+r_{4}(\mathbf{u})+r_{5}(t,\mathbf{u})+r_{7}(\mathbf{u})+r_{8}(\mathbf{u})),
u4\displaystyle u_{4}^{\prime} =23r3(t,𝐮)+43r4(𝐮)+23r5(t,𝐮)+43r7(𝐮)+23r8(𝐮)+r9(𝐮)(r1(t,𝐮)+r2(𝐮)),\displaystyle=\tfrac{2}{3}r_{3}(t,\mathbf{u})+\tfrac{4}{3}r_{4}(\mathbf{u})+\tfrac{2}{3}r_{5}(t,\mathbf{u})+\tfrac{4}{3}r_{7}(\mathbf{u})+\tfrac{2}{3}r_{8}(\mathbf{u})+r_{9}(\mathbf{u})-(r_{1}(t,\mathbf{u})+r_{2}(\mathbf{u})),
u5\displaystyle u_{5}^{\prime} =12r9(𝐮)+12r10(t,𝐮)(13r8(𝐮)r11(𝐮)),\displaystyle=\tfrac{1}{2}r_{9}(\mathbf{u})+\tfrac{1}{2}r_{10}(t,\mathbf{u})-(\tfrac{1}{3}r_{8}(\mathbf{u})r_{11}(\mathbf{u})),
u6\displaystyle u_{6}^{\prime} =23r8(𝐮)+2r11(𝐮)(r9(𝐮)+r10(t,𝐮)),\displaystyle=\tfrac{2}{3}r_{8}(\mathbf{u})+2r_{11}(\mathbf{u})-(r_{9}(\mathbf{u})+r_{10}(t,\mathbf{u})),

where

r1(t,𝐮)\displaystyle r_{1}(t,\mathbf{u}) =k1(t)u4,\displaystyle=k_{1}(t)u_{4}, k1(t)\displaystyle k_{1}(t) =σ(T(t))32.6431010,\displaystyle=\sigma(T(t))^{3}\cdot 643\cdot 0^{-10},
r2(t,𝐮)\displaystyle r_{2}(t,\mathbf{u}) =k2u2u4,\displaystyle=k_{2}u_{2}u_{4}, k2\displaystyle k_{2} =8.0181017,\displaystyle=018\cdot 0^{-17},
r3(t,𝐮)\displaystyle r_{3}(t,\mathbf{u}) =k3(t)u3,\displaystyle=k_{3}(t)u_{3}, k3(t)\displaystyle k_{3}(t) =σ(T(t))6.120104,\displaystyle=\sigma(T(t))\cdot 120\cdot 0^{-4},
r4(t,𝐮)\displaystyle r_{4}(t,\mathbf{u}) =k4u2u3,\displaystyle=k_{4}u_{2}u_{3}, k4\displaystyle k_{4} =1.5761015,\displaystyle=576\cdot 0^{-15},
r5(t,𝐮)\displaystyle r_{5}(t,\mathbf{u}) =k5(t)u3,\displaystyle=k_{5}(t)u_{3}, k5(t)\displaystyle k_{5}(t) =σ(T(t))21.070103,\displaystyle=\sigma(T(t))^{2}\cdot 070\cdot 0^{-3},
r6(t,𝐮)\displaystyle r_{6}(t,\mathbf{u}) =k6Mu1,\displaystyle=k_{6}Mu_{1}, k6\displaystyle k_{6} =7.1101011,\displaystyle=110\cdot 0^{-11}, M\displaystyle M =8.1201016,\displaystyle=120\cdot 0^{16},
r7(t,𝐮)\displaystyle r_{7}(t,\mathbf{u}) =k7u1u3,\displaystyle=k_{7}u_{1}u_{3}, k7\displaystyle k_{7} =1.2001010,\displaystyle=200\cdot 0^{-10},
r8(t,𝐮)\displaystyle r_{8}(t,\mathbf{u}) =k8u3u5,\displaystyle=k_{8}u_{3}u_{5}, k8\displaystyle k_{8} =6.0621015,\displaystyle=062\cdot 0^{-15},
r9(t,𝐮)\displaystyle r_{9}(t,\mathbf{u}) =k9u2u6,\displaystyle=k_{9}u_{2}u_{6}, k9\displaystyle k_{9} =1.0691011,\displaystyle=069\cdot 0^{-11},
r10(t,𝐮)\displaystyle r_{10}(t,\mathbf{u}) =k10(t)u6,\displaystyle=k_{10}(t)u_{6}, k10(t)\displaystyle k_{10}(t) =σ(T(t))1.289102,\displaystyle=\sigma(T(t))\cdot 289\cdot 0^{-2},
r11(t,𝐮)\displaystyle r_{11}(t,\mathbf{u}) =k11u2u5,\displaystyle=k_{11}u_{2}u_{5}, k11\displaystyle k_{11} =108\displaystyle=0^{-8}

as well as

σ(T(t))\displaystyle\sigma(T(t)) ={0.5+0.5cos(π|2T(t)TrT2TsTr|2T(t)TrT2TsTr),TrT(t)Ts,0,otherwise,\displaystyle=
T(t)\displaystyle T(t) =t3600mod24,Tr=4.5,Ts=19.5.\displaystyle=\frac{t}{3600}\mod 4,\quad T_{r}=5,\quad T_{s}=95.

The non-zero production and destruction terms of the system (46) are given by

d12(t,𝐮)\displaystyle d_{12}(t,\mathbf{u}) =r6(𝐮),\displaystyle=r_{6}(\mathbf{u}), d14(t,𝐮)\displaystyle d_{14}(t,\mathbf{u}) =13r7(𝐮),\displaystyle=\tfrac{1}{3}r_{7}(\mathbf{u}), d23(t,𝐮)\displaystyle d_{23}(t,\mathbf{u}) =12r2(𝐮),\displaystyle=\tfrac{1}{2}r_{2}(\mathbf{u}),
d24(t,𝐮)\displaystyle d_{24}(t,\mathbf{u}) =13r4(𝐮),\displaystyle=\tfrac{1}{3}r_{4}(\mathbf{u}), d25(t,𝐮)\displaystyle d_{25}(t,\mathbf{u}) =12r9(𝐮),\displaystyle=\tfrac{1}{2}r_{9}(\mathbf{u}), d26(t,𝐮)\displaystyle d_{26}(t,\mathbf{u}) =r11(𝐮),\displaystyle=r_{11}(\mathbf{u}),
d31(t,𝐮)\displaystyle d_{31}(t,\mathbf{u}) =13r5(t,𝐮),\displaystyle=\tfrac{1}{3}r_{5}(t,\mathbf{u}), d32(t,𝐮)\displaystyle d_{32}(t,\mathbf{u}) =13r3(t,𝐮),\displaystyle=\tfrac{1}{3}r_{3}(t,\mathbf{u}), d36(t,𝐮)\displaystyle d_{36}(t,\mathbf{u}) =13r8(𝐮),\displaystyle=\tfrac{1}{3}r_{8}(\mathbf{u}),
d34(t,𝐮)\displaystyle d_{34}(t,\mathbf{u}) =23r3(t,𝐮)+r4(𝐮)+23r5(t,𝐮)+r7(𝐮)+23r8(𝐮),\displaystyle=\tfrac{2}{3}r_{3}(t,\mathbf{u})+r_{4}(\mathbf{u})+\tfrac{2}{3}r_{5}(t,\mathbf{u})+r_{7}(\mathbf{u})+\tfrac{2}{3}r_{8}(\mathbf{u}),\hskip-170.71652pt
d42(t,𝐮)\displaystyle d_{42}(t,\mathbf{u}) =r1(t,𝐮),\displaystyle=r_{1}(t,\mathbf{u}), d43(t,𝐮)\displaystyle d_{43}(t,\mathbf{u}) =r2(𝐮),\displaystyle=r_{2}(\mathbf{u}), d56(t,𝐮)\displaystyle d_{56}(t,\mathbf{u}) =r11(𝐮)+13r8(𝐮),\displaystyle=r_{11}(\mathbf{u})+\tfrac{1}{3}r_{8}(\mathbf{u}),
d62(t,𝐮)\displaystyle d_{62}(t,\mathbf{u}) =12r10(t,𝐮),\displaystyle=\tfrac{1}{2}r_{10}(t,\mathbf{u}), d64(t,𝐮)\displaystyle d_{64}(t,\mathbf{u}) =r9(𝐮),\displaystyle=r_{9}(\mathbf{u}), d65(t,𝐮)\displaystyle d_{65}(t,\mathbf{u}) =12r10(t,𝐮),\displaystyle=\tfrac{1}{2}r_{10}(t,\mathbf{u}),

and pij=djip_{ij}=d_{ji}. The solution to this problem will be approximated over the time interval [123600,843600][12\cdot 3600,84\cdot 3600], where a unit of time represents a second. A reference solution of the scaled problem is depicted in Figure 2.

Refer to caption
Figure 2: Reference solution of the scaled stratospheric problem (46) depicted over the time interval [12,84][12,84] in hours.

As we will see, MPRK schemes do not conserve the second linear invariant, which is why

𝐧2T𝐮n+1𝐮n𝐮n+1𝐮n=c(𝐮n)Δt+𝒪(Δt2)\mathbf{n}_{2}^{T}\frac{\mathbf{u}^{n+1}-\mathbf{u}^{n}}{\lVert\mathbf{u}^{n+1}-\mathbf{u}^{n}\rVert}=c(\mathbf{u}^{n})\Delta t+\mathcal{O}(\Delta t^{2})

with c0c\neq 0. Hence, we may use

η(𝐮)=𝐧2T𝐮\eta(\mathbf{u})=\mathbf{n}_{2}^{T}\mathbf{u}

as an entropy function satisfying (5) to preserve also the second linear invariant with our relaxation technique for conservative problems. As can be seen in Figure 3, using relaxation improves the accuracy of the solution significantly. However, we note that Newton’s method, while working in principle, sometimes fails at finding a solution γ1\gamma\approx 1 in our implementation. Indeed, we observed γ1012\gamma\approx 10^{-12} and thus decided to use Regula Falsi as a solver.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: Numerical solution of stratospheric reaction problem using standard MPRK22adap(1) (left) with rtol=atol=103\texttt{rtol}=\texttt{atol}=10^{-3} and Δt0=0.01h\Delta t_{0}=0.01h. Top: without relaxation. Middle: with relaxation (Regula Falsi) and initial Δt=0.01h\Delta t=0.01h. Bottom: Plots of Δt\Delta t and γ\gamma for the run with relaxation.

4.3 Linear advection

Consider the linear advection equation

tu+xu=0\partial_{t}u+\partial_{x}u=0 (47)

with periodic boundary conditions on I=[0,2]I=[0,2] and a positive initial condition u0>0u^{0}>0. Then, the solution u(t,x)=u0(xt)u(t,x)=u^{0}(x-t) stays positive. Moreover, every functional of the form

η(u(t,I))=IU(u(t,x))dx\eta(u(t,I))=\int_{I}U\bigl(u(t,x)\bigr)\operatorname{d\!}x (48)

for an entropy function UU is conserved with associated entropy flux F(u)=U(u)F(u)=U(u). Following Tadmor [Tadmor03], the entropy variables are w=U(u)w=U^{\prime}(u) and the flux potential is ψ=wfF=U(u)uU(u)\psi=wf-F=U^{\prime}(u)u-U(u). The corresponding entropy-conservative numerical flux is

fnum(u,u+)=ψ(u+)ψ(u)w(u+)w(u)=U(u+)u+U(u+)U(u)u+U(u)U(u+)U(u).f^{\mathrm{num}}(u_{-},u_{+})=\frac{\psi(u_{+})-\psi(u_{-})}{w(u_{+})-w(u_{-})}=\frac{U^{\prime}(u_{+})u_{+}-U(u_{+})-U^{\prime}(u_{-})u_{-}+U(u_{-})}{U^{\prime}(u_{+})-U^{\prime}(u_{-})}. (49)

If U(u)U^{\prime}(u)\to\infty faster than U(u)U(u) as u0u\to 0, then the numerical flux goes to zero if one of the states goes to zero. Therefore, the resulting finite volume method

tui+fnum(ui1,ui)fnum(ui,ui+1)Δx=0\partial_{t}u_{i}+\frac{f^{\mathrm{num}}(u_{i-1},u_{i})-f^{\mathrm{num}}(u_{i},u_{i+1})}{\Delta x}=0 (50)

is positivity-preserving in this case. In particular, the numerical fluxes fnum(u,u+)f^{\mathrm{num}}(u_{-},u_{+}) are always non-negative, resulting in a conservative production-destruction system. Next, we consider several examples.

  • The entropy

    U(u)=ulog(u)uU(u)=u\log(u)-u (51)

    leads to the entropy-conservative numerical flux

    fnum(u,u+)=u+ulog(u+)log(u)=:{{u}}logf^{\mathrm{num}}(u_{-},u_{+})=\frac{u_{+}-u_{-}}{\log(u_{+})-\log(u_{-})}=\mathrel{\mathop{\ordinarycolon}}\{\!\{u\}\!\}_{\mathrm{log}} (52)

    using the logarithmic mean, see [ranocha2021preventing, Section 3.2].

  • Similarly, the entropy

    U(u)=uU(u)=-\sqrt{u} (53)

    leads to the entropy variables w=1/(2u)w=-1/(2\sqrt{u}), the flux potential ψ=u/2\psi=\sqrt{u}/2, and the entropy-conservative numerical flux

    fnum(u,u+)=u+u1/u++1/u=uu+=:{{u}}geof^{\mathrm{num}}(u_{-},u_{+})=\frac{\sqrt{u_{+}}-\sqrt{u_{-}}}{-1/\sqrt{u_{+}}+1/\sqrt{u_{-}}}=\sqrt{u_{-}u_{+}}=\mathrel{\mathop{\ordinarycolon}}\{\!\{u\}\!\}_{\mathrm{geo}} (54)

    using the geometric mean.

  • Analogously, the entropy

    U(u)=1/uU(u)=1/u (55)

    leads to the entropy variables w=1/u2w=-1/u^{2}, the flux potential ψ=2/u\psi=-2/u, and the entropy-conservative numerical flux

    fnum(u,u+)=2/u++2/u1/u+2+1/u2=2uu+u++u=:{{u}}harmf^{\mathrm{num}}(u_{-},u_{+})=\frac{-2/u_{+}+2/u_{-}}{-1/u_{+}^{2}+1/u_{-}^{2}}=\frac{2u_{-}u_{+}}{u_{+}+u_{-}}=\mathrel{\mathop{\ordinarycolon}}\{\!\{u\}\!\}_{\mathrm{harm}} (56)

    using the harmonic mean.

Please note that positivity preservation for an entropy-conservative method depends on the choice of the entropy function. For example, the standard L2L^{2} entropy

U(u)=u22U(u)=\frac{u^{2}}{2} (57)

leads to the numerical flux

fnum(u,u+)=12u+2u2u+u=u+u+2,f^{\mathrm{num}}(u_{-},u_{+})=\frac{1}{2}\frac{u_{+}^{2}-u_{-}^{2}}{u_{+}-u_{-}}=\frac{u_{-}+u_{+}}{2}, (58)

i.e., the standard arithmetic mean. The resulting finite volume discretization

tui+ui+1ui12Δx=0\partial_{t}u_{i}+\frac{u_{i+1}-u_{i-1}}{2\Delta x}=0 (59)

is the classical second-order central discretization, which is not positivity-preserving.

We use Nx=100N_{x}=100 cells and the initial condition

u(0,x)=1.9sin(πx)+2,x[0,2]u(0,x)=1.9\sin(\pi x)+2,\quad x\in[0,2]

and apply different iterative methods for solving for γ\gamma. The respective results are depicted in Figure 4.

Refer to caption
(a) Logarithmic mean
Refer to caption
(b) Logarithmic mean, secant method
Refer to caption
(c) Geometric mean
Refer to caption
(d) Geometric mean, regula falsi scheme
Refer to caption
(e) Harmonic mean
Refer to caption
(f) Harmonic mean, bisection method
Figure 4: Numerical solution of linear advection equation using MPSSPRK(0.5,1) ((a)-(b)), MPRK43adap(0.5,0.75) ((c)-(d)), and MPRK22adap(1) ((e)-(f)) with rtol=atol=103\texttt{rtol}=\texttt{atol}=10^{-3}, N=100N=100, and Δt0=Δx\Delta t_{0}=\Delta x. Left: without relaxation. Right: with relaxation.

4.4 Shallow water equations

The classical shallow water equations

t(hhv)=u+x(hvhv2+12gh2)=f(u)=0\partial_{t}\underbrace{\begin{pmatrix}h\\ hv\end{pmatrix}}_{=u}+\partial_{x}\underbrace{\begin{pmatrix}hv\\ hv^{2}+\tfrac{1}{2}gh^{2}\end{pmatrix}}_{=f(u)}=0 (60)

have the total energy

U(u)=12hv2+12gh2U(u)=\tfrac{1}{2}hv^{2}+\tfrac{1}{2}gh^{2} (61)

as entropy. The corresponding entropy variables are

w=(gh12v2v)w=\begin{pmatrix}gh-\tfrac{1}{2}v^{2}\\ v\end{pmatrix} (62)

and the entropy flux potential is

ψ=12gh2v.\psi=\tfrac{1}{2}gh^{2}v. (63)

For constant velocity vv, the condition for an entropy-conservative numerical flux is

0=[[w]]fnum[[ψ]]=g[[h]]fhnum12g[[h2]]v,0=[\![w]\!]\cdot f^{\mathrm{num}}-[\![\psi]\!]=g[\![h]\!]f^{\mathrm{num}}_{h}-\tfrac{1}{2}g[\![h^{2}]\!]v, (64)

where [[w]]wi+1wi[\![w]\!]\coloneqq w_{i+1}-w_{i}. Thus, the numerical flux for the water height hh is

fhnum=12[[h2]][[h]]v={{h}}vf^{\mathrm{num}}_{h}=\tfrac{1}{2}\frac{[\![h^{2}]\!]}{[\![h]\!]}v=\{\!\{h\}\!\}v (65)

with the arithmetic mean

{{h}}=12(h+h+).\{\!\{h\}\!\}=\tfrac{1}{2}(h_{-}+h_{+}). (66)

Similarly to the linear advection equations above, the arithmetic mean does not lead to a positivity-preserving semidiscretization. This proves

Theorem 3.

An entropy-conservative semidiscretization of the shallow water equations is not positivity-preserving.

One can show a similar result for the polytropic Euler equations with pressure pϱγp\propto\varrho^{\gamma} and γ>1\gamma>1. However, the limiting case of the isothermal Euler equations is different and discussed in the next subsection.

4.5 Isothermal Euler equations

The 1D isothermal Euler equations are

t(ϱϱv)=𝐮+x(ϱvϱv2+c2ϱ)=𝐟(𝐮)=0,\partial_{t}\underbrace{\begin{pmatrix}\varrho\\ \varrho v\end{pmatrix}}_{=\mathbf{u}}+\partial_{x}\underbrace{\begin{pmatrix}\varrho v\\ \varrho v^{2}+c^{2}\varrho\end{pmatrix}}_{=\mathbf{f}(\mathbf{u})}=0, (67)

where ϱ\varrho is the density, vv is the velocity, and cc is the speed of sound. We take the total energy

U(𝐮)=12ϱv2+12c2ϱlog(ϱ)U(\mathbf{u})=\tfrac{1}{2}\varrho v^{2}+\tfrac{1}{2}c^{2}\varrho\log(\varrho) (68)

as (mathematical) entropy. An associated entropy-conservative numerical flux at the interface i+12i+\frac{1}{2} is given by [winters2020entropy]

fϱnum={{ϱ}}log{{v}},fϱvnum={{ϱ}}log{{v}}2+{{c2ϱ}}.f^{\mathrm{num}}_{\varrho}=\{\!\{\varrho\}\!\}_{\mathrm{log}}\{\!\{v\}\!\},\quad f^{\mathrm{num}}_{\varrho v}=\{\!\{\varrho\}\!\}_{\mathrm{log}}\{\!\{v\}\!\}^{2}+\{\!\{c^{2}\varrho\}\!\}. (69)

Since the logarithmic mean goes to zero if one of the states goes to zero, the resulting entropy-conservative finite volume method is unconditionally positive. Even more general, the flux differencing method [tadmor1987numerical, Tadmor03, lefloch2002fully, fisher2013discretely, ranocha2018comparison, chen2017entropy] based on diagonal-norm SBP operators. In particular, high-order discontinuous Galerkin spectral element methods (DGSEMs) are positivity-preserving. While we apply the underlying explicit RK method to the second conserved variable together with the standard relaxation algorithm, we use MPRK22 for ϱ\varrho, where we use the PDS

pi+1,i=di,i+1=max{0,fϱnum},pi,i+1=di+1,i=min{0,fϱnum}p_{i+1,i}=d_{i,i+1}=\max\left\{0,f^{\mathrm{num}}_{\varrho}\right\},\quad p_{i,i+1}=d_{i+1,i}=-\min\left\{0,f^{\mathrm{num}}_{\varrho}\right\}

for i=1,,N1,i=1,\dotsc,N-1, and we take periodic boundary conditions into account for the terms if i=Ni=N. The study of flux-balanced MPRK schemes introduced in [IMST2026] is left for future works. In Figure 5 we solve the Riemann problem (RP)

𝐮L=(0.8,103),𝐮R=(1,102)\mathbf{u}_{L}=(0.8,10^{-3}),\quad\mathbf{u}_{R}=(1,10^{-2})

with periodic boundary conditions and final time tend=1t_{\mathrm{end}}=1. We note that one should not use an entropy conservative flux for an RP, however, this is a good example that our time integrator maintains the entropy properties of the space discretization.

Refer to caption
Refer to caption
Refer to caption
Figure 5: Numerical solution of isothermal Euler equations with Nx=100N_{x}=100 using MPRK22adap(1) with rtol=atol=103\texttt{rtol}=\texttt{atol}=10^{-3} and Δt=Δx\Delta t=\Delta x. Left: without relaxation. Right: with relaxation.

4.6 Porous Medium Equation

The porous medium equation

ut=(um)xx=(a(u)ux)x,a(u)=mum1u_{t}=(u^{m})_{xx}=(a(u)u_{x})_{x},\quad a(u)=mu^{m-1}

with a free parameter m>1m>1, see for instance [Boscarino23], admits a non-negative weak solution

u(m)(t,x)=tk[max(1k(m1)2m|x|2t2k,0)]1m1u^{(m)}(t,x)=t^{-k}\left[\max\left(1-\frac{k(m-1)}{2m}\frac{\lvert x\rvert^{2}}{t^{2k}},0\right)\right]^{\frac{1}{m-1}}

with k=1m+1k=\frac{1}{m+1}, the so-called Barenblatt solution [Barenblatt52]. For every t>0t>0, the solution has a compact support [αm(t),αm(t)][-\alpha_{m}(t),\alpha_{m}(t)] where

αm(t)=2mk(m1)tk.\alpha_{m}(t)=\sqrt{\frac{2m}{k(m-1)}}t^{k}.

We follow [Boscarino23] using u(0,x)=u(m)(1,x)u(0,x)=u^{(m)}(1,x) as an initial condition. We plot the numerical solution at time t=2t=2 on the spatial domain [6,6][-6,6] using the boundary conditions u(t,±6)=0u(t,\pm 6)=0 for t>1t>1.

We use the second-order space discretization from [Mattsson12, ranocha2019mimetic] given by

fi(𝐮(t))=\displaystyle f_{i}(\mathbf{u}(t))= a(ui(t))+a(ui+1(t))2Δx2ui+1(t)\displaystyle\frac{a(u_{i}(t))+a(u_{i+1}(t))}{2\Delta x^{2}}u_{i+1}(t)
a(ui1(t))+2a(ui(t))+a(ui+1(t))2Δx2ui(t)\displaystyle-\frac{a(u_{i-1}(t))+2a(u_{i}(t))+a(u_{i+1}(t))}{2\Delta x^{2}}u_{i}(t)
+a(ui1(t))+a(ui(t))2Δx2ui1(t)\displaystyle+\frac{a(u_{i-1}(t))+a(u_{i}(t))}{2\Delta x^{2}}u_{i-1}(t)

for i=2,,Ni=2,\dotsc,N and

fj(𝐮(t))=a(uj(t))2Δx2uj(t), for j{1,N}.f_{j}(\mathbf{u}(t))=\frac{a(u_{j}(t))}{2\Delta x^{2}}u_{j}(t),\quad\text{ for }\quad j\in\{1,N\}.

Next, we consider the convex entropy

η(𝐮)=Δx22i=1Nxui2,\eta(\mathbf{u})=\frac{\Delta x^{2}}{2}\sum_{i=1}^{N_{x}}u_{i}^{2},

which satisfies

ddtη(𝐮(t))0\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}\eta(\mathbf{u}(t))\leq 0

for the boundary conditions mentioned above, see [ranocha2019mimetic, Theorem 4.1]. This system of ODEs may be rewritten as a conservative PDS by setting

pi,i+1(𝐮)\displaystyle p_{i,i+1}(\mathbf{u}) =a(ui)+a(ui+1)2Δx2ui+1,\displaystyle=\frac{a(u_{i})+a(u_{i+1})}{2\Delta x^{2}}u_{i+1}, pi,i1(𝐮)\displaystyle\quad p_{i,i-1}(\mathbf{u}) =a(ui1)+a(ui)2Δx2ui1,\displaystyle=\frac{a(u_{i-1})+a(u_{i})}{2\Delta x^{2}}u_{i-1}, i\displaystyle\quad i =2,,N,\displaystyle=2,\dotsc,N,
p1,2(𝐮)\displaystyle p_{1,2}(\mathbf{u}) =a(u2)2Δx2u2,\displaystyle=\frac{a(u_{2})}{2\Delta x^{2}}u_{2}, pN,N1(𝐮)\displaystyle\quad p_{N,N-1}(\mathbf{u}) =a(uN1)2Δx2uN1,\displaystyle=\frac{a(u_{N-1})}{2\Delta x^{2}}u_{N-1}, di,j\displaystyle\quad d_{i,j} =pj,i.\displaystyle=p_{j,i}.

According to [Boscarino23], the cases m=3,5m=3,5 are particularly interesting as the numerical solution of the proposed third-order IMEX method in [Boscarino23, p. 10, eq. (30)] generates negative approximations and which cannot happen with MPRK schemes. Indeed, we observe in Figure 6 that we obtain positive approximations while the relaxation algorithm gives us an entropy estimate. Here, we do not plot γ\gamma as it was constantly at 11.

Refer to caption
(a) m=3m=3
Refer to caption
(b) m=5m=5
Refer to caption
(c) m=3m=3
Refer to caption
(d) m=5m=5
Figure 6: Numerical solution of PME equation with Nx=160N_{x}=160 using MPSSPRK22(0.5,1) (left) and MPRK43(0.5,0.75) (right) with Δt=Δx\Delta t=\Delta x and relaxation.

5 Summary and conclusions

In this work we investigated non-standard additive Runge–Kutta (NSARK) schemes, which include modified Patankar (MP) methods or Geometric Conservative (GeCo) to name a few. Being particularly interested in positivity-preserving methods that are also capable of conserving at least one linear invariant, we answered the question of whether these schemes can be equipped with a relaxation technique that preserves these properties while ensuring entropy stability. We point out that positivity preservation is easy to accomplish for entropy dissipative problems. For entropy conservative problems, where no linear invariant needs to be preserved, one can equip an unconditionally positive method with the geometric mean to compute the relaxation update. If the conservative problem has a linear invariant or one is interested in keeping a conservative PDS part also conservative within the relaxation procedure, we propose to use a linearly implicit formula for the relaxation update, which in turn results in a coupled linear-nonlinear system for the simultaneous computation of γ\gamma and 𝐮n+γ\mathbf{u}^{n+\gamma}. All techniques can be used for any positivity-preserving method maintaining the order, however, the latter relaxation technique involves a bootstrapping algorithm to achieve higher-order entropy conservative methods preserving a linear invariant.

We have tested our theoretical findings by means of multiple examples of ordinary and partial differential equations. Furthermore, interpreting a linear invariant as entropy, we were able to preserve both linear invariants of the stiff stratospheric reaction problem using MPRK. We have also tested several flux and entropy pairs for the linear advection equation testing the different iterative solvers for the computation of γ\gamma and 𝐮n+γ\mathbf{u}^{n+\gamma}. Moreover, we applied our technique also in the context of the isothermal Euler equation guaranteeing the positivity of the density. Finally, we have also tested MPRK and MPSSPRK schemes with the entropy dissipative porous medium equation, where we are also able to avoid negative approximations.

Future research topics include the testing of further NSARK schemes, including the recently developed flux-balanced versions, and the efficiency of the related methods. As some of the NSARK schemes are already proven to be conditionally stable, a thorough stability analysis for these methods is also part of ongoing research.

\bmhead

Acknowledgements

Declarations

Funding

T. Izgin gratefully acknowledges the financial support by Fulbright Germany. H. Ranocha was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project number 513301895) and the Daimler und Benz Stiftung (Daimler and Benz foundation, project number 32-10/22). C.-W. Shu acknowledges partial support from NSF grant DMS-2309249.

Conflict of interest

Not applicable.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors consent to submit for publication.

Data, Materials and Code availability

The source code used in this study is available at [IRS2026repository].

Author contribution

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Thomas Izgin. The first draft of the manuscript was written by Thomas Izgin and all authors commented on as well as wrote on previous versions of the manuscript. All authors read and approved the final manuscript.

References

BETA