Introduction

Differential Privacy (DP) is a technique for publishing answers to database queries while concealing confidential attributes. Real-world applications include the publication of the 2020 United States Census data (through its Disclosure Avoidance System), Google’s historical traffic statistics (Eland 2015), Microsoft’s telemetry data (Ding et al. 2017), and the user engagement information that LinkedIn shares with third parties for advertisements (Rogers et al. 2020).

DP hinges on a randomized mechanism wherein the data publisher, who owns the database, responds to analyst queries. The key principle is to generate similar distributions of query answers for data differing by a specific attribute, making it statistically challenging to discern whether data with that attribute were involved in the query computations.

In the existing literature, randomizing query answers commonly involves adding noise with a parametric distribution featuring one or two degrees of freedom. In contrast, this paper proposes a novel approach: optimizing all the parameters of the probability mass function (PMF) for queries with finite discrete answers. This ensures that the randomized query outcome meets DP constraints while minimizing expected distortion for a given database and discrete finite set of answers. Notably, existing literature does not undertake the optimization of noise to minimize distortion under an arbitrary \((\epsilon , \delta )\) constraint, as it typically limits the mechanism to a single parameter. Our approach stands out because it optimizes the entire distribution.

Our formulation, applicable to both discrete numerical data (for which the mean squared error (MSE) is perhaps the best error metric) and categorical data (for which the error rate (ER) is preferable), finds resonance in real-world scenarios. For instance, queries related to discrete numerical data include: (a) the number of households in a census tract with at least one college-educated member (0 through n, where n is the total number of households in that census tract); (b) the most popular promotion on a website, or the number of users in a given set who accepted a sales promotion on a website; (c) the hour of peak electricity usage in a neighborhood (00 through 23), etc. Similarly, queries related to categorical data include: (a) the type of consumer (subscriber or trial user, residential or commercial, etc.); (b) the month of the year (January: 1 through December: 12); (c) a person’s gender (Male: 1, Female: 2, Other: 3), and likewise ethnicity, blood type, etc. For discrete numerical data, we simply perform the modulo addition of random noise of size \(n+1\); for categorical data, we first assign numerical values to the categories and then perform the same modulo addition.
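As a minimal illustrative sketch (the month mapping and the noise PMF below are our own examples, not the optimized designs derived later in the paper), the modulo-addition idea for categorical data can be written as:

```python
import random

# Categorical answers mapped onto [n] = {0, ..., n}; here n + 1 = 12 (months).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
N1 = len(MONTHS)  # n + 1

def randomize(q, eta):
    """Modulo-(n+1) addition keeps the published answer in the query range."""
    return (q + eta) % N1

def publish(category, noise_pmf, rng=random):
    """Map a category to its code, add random noise, map back to a category."""
    q = MONTHS.index(category)
    eta = rng.choices(range(N1), weights=noise_pmf)[0]
    return MONTHS[randomize(q, eta)]

# December (code 11) with noise eta = 3 wraps around to March (code 2):
assert randomize(11, 3) == 2
```

Discrete numerical answers are handled identically, skipping the category-to-integer mapping step.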

Before outlining our contributions, we review the relevant literature.

Literature review

Several papers in the literature have studied additive noise mechanisms for discrete query outputs (Geng and Viswanath 2015; Soria-Comas and Domingo-Ferrer 2013). For discrete queries with infinite support, the additive noise mechanism for \(\epsilon -\)differential privacy that minimizes any convex function of the query error was found in Geng and Viswanath (2015); the optimum PMF is shown to have a specific decreasing staircase trend. The problem of finding the optimal data-independent noise mechanism for \(\epsilon -\)differential privacy is also addressed in Soria-Comas and Domingo-Ferrer (2013). Even though the authors focus on continuous query outputs, they claim the method extends easily to discrete queries. Neither paper (Geng and Viswanath 2015; Soria-Comas and Domingo-Ferrer 2013) explored the optimization of the \((\epsilon ,\delta )-\)differential privacy trade-off for \(\delta >0\). For integer query outputs, the optimal noise mechanism design for \((\epsilon ,\delta )-\)differential privacy is the subject of Geng and Viswanath (2015). Another approach to integer count queries is carried out in Ghosh et al. (2009), where a double-sided geometric noise distribution is used. A recent study of the count query DP problem is found in Cormode et al. (2019), in which the authors use a set of constrained mechanisms that achieve favorable DP properties. The related problem of publishing the number of users with sensitive attributes from a database is addressed in Sadeghi et al. (2020); their proposed DP mechanism adds an integer-valued noise before publishing to protect the privacy of individuals. Though the randomized query response produced by the mechanism in Sadeghi et al. (2020) lies in the actual query support range, the additive noise PMF depends on the query output, which requires storing several PMFs.
In the context of discrete queries, an additive discrete Gaussian noise-based mechanism is proposed in Canonne et al. (2020), where it is shown that the addition of discrete Gaussian noise provides the same privacy and accuracy guarantees as the addition of continuous Gaussian noise. Another recent study focuses on mechanisms based on discrete random noise addition (Qin et al. 2022), investigating the basic DP conditions and properties of general discrete random mechanisms. In Ravi et al. (2022), a modulo-addition-based randomized mechanism is provided for the labels obtained from K-means clustering.

In the literature, a joint DP mechanism is proposed for key-value data in Ye et al. (2019), where keys are categorical and values are numerical. Two potential applications are identified: video ad performance analysis and mobile app activity analysis. In the former, the key is the ad identifier and the value is the time a user has watched the video ad; in the latter, the key is the app identifier and the value is the time or frequency the app appears in the foreground. In another work (Wang et al. 2019), local DP mechanisms are proposed for multidimensional data that contain both numeric and categorical attributes.

Paper contributions

This paper critically revisits the design of DP random mechanisms, specifically focusing on ensuring \((\epsilon ,\delta )-\)DP for queries with \(n+1\) possible answers, each mapped onto the integers 0 through n. The mechanism we study involves the modulo \(n+1\) addition of noise. The significant and novel contributions of this paper can be summarized as follows:

  • Optimal Noise PMF: In Section “Numerical optimization”, we demonstrate that the additive noise PMF minimizing a linear error metric under a given \((\epsilon , \delta )\) budget can be obtained as the solution of a Mixed Integer Linear Program (MILP). Notably, for the case when \(\delta =0\), the optimum PMF can be found using a Linear Program (LP), as established in previous literature (Geng et al. 2015), which is a special case of our general formulation.

  • Explicit PMF expressions for minimum error: Sections “PMF for single distance neighborhood” and “The BD neighborhood” delve into two special cases, providing explicit expressions for the optimum PMF that minimizes error for specific \((\epsilon ,\delta )\) pairs. This analysis extends and subsumes prior work (Geng et al. 2015).

  • Structure of Optimum PMF and Error Rate: We unveil the structural characteristics of the optimum PMF and error rate functions. Specifically, the derived error rate function is piece-wise linear. Our findings reveal that, for a given error rate, \(\delta\) on the optimum \((\epsilon ,\delta )\) trade-off curve decreases exponentially as \(\epsilon\) increases. Moreover, apart from a discrete set of discontinuities in the curve, the exponential rate of decay does not change as \(\epsilon\) increases.

  • Numerical Validation and Comparative Analysis: The contributions outlined above are corroborated by a thorough numerical analysis presented in Section “Numerical results”. Our simulations include comprehensive comparisons with prior methods, demonstrating the superiority of our proposed approach.

Notation

In this section, we present a summary of general notations used throughout the paper. A comprehensive list of notations and abbreviations can be found at the top of the paper in the Nomenclature section.

Let \(\mathbb {N}, \mathbb {N}^+, \mathbb {Z}, \mathbb {R}\) denote the sets of natural numbers including zero, natural numbers excluding zero, integers, and real numbers, respectively. The set of integers \(\{0,1, \ldots , n\}\), \(n \in \mathbb {N}\), is referred to as \([n]\), while \([n]_{+}\) denotes \(\{1, \ldots , n\}\). The symbols \(\lfloor k \rfloor\) and \(\lceil k \rceil\) denote the floor and ceiling functions of k, respectively. The cardinality of set \(\mathcal {A}\) is denoted by \(|\mathcal {A}|\), and the set difference of sets \(\mathcal {A}\) and \(\mathcal {B}\) is denoted as \(\mathcal {A} {\setminus } \mathcal {B}\).

The query function applied to data X from a database, denoted by \(\mathcal X\), is represented as \(\mathbbm {Q}(X)\), and \(\mathcal {Q}\) denotes a discrete finite set of query answers. In this paper, the query domain is discrete and finite, mapped onto the set [n] of size \(n+1\). The numerical outcome of the query is denoted by the variable \(q\in [n]\), while \(\tilde{q}\) represents the outcome after the randomized publication, with distribution \(f(\tilde{q}|X)\).

For vector queries with outcomes \(\varvec{q}\) in a finite discrete domain, one can map the result onto the set \([n]\); hence, the optimization we propose applies. We use \(f(\eta )\) to represent the noise distribution \(f_\eta (\eta )\) whenever this causes no confusion.

Preliminaries

Definition 1

(\((\epsilon ,\delta )\)-Differential Privacy (DP) Dwork et al. 2006) A randomized mechanism \(\tilde{q}:\mathcal {Q} \rightarrow \mathcal {Q}\) is \((\epsilon ,\delta )\)-differentially private if, for all datasets X and \(X'\) that differ in a single data record and for any event \(\mathcal {S} \subseteq \mathcal {Q}\) pertaining to the outcome of the query, the randomized mechanism satisfies the following inequality

$$\begin{aligned} Pr(\tilde{q}(X) \in \mathcal {S}) \le e^\epsilon Pr(\tilde{q}(X') \in \mathcal {S}) + \delta , \end{aligned}$$
(1)

where \(Pr(\mathcal {A})\) denotes the probability of the event \(\mathcal {A}\), and the PMF used to calculate the event’s probability is \(f(\tilde{q}|X)\).

Conventionally, given the randomly published answer \(\tilde{q}\), in the differential privacy literature the privacy loss function is the log-likelihood ratio:

$$\begin{aligned} L_{{X\!X'}}(\tilde{q})\triangleq \ln \frac{f_{\tilde{q}}(\tilde{q}|X)}{f_{\tilde{q}}(\tilde{q}|X')}, \end{aligned}$$
(2)

where \(X \in \mathcal {X}\) is the dataset used to compute the query and \(X'\) is an alternative dataset that differs by a unique attribute or data point. For each X we denote by \({\mathcal X}^{(1)}_{X}\) the neighborhood of X, which contains all datasets \(X'\in \mathcal {X}\) that differ from X by a predefined sensitive attribute we want to conceal. Note that if the event \(L_{{X\!X'}}(\tilde{q})<0\) occurs under the experiment with distribution \(f(\tilde{q}|X)\), then, in classical statistics, an observer of the outcome \(\tilde{q}\) will erroneously choose the alternative hypothesis that \(X'\) was queried (where the emission probability is \(f(\tilde{q}|X')\)) rather than X. By looking at the tail of the distribution for \(L_{{X\!X'}}(\tilde{q})>0\) under the distribution \(f(\tilde{q}|X)\), one can gain insight into how frequently the mechanism allows the observer to differentiate X from \(X'\) with great confidence, leaking private information.

We now introduce the definition of \((\epsilon , \delta )\)—probabilistic differential privacy (PDP) we consider in this paper, which applies to any random quantity \(\tilde{q}\) for any given X:

Definition 2

(\((\epsilon , \delta )\)-Probabilistic DP Machanavajjhala et al. 2008) Consider random data \(\tilde{q} \sim f(\tilde{q}|X)\) drawn from a set of emission probabilities that change depending on \(X\in \mathcal {X}\). The data \(\tilde{q}\) are \((\epsilon , \delta )\)-probabilistic differentially private \(\forall X \in \mathcal {X}\) and \(X^{'}\in \mathcal {X}^{(1)}_{X}\) iff:

$$\begin{aligned} \delta&\ge \delta ^{\epsilon }_{\tilde{q}} \triangleq \sup _{X\in \mathcal {X}} \sup _{X'\in \mathcal {X}_{X}^{(1)}} Pr \left( L_{{X\!X'}} (\tilde{q})>\epsilon \right) , \end{aligned}$$
(3)

and the PMF used to calculate the probability is \(f(\tilde{q}|X)\).

The following theorem guarantees that \((\epsilon ,\delta )\)-PDP is a strictly stronger condition than \((\epsilon ,\delta )\)-DP.

Theorem 1

(PDP implies DP McClure 2015) If a randomized mechanism is \((\epsilon ,\delta )\)-PDP, then it is also \((\epsilon ,\delta )\)-DP, i.e.,

$$\begin{aligned} (\epsilon ,\delta )-\text {PDP} \! \Rightarrow \! (\epsilon ,\delta )-\text {DP}, \text {but } (\epsilon ,\delta )-\text {DP} \! \nRightarrow \! (\epsilon ,\delta )-\text {PDP}. \end{aligned}$$

The proof of Theorem 1 is given in McClure (2015) and Triastcyn and Faltings (2020). This motivates our use of \((\epsilon ,\delta )\)-PDP throughout this paper. \((\epsilon ,\delta )\)-PDP has found application in location recommendation (Zhang and Chow 2021), privacy protection under sampling and perturbation (Shlomo and Skinner 2012), and in creating a realistic framework for statistical agencies to release their data products (Machanavajjhala et al. 2008).

In our setup \(\mathbbm {Q}(X)\) is a scalar value in \({\mathcal Q}\equiv [n]\). The query response \(\tilde{q}\) is obtained by adding a discrete noise \(\eta\), whose distribution is denoted by \(f(\eta )\), i.e.:

$$\begin{aligned} \tilde{q}=\mathbbm {Q}(X)+{\eta }\,\,\Rightarrow \,\,f_{\tilde{q}}(\tilde{q}|X)=f_{\eta }(\tilde{q}-\mathbbm {Q}(X)). \end{aligned}$$
(4)

The tail probability of the privacy loss function, called the privacy leakage probability, can be derived for the additive noise mechanism from (2), (3) and (4):

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )=Pr\left( \ln \frac{f_{\eta }(\tilde{q}-\mathbbm {Q}(X))}{f_{\eta }(\tilde{q}-\mathbbm {Q}(X'))}>\epsilon \right) . \end{aligned}$$
(5)

For the discrete query case, we denote the “distance one set” of \(X \in \mathcal {X}\) as \(\mathcal {X}_{X}^{(1)} \subset {\mathcal X} {\setminus } X\) and let:

$$\begin{aligned} \mu _{{X\!X'}}\triangleq \mathbbm {Q}(X)-\mathbbm {Q}(X'), \,\,\,\,\, \forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}. \end{aligned}$$
(6)

where \(X'\) differs from X in one user data record or one sensitive user attribute. Let us define the indicator function \(u_{{X\!X'}} (\eta )\), \(\eta \in [n]\), which equals one if \(L_{{X\!X'}}(\tilde{q})>\epsilon\) for the outcome \(\tilde{q}\) corresponding to noise \(\eta\), and zero otherwise, i.e.:

$$\begin{aligned} u_{{X\!X'}} (\eta ) \triangleq&{\left\{ \begin{array}{ll} 1, &{} \,\,\,\,\,f(\eta ) > e^\epsilon f(\eta + \mu _{{X\!X'}})\\ 0, &{}\,\,\,\,\,\,\hbox {otherwise} \end{array}\right. } \end{aligned}$$
(7)

where we omitted the suffix \(\eta\) and used \(f(\eta )\) in lieu of \(f_{\eta }(\eta )\). It is easy to verify that we have:

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )=\sum _{\eta =0}^{n} u_{{X\!X'}} (\eta ) f(\eta ). \end{aligned}$$
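The sum above can be computed directly from a noise PMF. The sketch below is our own illustration with an arbitrary (non-optimized) PMF:

```python
import math

def leakage(f, eps, mu):
    """Pr(L_{XX'} > eps) for the modulo mechanism, per (7): sum the masses
    f(eta) at the noise values where f(eta) > e^eps * f(eta + mu mod n+1)."""
    n1 = len(f)
    e_eps = math.exp(eps)
    return sum(f[t] for t in range(n1) if f[t] > e_eps * f[(t + mu) % n1])

# Illustrative noise PMF on [3] with eps = 1:
f = [0.5, 0.25, 0.15, 0.1]
assert leakage(f, 1.0, 1) == 0                 # no mass violates the e^eps bound
assert abs(leakage(f, 1.0, 2) - 0.5) < 1e-12   # only eta = 0 violates it
```

Note that the same PMF can leak nothing for one query difference \(\mu\) and substantially for another, which is why the optimization must consider all attainable \(\mu _{{X\!X'}}\).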

Before describing our design framework in Section “Optimal additive noise”, a few considerations on \((\epsilon ,\delta )-PDP\) are in order.

Post-processing

A randomized DP mechanism maps a query output onto a distribution designed to meet Definition (1) or (3). If the randomized query answer generation requires multiple steps, it is important to ensure that \((\epsilon ,\delta )-PDP\) (\((\epsilon ,\delta )-DP\)) is met after the very last step. In fact, in Meiser (2018) it was pointed out that, unless \(\delta =0\), \((\epsilon ,\delta )-PDP\) with \(\delta >0\) cannot in general be guaranteed after post-processing, where post-processing refers to processing steps that follow the noise addition prior to the release of the query response. The objections in Meiser (2018) are valid for mechanisms that include post-processing, like the popular “truncation” or “clamping” mechanisms, which consist of first adding unbounded noise \(\eta\) and then projecting (clamping) the sum \(\mathbb {Q}(X)+\eta\) onto the prescribed range to generate \(\tilde{q}\). In this case \((\epsilon , \delta )-DP\) is guaranteed before the post-processing step, but not after, unless \(\delta =0\). The proposition below provides guarantees for \((\epsilon ,\delta )-{PDP}\).

Proposition 1

Let \(\tilde{q}\in \mathcal {Q}\) be a randomized \((\epsilon ,\delta )-\text {PDP}\) response for query \(q \in {\mathcal Q}\). Let \(g: \mathcal {Q} \rightarrow \mathcal {Q}\) be an arbitrary invertible mapping. Then \(g \circ \tilde{q}= g(\tilde{q})\) is also a \((\epsilon ,\delta )-{PDP}\) response for any \(\delta \ge 0\). Furthermore, if \(\delta =0\), \(\epsilon -{PDP}\) is preserved, irrespective of g.

Proof

The proof is in Appendix 6. \(\square\)

Proposition 1 clarifies the importance of designing randomized responses whose domain is consistent with the query output, since it does not require post-processing to generate answers in the right set. Clamping is not bijective and changes the masses of probability in a way that alters \(\delta\) for a given \(\epsilon\).
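As an illustrative numerical companion to Proposition 1 (not its proof, which is in Appendix 6; the conditional PMFs are our own examples), a bijective relabeling of the outcomes leaves the leakage of Definition 2 unchanged, whereas clamping, which merges outcomes, can alter it:

```python
import math

def delta_eps(p, p_alt, eps):
    """One-sided PDP leakage term: Pr over p of ln(p/p_alt) > eps."""
    return sum(pi for pi, qi in zip(p, p_alt) if pi > math.exp(eps) * qi)

eps = 1.0
# Conditional output PMFs under X and a neighboring X' (illustrative values;
# the second is a circular shift of the first, as in the modulo mechanism).
p  = [0.5, 0.25, 0.15, 0.1]
pp = [0.1, 0.5, 0.25, 0.15]
base = delta_eps(p, pp, eps)   # only outcome 0 exceeds the e^eps ratio

# Bijective post-processing g: permute the outcome labels. The likelihood
# ratios are merely relabeled, so the leakage is preserved.
g = [2, 0, 3, 1]
p_g, pp_g = [0.0] * 4, [0.0] * 4
for i, j in enumerate(g):
    p_g[j], pp_g[j] = p[i], pp[i]
assert delta_eps(p_g, pp_g, eps) == base

# Clamping (not bijective): merge outcomes 0 and 1 into a single bin.
p_c  = [p[0] + p[1], p[2], p[3]]
pp_c = [pp[0] + pp[1], pp[2], pp[3]]
assert delta_eps(p_c, pp_c, eps) != base   # delta changed by clamping
```

In this example clamping happens to reduce the leakage, but the point of Proposition 1 is that non-bijective post-processing moves probability masses and therefore changes \(\delta\) for a given \(\epsilon\) in an uncontrolled way.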

Optimal additive noise

For queries \(q = \mathbbm {Q}(X) \in [n]\), a possible approach other than clamping is to make the noise addition modulo \(n+1\) with \(\eta \in [n]\), so that the outcome \(\tilde{q}=q + \eta \,(\text {mod } n+1)\) is always in the appropriate range. In this paper, we seek the optimum noise distribution \(f(\eta )\) for such a mechanism. Since \((\epsilon , \delta )-PDP\) implies \((\epsilon , \delta )-DP\), and is hence the stronger notion, we use the definition of \((\epsilon , \delta )-PDP\) throughout.

Next, we omit the \((\text {mod } n+1)\) to streamline the notation, with the understanding that, from now on, sums and differences of query outcomes and noise values are always modulo \(n+1\). Observe that adding uniform noise would necessarily make a scalar query \({\tilde{q}}\) uniform, and thus yield high privacy (i.e. \(\delta =0\) for any \(\epsilon >0\)) but poor accuracy, since the error rate equals \(1-1/(n+1)\). This motivates the search for an optimal solution. Using (5) and (6):

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )&=Pr\left( \ln \frac{f(\eta )}{f(\eta + \mu _{{X\!X'}})}>\epsilon \right) . \end{aligned}$$
(8)

The reasons for using the modulo addition of noise are:

  • The randomized answers fall within the range expected for the query, which allows us to leverage Proposition 1.

  • The mechanisms require defining a single distribution rather than distinct distributions for all possible \(X\in \mathcal {X}\).

  • In the optimization, any pair with the same modulo difference results in a single \((\epsilon , \delta )-PDP\) constraint, simplifying the search for the optimum distribution.

  • The simplifications allow us to derive the optimum distribution in closed form for specific use cases.

From (8) it is evident that the \((\epsilon ,\delta )\) privacy curve is entirely defined by the noise distribution and its change due to a shift in the mean. As a result, the probability mass \(f(q+\eta )\) is obtained as a circular shift of the PMF \(f(\eta )\); therefore (8) can be used with the denominator \(f(\eta +\mu _{{X\!X'}})\) also representing a circular shift of \(f(\eta )\). This result motivates us to define the neighborhood sets, using only \(\mu _{{X\!X'}}\), in Section “Analytical solutions”.

Numerical optimization

In this section, we show that the problem of finding an optimal additive noise mechanism for a given pair \((\epsilon ,\delta )\) and expected distortion cost can be cast into a MILP formulation, i.e. an optimization problem with linear cost, linear equality and inequality constraints, and real as well as integer variables. While MILPs are non-convex, several stable solvers have convergence guarantees. Our MILP formulation (in (13a)–(13g)) finds optimum noise distributions minimizing a specified expected distortion cost:

$$\begin{aligned} \mathbb {E}[\rho _{\eta }]= \sum _{\eta =0}^n \rho _{\eta } f(\eta ), \end{aligned}$$
(9)

where \(\rho _{\eta }\) denotes the distortion caused by the noise value \(\eta \in [n]\). There are two typical metrics:

Definition 3

(Error Rate) For \(\tilde{q}=\mathbbm {Q}(X)+ \eta\), the error rate metric, denoted by \(\rho ^{\text {ER}}\), is the expected distortion for \(\rho _0=0\) and \(\rho _{\eta }=1\), \(\eta >0\). Thus:

$$\begin{aligned} \rho ^{\text {ER}} \triangleq 1 - f(0). \end{aligned}$$
(10)

Definition 4

(Mean Squared Error) For \(\tilde{q}=\mathbbm {Q}(X) + \eta\), the MSE corresponds to \(\rho _{\eta }=\eta ^2\):

$$\begin{aligned} \rho ^{\text {MSE}} \triangleq \mathbb {E}[|\tilde{q}-\mathbbm {Q}(X) |^{2}] = \sum _{\eta =1}^{n} \eta ^{2} f(\eta ). \end{aligned}$$
(11)
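For concreteness, the two metrics can be computed directly from a noise PMF; the PMF below is an arbitrary example, not an optimized design:

```python
def error_rate(f):
    """Error rate (10): probability that nonzero noise is added."""
    return 1.0 - f[0]

def mse(f):
    """MSE (11) for the distortion profile rho_eta = eta^2."""
    return sum(eta * eta * p for eta, p in enumerate(f))

f = [0.5, 0.25, 0.15, 0.1]   # illustrative noise PMF on [3]
assert error_rate(f) == 0.5
assert abs(mse(f) - 1.75) < 1e-12   # 0.25*1 + 0.15*4 + 0.1*9
```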

Remark 1

In the case of an ‘ordered’ query domain, the MSE metric can be preferable over an error rate metric. That is why we are focusing on the minimization of any linear cost in the MILP formulation. However, the numerical results shown in Section “Numerical results” indicate that even when we target the error rate metric \(\rho ^{ER} = 1 - f(0)\), the optimal solution tends to have values that diminish as they move away from the actual query.

Because our analytical results in Section “Analytical solutions” consider the error rate metric, whenever \(\rho\) is mentioned without further specification, it refers to \(\rho ^{ER}\). Having established the cost, the constraints (see (13b)–(13h)) are derived as follows. From the databases, we calculate the set \(\{\mu _{{X\!X'}}\}\) using (6); these are the only database parameters needed in the formulation. The probability masses must sum to 1 (see (13b)). The indicator function \(u_{{X\!X'}}(\eta )\), defined in (7), can be mapped onto the integrality constraint \(u_{{X\!X'}} (\eta ) \in \{0,1\}, \forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\) (see (13h)) and onto two linear inequality constraints, \(f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) - u_{{X\!X'}} (\eta ) \le 0\) and \(e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) + e^{\epsilon }u_{{X\!X'}} (\eta ) \le e^{\epsilon }\) (see (13f) and (13g)).Footnote 1 We can rewrite (8) as \(Pr(L_{{X\!X'}}(\tilde{q})>\epsilon ) = Pr(f(\eta ) > e^\epsilon f(\eta + \mu _{{X\!X'}}))= \sum _{\eta =0}^n u_{{X\!X'}} (\eta ) f(\eta ) \le \delta\), \(\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\). Since this is a bilinear constraint, and a MILP formulation requires linear constraints, we introduce the auxiliary variables \(y_{\eta }, \eta \in [n]\):

$$\begin{aligned} y_{\eta }&\triangleq u_{{X\!X'}} (\eta ) f(\eta ), \,\,\,\,\eta \in [n]. \end{aligned}$$
(12)

so that we can use \(\sum _{\eta =0}^n y_{\eta } \le \delta\) to constrain \(\delta\) instead; to enforce \(y_{\eta }=f(\eta )\) for \(u_{{X\!X'}}(\eta )=1\), and \(y_{\eta }=0\) for \(u_{{X\!X'}}(\eta )=0\), the trick is to use, respectively, the following linear constraints: \(u_{{X\!X'}} (\eta ) + f(\eta ) - y_{\eta } \le 1\), \(u_{{X\!X'}} (\eta ) -f(\eta ) + y_{\eta } \le 1\),Footnote 2 and \(y_{\eta } - u_{{X\!X'}} (\eta ) \le 0\).Footnote 3 This completes the explanation of the optimization constraints shown in (13c)–(13e).Footnote 4 Let

$$\begin{aligned} \mathcal{M}\!\triangleq \!\{\mu _{{X\!X'}}| \mu _{{X\!X'}}=\mathbb {Q}(X)-\mathbb {Q}(X'),\forall X\!\!\in \!\! \mathcal {X},\forall X'\!\!\in \!\! \mathcal {X}_{X}^{(1)}\} \end{aligned}$$

The form of the MILP is:

$${\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \,\,\sum _{\eta =0}^{n}\rho _{\eta }f(\eta )}$$
(13a)
$$\text {s.t.}\,\sum _{\eta =0}^n f(\eta )= 1,$$
(13b)
$$\sum _{\eta =0}^n y_{\eta } \le \delta , \,\,\,\,\,\,y_{\eta } - u_{{X\!X'}} (\eta ) \le 0,$$
(13c)
$$u_{{X\!X'}} (\eta ) + f(\eta ) - y_{\eta } \le 1,$$
(13d)
$$u_{{X\!X'}} (\eta ) -f(\eta ) + y_{\eta } \le 1,$$
(13e)
$$f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) - u_{{X\!X'}} (\eta ) \le 0,$$
(13f)
$$e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) + e^{\epsilon }u_{{X\!X'}} (\eta ) \le e^{\epsilon },$$
(13g)
$$\begin{aligned} & f(\eta ),\,y_{\eta } \in [0, 1], \,\eta \in [n], \nonumber \\&\,\,\,\,\,\,u_{{X\!X'}} (\eta ) \in \{0,1\}, \forall \mu _{{X\!X'}} \in \mathcal {M}. \end{aligned}$$
(13h)

The MILP has \(3(n+1)\) variables, of which \(f(\eta )\) and \(y_{\eta }\), \(\eta \in [n]\), are real numbers in [0, 1] (\(2(n+1)\) in total) and \(u_{{X\!X'}} (\eta )\) (the remaining \(n+1\)) are binary. The computational complexity of the MILP solution is determined by the algorithm used, which can be branch and bound, cutting plane, or branch and cut; a detailed study of MILP complexity is found in Jiang et al. (2021). When \(\delta = 0\), the problem reduces to the following LP:

$$\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \sum _{\eta =0}^{n}\rho _{\eta }f(\eta )$$
(14a)
$$\text {s.t.}\,\sum _{\eta =0}^n f(\eta )= 1,$$
(14b)
$$f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}})\le 0,\,\,\eta \in [n],$$
(14c)
$$f(\eta ) \in [0,1], \,\forall \mu _{{X\!X'}} \in \mathcal {M}$$
(14d)

A useful variant of the optimization in (13), which will be explored in our numerical results, is to minimize \(\delta\) instead, under a distortion constraint \(\overline{\rho }\), i.e.:

$$\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \delta$$
(15a)
$$\text {s.t.}\,\sum _{\eta =0}^{n}\rho _{\eta }f(\eta )\le \overline{\rho },$$
(15b)
$$(13{\text{b}}) - \cdots - (13{\text{h}})$$
(15c)

In the next sections, we derive analytical solutions of (13) that minimize the error probability \(\rho ^{ER} = 1 - f(0)\) for some special database structures and corroborate the results in Section “Numerical results” comparing the formulas with the MILP solutions obtained using  Gurobi (2021) as a solver.
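While the paper solves (13) with Gurobi, the structure of the feasible set can be illustrated without any solver. The sketch below is our own toy stand-in, not the paper's implementation: it brute-forces a tiny SD instance (\(n=2\), \(\hat{\mu }=1\)) over a discretized simplex and recovers the \(\delta =0\) closed form of Lemma 3:

```python
import math

def pdp_delta(f, eps, mus):
    """Worst-case leakage over mu: the sum of f(eta) at noise values where
    f(eta) > e^eps * f(eta + mu mod n+1), as in (7)-(8)."""
    n1, e_eps = len(f), math.exp(eps)
    return max(sum(f[t] for t in range(n1) if f[t] > e_eps * f[(t + mu) % n1])
               for mu in mus)

def brute_force(eps, delta, mus, steps=500):
    """Grid search over the 3-point probability simplex minimizing the error
    rate 1 - f(0) subject to the PDP constraint: a toy stand-in for the
    MILP (13) with n = 2."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            f = [i / steps, j / steps, (steps - i - j) / steps]
            if pdp_delta(f, eps, mus) <= delta and (best is None or f[0] > best[0]):
                best = f
    return best

eps = math.log(2.0)
f_star = brute_force(eps, 0.0, [1])       # SD neighborhood, mu_hat = 1
# Lemma 3, Case 1 closed form for delta = 0: f(0) = (1-e^-eps)/(1-e^-(n+1)eps)
f0_closed = (1 - math.exp(-eps)) / (1 - math.exp(-3 * eps))  # = 4/7
assert abs(f_star[0] - f0_closed) < 1e-2   # grid resolution ~ 2e-3
```

A real instance would pass the full set \(\mathcal {M}\) of modulo differences to `pdp_delta` and use a MILP solver; the grid search is only meant to make the constraint structure concrete.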

Analytical solutions

In this section, we analytically study the solution that minimizes the error rate \(\mathbb {E}[\rho _{\eta }]=1-f(0)\). To give closed-form solutions for the optimum PMF, we focus on the following instances of possible \(\mu _{{X\!X'}}\):

Definition 5

(Single Distance (SD)) In this setting \(\forall X \in \mathcal {X},\forall X'\in {\mathcal X}_{X}^{(1)}\), the difference \(\mu _{{X\!X'}}\) is constant, i.e. \(\mu _{{X\!X'}} = \hat{\mu }\). Note that \(\hat{\mu } \le n\).

Definition 6

(Bounded Difference (BD)) In this setting, \(\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\), the differences \(\mu _{{X\!X'}}\) take values in the set \([\bar{\mu }]_+\), \(\bar{\mu } \le n\), each value being attained at least once.

The most general case is the following:

Definition 7

(Arbitrary) In this case \(\mu _{{X\!X'}}\) can take values from any subset of [n],\(\,\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\).

The next lemma clarifies that the optimum PMF for the BD case for a given \((\epsilon ,\delta )\) is useful to attain the same guarantees for the case of an arbitrary neighborhood.

Lemma 1

Suppose \(\bar{\mu }=\sup _{\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}}\mu _{{X\!X'}}\). Then any noise PMF that provides \((\epsilon ,\delta )\) privacy for the BD neighborhood with parameter \(\bar{\mu }\) will give the same guarantees in terms of \((\epsilon ,\delta )\) and \(\rho\) for the case of arbitrary neighborhoods. However, lower distortion is achievable by solving the MILP in (13).

Proof

The proof is simple: the set of constraints that need to be met to satisfy \((\epsilon ,\delta )\) privacy for the arbitrary case is a subset of the BD neighborhood case with \(\bar{\mu }\) as a parameter. This also means, however, that the minimum \(\rho ^\star\) for the latter case is sub-optimum. \(\square\)

Note that the SD neighborhood setting is a simple case while the previous lemma indicates that the BD case is more useful in general.

Next, we find an explicit solution for the optimum noise PMF \(f^\star (\eta ), \eta \in [n]\) for the SD and BD neighborhood cases. In Section “Optimal noise mechanism for vector queries”, we discuss the case of discrete vector queries whose entries are independently subjected to the mechanism vs. the optimal solution.

PMF for single distance neighborhood

The natural way to express the optimum PMF for an SD neighborhood setting is by specifying the probability masses sorted in decreasing order of f(k), \(\forall k \in [n]\). They are referred to as \(\{f_{(k)}| 1 \ge f_{(0)} \ge f_{(1)} \ge \cdots \ge f_{(n)} \ge 0\}\).

Lemma 2

Considering the case in Definition 5 where \(\mu _{{X\!X'}} = \hat{\mu }\) is constant \(\,\forall X \in {\mathcal X},\forall X'\in \mathcal {X}_{X}^{(1)}\), the noise PMF minimizing the error rate is such that \(f^\star _{(h)} = f^\star (h\hat{\mu })\), \(\forall h \in [n]\) and the inequality in (14c), can be written in terms of the sorted PMF as follows:

$$f^\star _{(h)} - e^\epsilon f^\star _{(h+1)}\le 0,\,\,\forall h \in [n].$$
(16)

Proof

The proof is in Appendix 7. \(\square\)

To start, let us consider the case \(\delta = 0\):

Lemma 3

For the SD neighborhood and \(\delta = 0\), the optimal noise PMF for the modulo addition mechanism is:

Case 1: If \((\hat{\mu }, (n+1))\) are relatively prime.

$$f_{(k)}^\star= e^{-k \epsilon } f_{(0)}^\star , \,\,\,\,k \in [n]_+,$$
(17a)
$$f_{(0)}^\star= \frac{1 - e^{-\epsilon }}{1 - e^{-(n+1)\epsilon }}\equiv f^\star (0).$$
(17b)

Case 2: If \((\hat{\mu }, (n+1))\) are not relatively prime.

$$f_{(k)}^\star \,= e^{-k \epsilon } f_{(0)}^\star , \,\,k \in [N_{\hat{\mu }}-1]_+$$
(18a)
$$f^\star _{(k)}= 0, \quad \quad \,\,\,\,\,k \in [n] {\setminus } [i\hat{\mu }], \,\forall i \in [N_{\hat{\mu }}-1]$$
(18b)
$$f_{(0)}^\star= \frac{1 - e^{-\epsilon }}{1 - e^{-(N_{\hat{\mu }})\epsilon }}\equiv f^\star (0).$$
(18c)

where \(N_{\hat{\mu }} = \frac{(n+1)}{\gcd ((n+1),\hat{\mu })}\) and \(\rho ^{\star } = 1-f^\star (0)\).

Proof

The proof is in Appendix 8. \(\square\)
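The closed forms of Lemma 3 are easy to check numerically. The sketch below (our own illustration) builds the PMF on the cyclic orbit of \(\hat{\mu }\) and verifies normalization and the \(\delta =0\) constraint:

```python
import math

def sd_pmf_delta0(n, mu_hat, eps):
    """Optimal noise PMF of Lemma 3 for the SD neighborhood and delta = 0.
    Masses decay geometrically along the cyclic orbit 0, mu_hat, 2*mu_hat, ...
    (mod n+1); positions off the orbit (Case 2) get zero mass."""
    N = (n + 1) // math.gcd(n + 1, mu_hat)          # orbit length N_mu
    f0 = (1 - math.exp(-eps)) / (1 - math.exp(-N * eps))
    f = [0.0] * (n + 1)
    for k in range(N):
        f[(k * mu_hat) % (n + 1)] = math.exp(-k * eps) * f0
    return f

eps = 0.5
f1 = sd_pmf_delta0(7, 3, eps)   # Case 1: gcd(8, 3) = 1, all masses positive
f2 = sd_pmf_delta0(7, 2, eps)   # Case 2: gcd(8, 2) = 2, N = 4, odd bins empty
assert abs(sum(f1) - 1.0) < 1e-12 and abs(sum(f2) - 1.0) < 1e-12
assert all(f2[i] == 0.0 for i in (1, 3, 5, 7))
# Leakage check; the (1 + 1e-9) factor guards against floating-point ties,
# since consecutive orbit masses have ratio exactly e^eps:
e_eps = math.exp(eps)
leak = sum(f2[t] for t in range(8) if f2[t] > e_eps * f2[(t + 2) % 8] * (1 + 1e-9))
assert leak == 0.0
```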

Lemma 3 is verified numerically in Section “Numerical results” for both Cases 1 and 2 (see Fig. 9). We illustrate the two cases in Fig. 1.

Fig. 1

In these examples, \(f^\star {(\hat{\mu })}\) is a single distance \(\hat{\mu }\) away from \(f^\star {(0)}\), hence it is assigned \(e^{-\epsilon }f^\star {(0)}\); next, \(f^\star {(2\hat{\mu })}\) is assigned \(e^{-\epsilon }f^\star {(\hat{\mu })}\), since it is \(\hat{\mu }\) away from \(f^\star {(\hat{\mu })}\), and so on. Thus the order of assignment for the example in plot (a) is 0, 3, 6, 1, 4, 7, 2, 5, and for the example in plot (b) it is 0, 2, 4, 6. Since the positions 1, 3, 5, 7 are not \(\hat{\mu }\) away from any \(f^\star {(2k\hat{\mu })}\), \(k \in [N_{\hat{\mu }}-1=3]\), they are assigned zero mass, allowing a larger \(f^\star {(0)}\)

From (17b) we can observe that for case 1, the error rate \(\rho ^{\star }\) depends only on \(\epsilon\) and n, and for case 2 (see (18c)), \(\rho ^{\star }\) depends on both \(\epsilon\) and \(N_{\hat{\mu }}\).

Remark 2

It is notable that, in this formulation where the cost is the error rate, positive probability masses corresponding to larger noise outcomes tend to be less likely than those corresponding to smaller outcomes. This is why these designs also exhibit low MSE.

Mechanisms with a better error rate (lower \(\rho\)) must allow \(\delta > 0\). It can be proven (see Theorem 2) that the optimal error rate \(\rho ^{\star }(\delta ,\epsilon )\) vs. \(\delta\) curve is piece-wise linear, interleaving flat regions with intervals of negative slope (see Fig. 2).

Fig. 2

The variation of \(\rho\) as a function of \(\delta\) for SD neighborhood showing the alternate flat and linear regions

We categorize them as “linear regions” and “flat regions”. Let \(\underline{\delta }^{\epsilon }_k\) and \(\overline{\delta }^{\epsilon }_k\) be the values of \(\delta\) at which \(\rho ^{\star }(\delta ,\epsilon )\) changes from the kth linear region to the kth flat region and from the kth flat region to the \((k+1)\)th linear region, respectively.

Remark 3

We state the following results of this section, i.e., Section “PMF for single distance neighborhood”, for the case where \(n+1\) and \(\hat{\mu }\) are relatively prime. For the case where they are not, the results are obtained by replacing n with \(N_{\hat{\mu }} = \frac{(n+1)}{\gcd ((n+1),\hat{\mu })}\) in all the expressions and explanations.

Let us define the following quantities:

$$\begin{aligned} \overline{\delta }^{\epsilon }_k&:= e^{-(n-k-1)\epsilon }\frac{1-e^{-\epsilon }}{1-e^{-(n-k+1)\epsilon }}, \,\hbox {for }k\in [n\!-\!1] \end{aligned}$$
(19a)
$$\begin{aligned} \overline{\delta }^{\epsilon }_n&:=1, \,\,\,\, \underline{\delta }^{\epsilon }_0:=0 \end{aligned}$$
(19b)
$$\begin{aligned} \underline{\delta }^{\epsilon }_k&:= e^{-(n-k)\epsilon }\frac{1-e^{-\epsilon }}{1-e^{-(n-k+1)\epsilon }}, \,\hbox {for }k\in [n]_+. \end{aligned}$$
(19c)
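The boundary points (19a)–(19c) can be tabulated directly (helper names ours; we read \([n-1]\) as \(\{0,\dots ,n-1\}\) and \([n]_+\) as \(\{1,\dots ,n\}\)). A useful sanity check is the relation \(\overline{\delta }^{\epsilon }_k = e^{\epsilon }\,\underline{\delta }^{\epsilon }_k\), together with the interleaving \(\underline{\delta }^{\epsilon }_k< \overline{\delta }^{\epsilon }_k < \underline{\delta }^{\epsilon }_{k+1}\):

```python
import math

def delta_lower(n: int, k: int, eps: float) -> float:
    """underline-delta_k for k in {1, ..., n}; underline-delta_0 = 0 (Eqs. 19b, 19c)."""
    if k == 0:
        return 0.0
    return math.exp(-(n - k) * eps) * (1 - math.exp(-eps)) / (1 - math.exp(-(n - k + 1) * eps))

def delta_upper(n: int, k: int, eps: float) -> float:
    """overline-delta_k for k in {0, ..., n-1}; overline-delta_n = 1 (Eqs. 19a, 19b)."""
    if k == n:
        return 1.0
    return math.exp(-(n - k - 1) * eps) * (1 - math.exp(-eps)) / (1 - math.exp(-(n - k + 1) * eps))
```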

Theorem 2

For \(k\in [n]_+\), in the kth section where \(\rho ^\star (\delta ,\epsilon )\) is flat, i.e., within \(\underline{\delta }^{\epsilon }_{k} \le \delta \le \overline{\delta }^{\epsilon }_k\):

$$\begin{aligned} f^\star _{(h)}&= {\left\{ \begin{array}{ll} \underline{\delta }^{\epsilon }_{k} e^{(n-k-h)\epsilon }, &{}h \in [n-k],\\ 0, &{}h \in [n-k+1:n], \end{array}\right. } \end{aligned}$$
(20)

In the portion of the kth section where \(\rho ^\star (\delta ,\epsilon )\) decreases linearly with \(\delta\), which is within \(\, \overline{\delta }^{\epsilon }_{k-1}< \delta \le \underline{\delta }^{\epsilon }_k\):

$$\begin{aligned} f^\star _{(h)}= {\left\{ \begin{array}{ll} \delta e^{(n-k-h)\epsilon }, &{}\!\!\!\!h \!\in \! [n\!-\!k],\\ e^{(n-h)\epsilon }\frac{e^{\epsilon }-1}{e^{k\epsilon }-1}\left( 1-\frac{\delta }{\underline{\delta }^{\epsilon }_k} \right) ,&{} \!\!\!\!h\!\in \! [n\!-\!k+1\!:\!n]. \end{array}\right. } \end{aligned}$$
(21)

and \(\rho ^\star (\delta ,\epsilon ) =1-f^\star _{(0)}\).

Proof

The proof is in Appendix 9. \(\square\)
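Equations (20)–(21) translate directly into a constructive sketch (function name ours; we assume the convention \([a]=\{0,\dots ,a\}\) and do not verify that \(\delta\) actually falls in the kth section, as the theorem requires). In both branches the resulting vector sums to one and \(f^\star _{(0)}\) is the largest mass:

```python
import math

def theorem2_pmf(n: int, k: int, eps: float, delta: float) -> list:
    """Ordered PMF f*_(h) of Theorem 2 for the k-th section (SD neighborhood).

    Picks the flat-region formula (20) when delta >= underline-delta_k and
    the linear-region formula (21) otherwise; region membership is not checked.
    """
    d_lo = math.exp(-(n - k) * eps) * (1 - math.exp(-eps)) / (1 - math.exp(-(n - k + 1) * eps))
    f = [0.0] * (n + 1)
    if delta >= d_lo:  # flat region, Eq. (20)
        for h in range(n - k + 1):
            f[h] = d_lo * math.exp((n - k - h) * eps)
    else:              # linear region, Eq. (21)
        for h in range(n - k + 1):
            f[h] = delta * math.exp((n - k - h) * eps)
        for h in range(n - k + 1, n + 1):
            f[h] = (math.exp((n - h) * eps) * (math.exp(eps) - 1)
                    / (math.exp(k * eps) - 1) * (1 - delta / d_lo))
    return f
```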

From Theorem 2 we note that the boundary point \(\underline{\delta }^{\epsilon }_k\) determines the value of \(f^\star _{(n-k)}\), which is the smallest non-zero probability mass in the kth flat region. Similarly, the other boundary point \(\overline{\delta }^{\epsilon }_k \equiv e^{\epsilon } \underline{\delta }^{\epsilon }_k\) indicates the value of \(f^\star _{(n-k-1)}\), the second smallest non-zero probability mass in the same region. Having calculated the optimal PMF for the SD neighborhood case in Theorem 2, the \((\epsilon ,\delta )\) curves correspond to \(f^{\star }(0)=1-\rho ^{\star }\) for all its \(n+1\) possible expressions or, better stated, they are the level curves \(\rho (\delta , \epsilon )=\rho ^{\star }\). For a given \(\rho ^{\star }\), these \(\delta\) curves decrease monotonically with \(\epsilon\). Let \(\epsilon _0^{\rho ^{\star }}\) be the solution obtained by setting \(f^{\star }(0)\) in (17b) equal to \(1-\rho ^{\star }\):

$$\begin{aligned} \epsilon _0^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\frac{1-e^{-\epsilon _0^{\rho ^{\star }}}}{1-e^{-(n+1)\epsilon _0^{\rho ^{\star }}}} \end{aligned}$$
(22)

Then we must have \(\delta =0\) for \(\epsilon \ge \epsilon _0^{\rho ^{\star }}\). Because this corresponds to a flat region for \(f^{\star }(0)\), there has to be a discontinuity when moving towards lower values \(\epsilon <\epsilon _0^{\rho ^{\star }}\): \(\delta\) must immediately jump to \(\overline{\delta }^{\epsilon _0^{\rho ^{\star }}}_0\) when \(\epsilon\) is infinitesimally below \(\epsilon _0^{\rho ^{\star }}\). This point is the edge of the linear region. For a range of values \(\epsilon <\epsilon _0^{\rho ^{\star }}\), \(\delta\) must follow the negative exponential trend \(\delta =(1-\rho ^{\star })e^{-(n-1)\epsilon }\), obtained from Eq. (21) for \(h=0,k=1\), until the next jump occurs at a value \(\epsilon _1^{\rho ^{\star }}\), which is obtained by setting \(f^{\star }(0)\) for \(h=0\) and \(k=1\) in (20) equal to \(1-\rho ^{\star }\):

$$\begin{aligned} \epsilon _1^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\underline{\delta }_1^{\epsilon _1^{\rho ^{\star }}} e^{(n-1)\epsilon _1^{\rho ^{\star }}}=1-\frac{1-e^{-\epsilon _1^{\rho ^{\star }}}}{1-e^{-n\epsilon _1^{\rho ^{\star }}}}. \end{aligned}$$
(23)

Following this logic, one can prove that the optimum \((\epsilon , \delta )\) curve for a given error rate \(\rho ^{\star }\) is:

Corollary 1

For a given \(\epsilon >0\), the privacy loss for the SD neighborhood case with the optimal noise mechanism is a discontinuous function of \(\epsilon\), where:

$$\begin{aligned} \delta ^\epsilon = e^{-(n-k)\epsilon }(1-\rho ^{\star }),\,\,\,\epsilon ^{\rho ^{\star }}_{k} \le \epsilon <\epsilon ^{\rho ^{\star }}_{k-1} \end{aligned}$$
(24)

when \(\rho ^\star = 1 - f^\star _{(0)}\) is in the kth section, \(k \in [n]_+\), and \(\epsilon _k^{\rho ^{\star }}\) are the solutions of the following equations:

$$\begin{aligned} \epsilon _k^{\rho ^{\star }}\!\!:\!\rho ^{\star }=1-\underline{\delta }_k^{\epsilon _k^{\rho ^{\star }}} e^{(n-k)\epsilon _k^{\rho ^{\star }}}=1-\frac{1-e^{-\epsilon _k^{\rho ^{\star }}}}{1-e^{-(n+1-k)\epsilon _k^{\rho ^{\star }}}}. \end{aligned}$$
(25)

Proof

As discussed before the Corollary, for \(\delta =0\), the level curve \(\rho ^{\star }(\delta ,\epsilon )=\rho ^{\star }\) requires \(\epsilon \ge \epsilon _0^{\rho ^{\star }}\) (see (22)). Then the curve has discontinuities that correspond to the flat regions, and the trend between these discontinuities is obtained through the equation \((1-\rho ^{\star })=f^{\star }(0)=\delta e^{(n-k)\epsilon }\), which implies \(\delta =(1-\rho ^{\star })e^{-(n-k)\epsilon }\). The kth interval starts at the point \(\epsilon =\epsilon _k^{\rho ^{\star }}\) such that \(\delta =\underline{\delta }_k^{\epsilon _k^{\rho ^{\star }}}\) ensures \(f^{\star }(0)=1-\rho ^{\star }\); thus, \(\epsilon _k^{\rho ^{\star }}\) must be the solution of (25). \(\square\)
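The thresholds \(\epsilon _k^{\rho ^{\star }}\) in (25) have no closed form, but the right-hand side is monotonically decreasing in \(\epsilon\), so bisection suffices; the paper does not prescribe a solver, and the sketch below (names and the bracket \([10^{-9}, 50]\) are ours) requires \(\rho ^{\star } < 1-\frac{1}{n+1-k}\). Setting \(k=0\) recovers \(\epsilon _0^{\rho ^{\star }}\) from (22):

```python
import math

def eps_k(n: int, k: int, rho_star: float, lo: float = 1e-9, hi: float = 50.0,
          iters: int = 200) -> float:
    """Solve Eq. (25) for eps_k^{rho*} by bisection; m = n + 1 - k.

    Requires rho_star < 1 - 1/m, so that a root exists in (lo, hi).
    """
    m = n + 1 - k
    def g(e):
        # Right-hand side of (25): 1 - (1 - e^{-eps}) / (1 - e^{-m*eps})
        return 1 - (1 - math.exp(-e)) / (1 - math.exp(-m * e))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > rho_star:   # g decreases in eps, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```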

The BD neighborhood

Also in this case, we start with the optimum noise for \(\delta = 0\). First, let us express n as:

$$\begin{aligned} n = b\bar{\mu }+r \end{aligned}$$
(26)

where \(b = \lfloor {n/\bar{\mu }}\rfloor\) and \(r \in [\bar{\mu }-1]\). For this case, the inequalities in (14c) can be written as:

$$\begin{aligned} f^\star {(h)} - e^\epsilon f^\star {(h+\mu )}\le 0,\,\,\forall h \in [n], \,\forall \mu \in [\bar{\mu }]_+. \end{aligned}$$
(27)
Fig. 3

The PMF of the optimal noise mechanism for the BD neighborhood follows a staircase pattern. In the flat region, it has \(b+1\) steps and \(i^{th}\) step height is \(\phi ^k_i\). Similarly, in the linear region, the PMF has \(b+2\) steps and \(i^{th}\) step height is \(\psi ^k_i(\delta )\)

Lemma 4

In the case of a BD neighborhood of size \(\bar{\mu }\), the optimum PMF \(f^\star (\eta )\) has the following form for \(i\in [b+1]_+\):

$$\begin{aligned} f^\star (\eta )=f^\star (0)e^{-i\epsilon }\!\equiv \phi _i, \,\,(i\!-\!1)\bar{\mu }+1\le \eta \le \min (i\bar{\mu },n). \end{aligned}$$
(28)

and the mass at zero is:

$$\begin{aligned} f^\star (0)&=\left( 1+\bar{\mu }\sum _{i=1}^b e^{-i\epsilon }+ re^{-(b+1)\epsilon }\right) ^{-1}\equiv \phi _0. \end{aligned}$$
(29)

The PMF has, therefore, a staircase trend with steps of length \(\ell _i=\bar{\mu }\) for \(i\in [b]_+\) and \(\ell _{b+1}=r\) and \(\rho ^\star =1-\phi _0\).

Proof

The proof is similar to that of Lemma 3: it is best to meet the inequalities in (27) with equality, since that allows for the largest \(f^{\star }(0)\). The only difference is that the masses in the first group \([\bar{\mu }]_+\) are all equal to \(\phi _1=e^{-\epsilon }f^{\star }(0)\); they in turn constrain the second group to have value \(\phi _2=e^{-\epsilon }\phi _1\), and so on. There are b such groups containing \(\bar{\mu }\) probability masses each, and only the last group includes the remaining r values. \(f^\star (0)\) is obtained by normalizing the PMF to add up to 1. \(\square\)
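Lemma 4 is constructive: build the staircase and normalize through \(\phi _0\). A minimal sketch (names ours), which can be checked against the parameters \(n=8,\bar{\mu }=3,\epsilon =1.5\) used later in Fig. 10a:

```python
import math

def bd_pmf_delta0(n: int, mu_bar: int, eps: float) -> list:
    """Staircase PMF of Lemma 4 (BD neighborhood, delta = 0)."""
    b, r = divmod(n, mu_bar)              # n = b*mu_bar + r  (Eq. 26)
    # phi_0 from Eq. (29): normalization over b steps of mu_bar plus a step of r
    phi0 = 1.0 / (1 + mu_bar * sum(math.exp(-i * eps) for i in range(1, b + 1))
                  + r * math.exp(-(b + 1) * eps))
    f = [phi0]
    for i in range(1, b + 2):             # step i covers (i-1)*mu_bar+1 .. min(i*mu_bar, n)
        lo = (i - 1) * mu_bar + 1
        hi = min(i * mu_bar, n)
        f += [phi0 * math.exp(-i * eps)] * (hi - lo + 1)
    return f
```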

The PMF for \(\delta >0\) has a staircase pattern (see Fig. 3), similar to Lemma 4, and, also in this case, \(\rho ^\star (\delta ,\epsilon )\) has a piece-wise linear trend that alternates between flat and linear regions. However, the BD neighborhood case has an intricate pattern in which the constraints turn into violations as the privacy loss \(\delta \rightarrow 1\). Rather than having \(k=n\) sections, the number of sections k grows quadratically with b. To explain the trend, it is best to divide the \(f^{\star }(0) \equiv 1 - \rho ^\star (\delta ,\epsilon )\) curve vs. \(\delta\) into b segments, indexed by \(h \in [b]_+\), as shown in Fig. 4. Except for the last segment, corresponding to \(h=b\), each segment indexed by \(h \in [b-1]_+\) is further divided into h sub-segments, indexed by \(j \in [h]_+\); this index refers to one of the alternating flat and linear regions within the hth segment. This results in \(k=\frac{b(b-1)}{2}\) alternating flat and linear regions. In the segment indexed by \(h=b\), instead, there is only one linear region, i.e., \(f^{\star }(0) = \delta\). The optimum distribution is specified in the following theorem:

Fig. 4

The variation of \(f^{\star }(0)\) as a function of \(\delta\) for the BD neighborhood, showing the b segments with alternating flat and linear regions

Theorem 3

Let \(b+r<\bar{\mu }\). For a given \(\epsilon >0\) and for \(\mu _{{X\!X'}} \in [\bar{\mu }]_+\), \(\,\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\) (i.e., the BD neighborhood), \(f^{\star }(0)\) versus \(\delta\) features flat and linear regions as shown in Fig. 4. Each of the first \((b-1)\) segments, indexed by \(h\in [b-1]_+\), alternates pairs of flat and linear regions indexed by \(j \in [h]_+\), with respective boundaries \(\underline{\delta }^{\epsilon }_{h,j}\le \delta \le \overline{\delta }^{\epsilon }_{h,j}\) and \(\overline{\delta }^{\epsilon }_{h,j-1} < \delta \le \underline{\delta }^{\epsilon }_{h,j}\), with the convention \(\overline{\delta }^{\epsilon }_{h,0}=\overline{\delta }^{\epsilon }_{h-1,h-1}\). The following facts are true:

(a) In the kth flat region, \(k=\sum _{j'=1}^{h-1} j'+j=\frac{(h-1)h}{2}+j\), the optimum PMF (c.f. Fig. 3a) for \(i\in [b+1]_+\) is:

$$\begin{aligned} f^\star (\eta )= f^\star (0)e^{-i\epsilon }\equiv \phi _i^k,\, \underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i},\underline{\eta }_{0}=0, \end{aligned}$$
(30)

where what distinguishes the distributions for each k are the intervals \(\underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i},i \in [b+1]_+\), each with equal probability mass \(\phi ^k_i\). More specifically, considering the kth flat region, corresponding to the pair (h, j) with \(h\in [b-1]_+\), \(j\in [h]_+\), the intervals \(\underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i}\) of the optimum PMF, for \(i\in [b+1]_+\), have lengths \(\ell _i=\underline{\eta }_{i}-\underline{\eta }_{i-1}\):

$$\begin{aligned} \ell _i&\!=\! {\left\{ \begin{array}{ll} 1,&{} \!\!\!\!\hbox {for}\, i=0\\ \bar{\mu },&{} \!\!\!\!\hbox {for} \,i \in [b\!-\!h]_+ \\ \bar{\mu }\!-\!1,&{} \!\!\!\!\hbox {for}\,i \in [b\!-\!h\!+\!1\!:\!b], \\ {} &{}\,\,\,i \! \ne \!b\!-\!h\!+\!j\!+\!1\\ \bar{\mu },&{} \!\!\!\!\hbox {for} \,i \!=\!b\!-\!h\!+\!j\!+\!1 \hbox { when } \!j\! \ne \!h \\ r+h-u_{hj},&{} \!\!\!\!\hbox {for}\,i=b+1. \end{array}\right. }\end{aligned}$$
(31)
$$\begin{aligned} u_{hj}&:= {\left\{ \begin{array}{ll} 1, \,&{} \hbox {for}\, j \ne h,\\ 0, \,&{} \hbox {for}\, j=h. \end{array}\right. } \end{aligned}$$
(32)

The corresponding index sets are obtained as:

$$\begin{aligned} \underline{\eta }_{0}=0, \,\,\underline{\eta }_{i}=\underline{\eta }_{i-1}+\ell _{i},\,\,i\in [b+1]_+. \end{aligned}$$
(33)

To normalize the distribution, \(f^\star (0)\equiv \phi ^k_0\) must be:

$$\begin{aligned} \phi ^k_0&=\frac{1}{\sum _{i=0}^{b+1}\ell _i e^{-i\epsilon }} =\left( \bar{\mu }\alpha _{hj}^{\epsilon } \!+\!\beta _{hj}^{\epsilon }\right) ^{-1}, \end{aligned}$$
(34)
$$\begin{aligned} \alpha ^{\epsilon }_{hj}&:=\! e^{-\epsilon } \xi _{b}^{\epsilon } \!-\! (1\!-\!u_{hj})e^{-\!(b-h+j+1)\epsilon }, \end{aligned}$$
(35)
$$\begin{aligned}&\,\,\,\,\,\,\text {where }\xi _{a}^{\epsilon } := \sum _{i'=0}^{a-1} e^{-i'\epsilon }, \forall a\in \mathbb {N}^+ \nonumber \\ \beta _{hj}^{\epsilon }&:=1+ e^{-(b-h+1)\epsilon }(e^{-j\epsilon } - \xi _{h}^{\epsilon }) \nonumber \\&\,\,\,\,\,\,+(r+h-u_{hj}) e^{-(b+1)\epsilon } \end{aligned}$$
(36)

and the PMF is valid within \(\underline{\delta }^{\epsilon }_{h,j}\le \delta \le \overline{\delta }^{\epsilon }_{h,j}\) where:

$$\begin{aligned} \overline{\delta }^{\epsilon }_{h,j}&= \phi _{0}^{k} e^{-(b-h)\epsilon } \sum \limits _{j'=0}^{j} e^{-j'\epsilon }\nonumber \\&=\phi _{0}^{k} e^{-(b-h)\epsilon }\xi ^{\epsilon }_{j+1},j\in [h-1] \end{aligned}$$
(37a)
$$\begin{aligned} \overline{\delta }^{\epsilon }_{h,h}&=\phi _{0}^{k} e^{-(b-h-1)\epsilon }, \,\overline{\delta }^{\epsilon }_{0,0}=\phi _{0} e^{-(b-1)\epsilon }, \end{aligned}$$
(37b)
$$\begin{aligned} \underline{\delta }^{\epsilon }_{h,j}&= \phi _{0}^{k}e^{-(b-h)\epsilon } \xi _{j}^{\epsilon } \equiv \left( \frac{\phi _{0}^{k}}{\phi _{0}^{k-1}}\right) \overline{\delta }^{\epsilon }_{h,j-1} \end{aligned}$$
(37c)

(b) In each of the linear regions, i.e., \(\overline{\delta }^{\epsilon }_{h,j-1} < \delta \le \underline{\delta }^{\epsilon }_{h,j}\), the PMF values vary linearly in groups with respect to \(\delta\). The group lengths are:

$$\begin{aligned} \ell _i&= {\left\{ \begin{array}{ll} 1,&{} \hbox {for}\,i=0\\ \bar{\mu },&{} \hbox {for}\,i \in [b-h]_+\\ \bar{\mu }-1,&{} \hbox {for}\,i \in [b-h+1:b], \\ &{}\,\,\,\,\,i \! \ne b-h+j+1\\ 1,&{} \hbox {for}\,i=b-h+j+1\\ r+h-1,&{} \hbox {for}\,i=b+2\\ \end{array}\right. } \end{aligned}$$
(38)

each with probability \(\psi _i^k(\delta )\) (as shown in Fig. 3b):

$$\begin{aligned}&\psi _i^k(\delta ) = {\left\{ \begin{array}{ll} {\delta e^{(b-h-i) \epsilon }}/{\xi _{j}^{\epsilon }}, &{}\!\!\!\!i \in [b-h+j] \\ {\delta e^{(b-h-i+1) \epsilon }}/{\xi _{j}^{\epsilon }}, &{}\!\!\!\!i \!\in \! [b\!-\!h\!+\!j\!+\!2:\!b+2], \end{array}\right. }\nonumber \\&\psi _{b-h+j+1}^{k}(\delta ) \!= 1 -\!\!\!\!\!\! \sum \limits _{\begin{array}{c} i=0\\ i \ne b-h+j+1 \end{array}}^{b+2}\!\!\! \ell _i \psi _i^k(\delta ). \end{aligned}$$
(39)

(c) In the \(b^{th}\) segment, i.e., \(\overline{\delta }^{\epsilon }_{b-1,b-1} < \delta \le 1\), the maximum of the objective function is \(f^\star (0)=\delta\). Hence, any set of \(f^\star (\eta ), \,\eta \in [n]\), that satisfies \(\sum \limits _{\eta =1}^{n} f^\star (\eta ) = 1 - \delta\) and \(u_{{X\!X'}} (0) = 1, u_{{X\!X'}} (\eta ) = 0, \,\eta \in [n]_+\), provides an optimal PMF. One of the possible solutions is:

$$\begin{aligned} f^\star (0)&= \delta , \end{aligned}$$
(40)
$$\begin{aligned} f^\star (\eta )&= \frac{1-\delta }{n}, \,\,\eta \in [n]_+. \end{aligned}$$
(41)

Note that, in the last segment, the optimal PMF no longer follows the staircase pattern.

Proof

The proof is in Appendix 10. \(\square\)

In Theorem 3 we cover the case \(b+r<\bar{\mu }\). When \(b+r \ge \bar{\mu }\) the result is not conceptually difficult, but the optimal PMF is hard to express in a readable form. We discuss the general case towards the end of Appendix 10.
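The step-length bookkeeping in (31)–(34) is easy to get wrong by hand, so a small sketch helps; the function names are ours, and we assume the theorem's hypothesis \(b+r<\bar{\mu }\) together with the conventions \([a]=\{0,\dots ,a\}\), \([a]_+=\{1,\dots ,a\}\). A quick check is that the lengths always sum to \(n+1\):

```python
import math

def theorem3_lengths(n: int, mu_bar: int, h: int, j: int) -> list:
    """Step lengths l_0, ..., l_{b+1} of Eq. (31) for the (h, j)-th flat region."""
    b, r = divmod(n, mu_bar)               # n = b*mu_bar + r  (Eq. 26)
    u = 0 if j == h else 1                 # u_{hj}, Eq. (32)
    ell = [1]                              # i = 0
    ell += [mu_bar] * (b - h)              # i in 1, ..., b-h
    for i in range(b - h + 1, b + 1):      # i in b-h+1, ..., b
        ell.append(mu_bar if (i == b - h + j + 1 and j != h) else mu_bar - 1)
    ell.append(r + h - u)                  # i = b+1
    return ell

def theorem3_phi0(n: int, mu_bar: int, h: int, j: int, eps: float) -> float:
    """phi_0^k = 1 / sum_i l_i e^{-i*eps}, Eq. (34)."""
    ell = theorem3_lengths(n, mu_bar, h, j)
    return 1.0 / sum(l * math.exp(-i * eps) for i, l in enumerate(ell))
```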

Corollary 2

For a given \(\epsilon >0\), the privacy loss for the BD neighborhood case with the optimal noise mechanism is also a discontinuous function of \(\epsilon\), where:

$$\begin{aligned} \delta ^\epsilon = \xi _{j}^{\epsilon } e^{-(b-h)\epsilon }(1-\rho ^{\star }),\,\,\,\epsilon ^{\rho ^{\star }}_{h,j} \le \epsilon <\epsilon ^{\rho ^{\star }}_{h,j-1} \end{aligned}$$
(42)

when \(\rho ^\star = 1 - \psi _0^k(\delta )\) is in the kth section, \(k \in \left[ \frac{b(b-1)}{2}\right] _+\), and \(\epsilon _{h,j}^{\rho ^{\star }}\), \(h \in [b-1]_+,\,j \in [h]_+\), are the solutions of:

$$\begin{aligned} \epsilon _{h,j}^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\phi _0^k=1-\left( \bar{\mu }\alpha _{hj}^{\epsilon ^{\rho ^{\star }}_{h,j}} \!+\!\beta _{hj}^{\epsilon ^{\rho ^{\star }}_{h,j}}\right) ^{-1}. \end{aligned}$$
(43)

Proof

The proof is a direct extension of that for Corollary 1 and is omitted for brevity. \(\square\)

Optimal error rates as \(n \rightarrow \infty\)

In this section, we study the limit for \(n\rightarrow \infty\) of the distributions for the two cases we studied, the SD and BD neighborhoods. First, we discuss the \(\delta =0\) case. The goal is to find the relationship between \(\epsilon\) and \(\rho\) as \(n \rightarrow \infty\). From Lemma 3 (17b), we see that for the SD neighborhood case, \(f^{\star }(0) \equiv 1 - \rho ^{\star }(\epsilon ) \rightarrow 1 - e^{-\epsilon } \implies \rho ^{\star }(\epsilon ) \rightarrow e^{-\epsilon }\) as \(n \rightarrow \infty\). Since \(\rho ^{\star }(\epsilon )\) is also constrained to be at most 0.5, the limit function \(\rho ^{\star }_{\infty }(\epsilon )\) is:

$$\begin{aligned} \rho ^{\star }_{\infty }(\epsilon )&= {\left\{ \begin{array}{ll} 0.5, \,\epsilon \in (0, \ln {2}] \\ e^{-\epsilon }, \,\epsilon \ge \ln {2}. \end{array}\right. } \end{aligned}$$
(44)

The optimum PMF is zero for all \(\eta \ne h\hat{\mu }\), and:

$$\begin{aligned} f_{\infty }^\star (h\hat{\mu })=(1-\rho ^{\star }_{\infty }(\epsilon ))e^{-h\epsilon }, \,\,h\in \mathbb {N} \end{aligned}$$
(45)
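The convergence in (44) is fast: the finite-n expression (17b) differs from the limit by \(O(e^{-n\epsilon })\). A one-line encoding of the limit function (name ours):

```python
import math

def rho_inf_sd(eps: float) -> float:
    """Limit error rate (44) for the SD neighborhood as n -> infinity."""
    return 0.5 if eps <= math.log(2) else math.exp(-eps)
```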

Similarly, for the BD neighborhood case with \(\delta =0\) in Lemma 4, as \(n \rightarrow \infty\) (and hence \(b \rightarrow \infty\)), from (29) we get:

$$\begin{aligned} \phi _0 \equiv 1 - \rho ^{\star }(\epsilon )&\rightarrow \left( 1+\frac{\bar{\mu }e^{-\epsilon }}{ 1-e^{-\epsilon }}\right) ^{-1} \end{aligned}$$
(46)
$$\begin{aligned} \implies \rho ^{\star }(\epsilon )&\rightarrow \frac{\bar{\mu }e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } + (1-e^{-\epsilon })} \end{aligned}$$
(47)

Hence, the expression for \(\rho ^{\star }(\epsilon )\) for any \(\epsilon >0\) is:

$$\begin{aligned} \rho ^{\star }_{\infty }(\epsilon )&= {\left\{ \begin{array}{ll} 0.5, &{}\,\epsilon \in (0, \ln {(1+\bar{\mu })}] \\ \frac{\bar{\mu }e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } + (1-e^{-\epsilon })}, &{}\,\epsilon \ge \ln {(1+\bar{\mu })}. \end{array}\right. } \end{aligned}$$
(48)

Each of the PMF staircase steps becomes of size \(\bar{\mu }\) and the values have an exponentially decaying trend:

$$\begin{aligned} f_{\infty }^\star (0)&=1-\rho ^\star _{\infty }(\epsilon ) \end{aligned}$$
(49)
$$\begin{aligned} f^{\star }_{\infty }(\eta )&=f_{\infty }^\star (0)e^{-h\epsilon },\,(h\!-\!1)\bar{\mu }<\eta \le h\bar{\mu }, \,h\in \mathbb {N}^+. \end{aligned}$$
(50)
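Likewise, (48) can be encoded and checked both for continuity at \(\epsilon =\ln (1+\bar{\mu })\) and for consistency with (46), since \(1-\rho ^{\star }_{\infty }(\epsilon )=\left( 1+\frac{\bar{\mu }e^{-\epsilon }}{1-e^{-\epsilon }}\right) ^{-1}\) (a sketch; the function name is ours):

```python
import math

def rho_inf_bd(eps: float, mu_bar: int) -> float:
    """Limit error rate (48) for the BD neighborhood as n -> infinity."""
    if eps <= math.log(1 + mu_bar):
        return 0.5
    x = math.exp(-eps)
    return mu_bar * x / (mu_bar * x + (1 - x))
```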

For \(0<\delta \le 1\), it is convenient to use the index \(i=n-k\), looking at the trend of the distortion backward from \(\delta =1\), where \(f^{\star }(0)=1\). Because the discontinuities between flat and linear regions happen at the points where \(\delta =\underline{\delta }^{\epsilon }_{n-i},\,i\in [n-1]\), we can see from Theorem 2 that the distortion for \(i\in [n-1]\) satisfies:

$$\begin{aligned} f_i^\star (0)\le \frac{1\!-\!e^{-\epsilon }}{1\!-\!e^{-(i+1)\epsilon }}\Rightarrow \rho ^\star (\delta ,\epsilon )\ge 1- \frac{1\!-\!e^{-\epsilon }}{1\!-\!e^{-(i+1)\epsilon }}, \end{aligned}$$
(51)

and the size of the intervals shrinks like \(o(e^{-i\epsilon })\) as \(i\rightarrow +\infty\), quickly leading to the same result as for \(\delta \rightarrow 0\), where the distortion tends to \(e^{-\epsilon }\) as stated before.

Similarly, for the BD neighborhood case, to find the expressions for \(\phi _0^{\infty }\) and \(\phi _i^{\infty }\), it is convenient to use a new index \(c=b-h\), looking at the trend of the distortion backward from \(\delta =1\), where \(f^{\star }(0)=1\), towards \(\delta =0\). In the \(b^{th}\) segment (part (c) of Theorem 3), \(f_b^\star (\eta ) \rightarrow 0\) as \(b \rightarrow \infty\) and \(n \rightarrow \infty\) for \(\eta \in [n]_+\), and thus \(f_b^\star (0) \rightarrow 1\). For \(\delta \approx 1\), in the \(c^{th}\) region we get the following expressions by using (34):

$$\begin{aligned} \alpha ^{\epsilon }_{b-c,j}&\!\rightarrow \! \frac{e^{-\epsilon }}{1-e^{-\epsilon }}; \,\,\beta ^{\epsilon }_{b-c,j} \!\rightarrow \! 1 - \frac{e^{-(c+j+1)\epsilon }}{1-e^{-\epsilon }} \end{aligned}$$
(52)
$$\begin{aligned} \implies \!\! \phi _0^{\infty }(c,j)&\rightarrow \frac{1-e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } \!+\! (1\!-\!e^{-\epsilon }(1\!+\!e^{-(c+j)\epsilon }))}, \end{aligned}$$
(53)
$$\begin{aligned} \phi _i^{\infty }&\rightarrow e^{-i\epsilon } \phi _0^{\infty }, \,\,i \in [b+1]_+. \end{aligned}$$
(54)

Now, as \(c \rightarrow \infty\), the expression of \(\phi _0^{\infty }(c,j)\) in (53) converges to \(\phi _0\) as shown in (46), i.e. the result for \(\delta \rightarrow 0\).

Optimal noise mechanism for vector queries

Next, we briefly discuss the optimal noise mechanism design for vector queries, to explore the difference between optimizing after mapping each vector onto a number in [n] and adopting the mechanism on an entry-by-entry basis. Let each entry of a vector query be in the set \(\mathcal {Q}\). In the first case, the MILP formulation of the problem defined in (13a)–(13g) can be applied directly to vector queries, considering the probability masses as a joint PMF with arguments corresponding to all possible tuples in \(\mathcal {Q}^k\). In Section “Numerical results”, we provide two examples for 2D vector queries: one for the BD neighborhood and another for an arbitrary neighborhood (see Fig. 13). We observe that for the BD neighborhood case, the optimum noise mechanism follows a staircase pattern in 2D as well, and for the arbitrary neighborhood, the optimum noise mechanism has \(e^{-\epsilon } f^{\star }(\varvec{0})\) values at \(\varvec{\eta }= \varvec{\mu }_{{X\!X'}}\), where boldface letters denote vectors.

Remark 4

In general, the optimal multidimensional noise mechanism does not amount to adding independent random noise to each query entry. In Section “Numerical results” we corroborate this statement by showing a counter-example obtained using the MILP program for the vector case, considering a two-dimensional vector query.

Numerical results

Fig. 5

Comparison of the proposed optimal mechanism in terms of the expected distortion costs with those proposed in Ghosh et al. (2009), Canonne et al. (2020), Sadeghi et al. (2020), McSherry and Talwar (2007), Ravi et al. (2022). In plot (a), the optimal noise mechanism is compared in terms of MSE, \(\rho ^{MSE}\), with the discrete geometric mechanism, discrete Gaussian mechanism, and Gumbel mechanism for a fixed value of \(\delta =0.3\) and \(n=8\). In plot (b), the optimal noise mechanism is compared in terms of error rate, \(\rho ^{ER}\), with the discrete geometric mechanism, discrete count mechanism, and data independent mechanism for a fixed value of \(\delta =0.5\) and \(n=7\)

First, we compare the performance of our proposed optimal noise mechanism with the discrete geometric mechanism (Ghosh et al. 2009), discrete Gaussian mechanism (Canonne et al. 2020), classical exponential mechanism (McSherry and Talwar 2007), discrete count mechanism (Sadeghi et al. 2020), and data independent mechanism (Ravi et al. 2022). In plot 5a, the performance is compared in terms of \(\rho ^{MSE}\) vs. \(\epsilon\) for a fixed value of \(\delta =0.3\). Similarly, in plot 5b, the performance is compared in terms of \(\rho ^{ER}\) vs. \(\epsilon\) for a fixed value of \(\delta =0.5\). From the plots, it is clear that the proposed optimal noise mechanism significantly outperforms all these mechanisms.

Popular mechanisms for discrete queries add a random variable drawn from a geometric distribution (Ghosh et al. 2009; Balcer and Vadhan 2019) or a quantized Gaussian distribution (see e.g. Canonne et al. 2020). Clamping is the operation that projects the noisy query response \(\tilde{q}\) onto the domain [n] for any noise value \(\eta \in \mathbb {Z}\) (Ghosh et al. 2009), i.e.:

$$\begin{aligned} \tilde{q} = \min (\max (0, q+\eta ),n). \end{aligned}$$
(55)

Let \(F_{\eta }(\cdot )\) denote the cumulative distribution function of \(\eta\); then the distribution of \(\tilde{{q}}\) after clamping, in terms of the distribution of \(\eta\), is as follows:

$$\begin{aligned} f_{\tilde{q}}(k|q) =&{\left\{ \begin{array}{ll} F_{\eta }(-q), &{} \,\,\,\,\,\,k = 0\\ f_{\eta }(k-q), &{}\,\,\,\,\,\,k \in [n-1]_+\\ 1-F_{\eta }(n-q-1), &{} \,\,\,\,\,\,k = n. \end{array}\right. } \end{aligned}$$
(56)

From (56), one can compute the \((\epsilon , \delta )\) privacy curve using (2) and (3). After clamping, the \((\epsilon ,\delta )\) guarantees provided by the said DP mechanisms are different from the ones calculated for \(n=+\infty\), as shown in Fig. 6, reported for the same MSE = 3.38. From the figure, it is clear that clamping increases \(\delta\) for the same \(\epsilon\) budget, and this effect is particularly pronounced in the case of the Gaussian mechanism. Hence, it is not advisable to use noise mechanisms with infinite support, such as the discrete geometric and discrete Gaussian, in tandem with clamping operations to publish a discrete query response with finite support.
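To make the clamping computation in (55)–(56) concrete, the sketch below evaluates the clamped PMF under two-sided geometric (discrete Laplace) noise \(f_\eta (k)=\frac{1-\alpha }{1+\alpha }\alpha ^{|k|}\); note that for the masses to sum to one, the clamp at \(k=n\) must collect \(\Pr [\eta \ge n-q]=1-F_{\eta }(n-q-1)\). The function name and the closed-form CDF are our own:

```python
def clamped_geometric_pmf(q: int, n: int, alpha: float) -> list:
    """PMF of the clamped response (Eqs. 55-56) under two-sided geometric noise."""
    c = (1 - alpha) / (1 + alpha)
    def f(k):
        # Two-sided geometric (discrete Laplace) mass at integer k
        return c * alpha ** abs(k)
    def F(m):
        # CDF P(eta <= m) of the two-sided geometric at integer m
        if m < 0:
            return alpha ** (-m) / (1 + alpha)
        return 1 - alpha ** (m + 1) / (1 + alpha)
    pmf = [F(-q)]                              # all noise pushing the answer to 0
    pmf += [f(k - q) for k in range(1, n)]     # interior outcomes k = 1, ..., n-1
    pmf.append(1 - F(n - q - 1))               # all noise pushing the answer to n
    return pmf
```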

Fig. 6

This plot shows the adverse effect of the clamping operation on the \((\epsilon ,\delta )\) trade-off for the discrete geometric and discrete Gaussian mechanisms. The following parameters are used: \(n=8, \alpha = 0.7\) and \(\sigma ^2 = 3.38\)

Next, we test the modified MILP problem (15a) of minimizing \(\delta\), constraining the error rate \(\rho = \rho ^{ER}\), and solve the MILP numerically using Gurobi as the solver. For a fair comparison, we consider the measures of error, i.e., the ER and MSE, and the respective parameters of the noise mechanisms, viz., \(\alpha\) in Ghosh et al. (2009), \(\sigma ^2\) in Canonne et al. (2020), \(\beta '\) in Adams (2013), and \(\rho\) in Sadeghi et al. (2020), Ravi et al. (2022); the results are shown in Fig. 7. In plot 7a, we show the comparison of the proposed optimal noise mechanism with the discrete geometric, discrete Gaussian, and Gumbel mechanisms for a given MSE = 0.6101. Similarly, in plot 7b, we show the comparison of the proposed optimal noise mechanism with the discrete Gaussian mechanism, discrete count mechanism, and data independent mechanisms for a given ER = 0.3. From the plots, it is clear that the proposed optimal noise mechanism significantly outperforms all these mechanisms \(\forall \epsilon > 0\).

Fig. 7

Comparison of the proposed optimal mechanism in terms of \((\epsilon , \delta )\) trade-offs with those proposed in Ghosh et al. (2009), Canonne et al. (2020), Sadeghi et al. (2020), McSherry and Talwar (2007). In plot (a), the optimal noise mechanism is compared with the discrete geometric mechanism, discrete Gaussian mechanism and Gumbel mechanism for a fixed MSE, i.e., \(\rho ^{MSE}_{Geo.} = \rho ^{MSE}_{Gau.} = \rho ^{MSE}_{Gum.} = \rho ^{MSE}_{Opt.} = 0.6101\) and \(n=8\). In plot (b), the optimal noise mechanism is compared with the discrete geometric mechanism, discrete count mechanism, and data independent mechanism, i.e. \(f(0) = 1-\rho ^{ER}_{Ind.}\), \(f(\eta ) = \rho ^{ER}_{Ind.}/n,\,\eta \in [n]_+\), for a fixed ER, i.e., \(\rho ^{ER}_{Geo.} = \rho ^{ER}_{Cnt.} = \rho ^{ER}_{Opt.} = \rho ^{ER}_{Ind.} = 0.3\) and \(n=7\)

Fig. 8

This plot shows the PMF of the optimal noise mechanism compared with both the Gaussian and geometric mechanisms. The following parameter values are used: \(n=8, \alpha = 0.7, \epsilon =1\), \(\delta = 0\), and \(\sigma ^2 = 3.38\)

Fig. 9

These plots show the PMF of the optimal noise mechanisms for the SD neighborhood case; the parameter values \(n=7, \epsilon = 0.75\) are used for both. \(\hat{\mu } = 3\) is used in plot (a), whereas \(\hat{\mu } = 2\) is used in plot (b)

Now, we compare the PMF of the optimal noise distribution with the Gaussian and geometric distributions in Fig. 8, with the same MSE for all the distributions. From the plot, we can observe that the probability mass at \(\eta =0\) is largest for the proposed mechanism, which validates our claim of the lowest error rate among these mechanisms.

In the following figures, we show the structure of the PMF associated with the optimal noise mechanism. First, we consider the SD neighborhood case. More specifically, the plots in Fig. 9 show the PMF of the optimal noise mechanisms, \(f^\star {(\eta )}, \eta \in [n]\), for the SD neighborhood case, with \(\hat{\mu } = 3\) in Fig. 9a and \(\hat{\mu } = 2\) in Fig. 9b. In the left plot, \(f^\star {(\eta )}\) is non-zero for all \(\eta \in [n]\) since \(n+1\) and \(\hat{\mu }\) are relatively prime, but in the right plot, \(f^\star {(\eta )}\) is zero for \(\eta \in \{1, 3, 5, 7\}\), since \(n+1\) and \(\hat{\mu }\) are not relatively prime (see Fig. 1 for the reasoning). From the plots we observe that as \(\delta\) increases, \(f^\star {(0)}\) increases, and since the error rate \(\rho ^{ER}\) is \(1 - f^\star {(0)}\), it decreases as \(\delta\) increases. Moreover, in the right plot, zero probability mass is assigned at some of the \(\eta \in [n]_+\) values; hence, we see a higher value of \(f^\star {(0)}\) than the corresponding values in the left plot, and thus a lower error rate for a given \(\delta\). Recalling Lemmas 2 and 3, we see that \(f^\star {(0)} \ge f^\star {(3)} \ge f^\star {(6)} \ge f^\star {(1)} \ge f^\star {(4)} \ge f^\star {(7)} \ge f^\star {(2)} \ge f^\star {(5)}\) in Fig. 9a, represented by \(f^\star _{(i)}, i \in [n]\), respectively. Similarly, in Fig. 9b we observe \(f^\star {(0)} \ge f^\star {(2)} \ge f^\star {(4)} \ge f^\star {(6)}\), while the rest, \(f^\star {(1)} = f^\star {(3)} = f^\star {(5)} = f^\star {(7)} = 0\). Next, we consider the BD neighborhood case. The plots in Fig. 10 show the PMF of the optimal noise mechanisms, \(f^\star {(\eta )}, \eta \in [n]\), for the BD neighborhood case, with \(\delta = 0\) in Fig. 10a and \(\delta > 0\) in Fig. 10b. In the left plot, we clearly see the staircase pattern with step sizes equal to \(\bar{\mu }\), except for the last step. In the right plot, we see that the step widths change as \(\delta\) increases, while the staircase structure is maintained. Note that the vertical height of each step is \(e^{\epsilon }\) times that of the next one, as can be seen in Fig. 10b and in Table 1. Also, from the plot in Fig. 10b and Table 1, one can observe that the value of \(f^\star {(0)}\) increases (and thus the value of \(\rho ^{ER}\) decreases) as \(\delta\) increases, as expected.

Fig. 10

These plots show the PMF of the optimal noise mechanisms for the BD neighborhood case with the following parameter values: \(n=8, \bar{\mu } = 3, \epsilon = 1.5\). In plot (a) we see that the optimal noise mechanism follows a staircase pattern starting from \(\eta = 1\), with \(b=2\) steps of length \(\bar{\mu } = 3\) and one last step of length \(r=2\). In plot (b) we show how the staircase pattern and step lengths change with \(\delta\). It can be seen that at \(\delta = \overline{\delta }^{\epsilon }_{0} = 0.1212\), \(\delta = \underline{\delta }^{\epsilon }_{1} = 0.1238\), and \(\delta = \underline{\delta }^{\epsilon }_{2} = 0.1522\) the step lengths are: (1,3,3,2) (blue bars), (1,3,2,3) (red bars), and (1,3,2,2,1) (yellow bars), respectively

Table 1 The PMF of the optimal noise mechanism for different values of \(\delta\) for \(n=8, \bar{\mu } = 3, \epsilon = 1.5\)

Next, we consider the vector query case and provide a counterexample to support Remark 4 for a two-dimensional vector query. Let \(n=4, \epsilon _1 =1.5, \epsilon _2 =1.5, \epsilon = 3, \mu _{XX'} = \{0,1,2\}, \delta =0\). The optimal noise mechanisms for \((\epsilon _1,0)\) and \((\epsilon _2,0)\) DP are \(f^{\star }_1 (\eta )\) = \(f^{\star }_2 (\eta )\) = [0.6469, 0.1443, 0.1443, 0.0322, 0.0322]; the values of the optimal noise mechanism PMF \(f^{\star } (\eta _1, \eta _2)\) for \((\epsilon ,0)\) are given in (57):

$$\begin{aligned} \begin{bmatrix} 0.6954 & 0.0346 & 0.0346 & 0.0017 & 0.0017\\ 0.0346 & 0.0346 & 0.0346 & 0.0017 & 0.0017 \\ 0.0346 & 0.0346 & 0.0346 & 0.0017 & 0.0017 \\ 0.0017 & 0.0017 & 0.0017 & 0.0017 & 0.0017 \\ 0.0017 & 0.0017 & 0.0017 & 0.0017 & 0.0017 \\ \end{bmatrix} \end{aligned}$$
(57)

The marginal distributions happen to be equal, which makes sense in terms of symmetry: \(f_1 (\eta _1) = \sum _{\eta _2=0}^n f^{\star } (\eta _1, \eta _2)\equiv f_2 (\eta _2)\) and they have masses [0.7681, 0.1073, 0.1073, 0.0086, 0.0086]. We can observe that \(f^{\star } (\eta _1, \eta _2) \ne f_1 (\eta _1)f_2 (\eta _2)\).
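This check can be reproduced directly from the rounded entries of (57); the following sketch recovers the marginals and confirms that the joint PMF is not the product of its marginals (small discrepancies with respect to the quoted marginals are due to rounding of the matrix entries).

```python
# Numerical check of the two-dimensional example in (57): the marginals of the
# optimal joint PMF are equal by symmetry, yet the joint is not their product.
F = [
    [0.6954, 0.0346, 0.0346, 0.0017, 0.0017],
    [0.0346, 0.0346, 0.0346, 0.0017, 0.0017],
    [0.0346, 0.0346, 0.0346, 0.0017, 0.0017],
    [0.0017, 0.0017, 0.0017, 0.0017, 0.0017],
    [0.0017, 0.0017, 0.0017, 0.0017, 0.0017],
]
f1 = [sum(row) for row in F]                              # marginal over eta_2
f2 = [sum(F[i][j] for i in range(5)) for j in range(5)]   # marginal over eta_1
print([round(x, 4) for x in f1])  # ≈ [0.768, 0.1072, 0.1072, 0.0085, 0.0085]

# The joint mass at (0, 0) clearly exceeds the product of the marginals,
# so the optimal joint PMF is not the product of its marginals.
print(F[0][0], f1[0] * f2[0])
```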

The plots in Figs. 11, 12, 13 and 14 are self-explanatory.

Fig. 11

These plots show the error rate for the BD neighborhood case versus the parameter \(\bar{\mu }\). In the left plot, \(n=9, \,\delta =0.2\) are used, and in the right plot, \(n=9, \,\epsilon =2\)

Fig. 12

Error rate \(\rho\) for the SD and BD neighborhood cases, respectively, versus the parameter \(\delta\), confirming the trends predicted by Theorems 2 and 3. In plot (a), \(n=9, \,\hat{\mu }=3\) are used, and in plot (b), \(n=9, \,\bar{\mu }=3\)

Fig. 13

The optimal noise joint PMF for the two-dimensional vector query case for BD and arbitrary neighborhoods, respectively. In plot (a), the parameters are \(n=4, \,\epsilon =3, \,\bar{\mu }_1=\bar{\mu }_2=2\). In this BD neighborhood example, the staircase pattern is similar to that of the scalar query case in Theorem 3. In plot (b), the following parameters are used: \(n=6, \,\epsilon =3, \,{\mu }_1= \{1, 3\}, \,{\mu }_2= \{2, 5\}\). In this arbitrary neighborhood example, the second-largest peaks can be observed at the union of the distance-one sets of each dimension, i.e., \([1,2], \,[1,5],\,[3,2], \,[3,5]\)

Fig. 14

These plots show the PMF of the optimal noise mechanisms when the Advanced Metering Infrastructure (AMI) database, covering 1416 houses across 12 distribution circuits in California, USA, is queried. We use \(\tilde{\mu } = \bigcup _{\forall X \in \mathcal {X}} \bigcup _{\forall X'\in \mathcal {X}_{X}^{(1)}} \mu _{{X\!X'}}\). In the left plot, 11 quantization levels are used, hence \(n=10\); in this example, \(\tilde{\mu } = \{1, 2, 3, 4\}\) is observed. In the right plot, 17 quantization levels are used, hence \(n=16\); in this example, \(\tilde{\mu } = \{1, 3, 5, 6\}\) is observed. From these figures we observe that \(f^\star (\eta )\) is considerably larger for \(\eta \in \{\tilde{\mu } \cup 0\}\) than for \(\eta \in [n] {\setminus } \{\tilde{\mu } \cup 0\}\)

Conclusions and future work

Considering queries whose domain is discrete and finite, in this paper we proposed a novel MILP formulation to determine the PMF of an additive noise mechanism that minimizes the error rate of the DP answer for any (\(\epsilon ,\,\delta\)) pair. The noise is added to the queried data modulo \(n+1\), where \(n+1\) is the size of the query domain. For two special cases, which we referred to as the SD neighborhood and the bounded difference (BD) neighborhood, we provided closed-form solutions for the optimal noise PMF and its probability of error versus \(\delta\) for a given \(\epsilon\), and we studied the asymptotic case \(n\rightarrow \infty\). We also compared the proposed optimal noise mechanism to state-of-the-art noise mechanisms and found that it significantly outperforms them for a given ER or MSE. In the future, we plan to leverage these results in several applications related to labeling data, and as a building block for the theoretical study of queries with finite uncountable support, as well as of vector queries, whose optimal PMF can be computed with our MILP but does not appear to be the product of the optimal PMFs of its entries.
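As a final illustration of the mechanism itself, the following sketch applies modulo-\((n+1)\) additive noise to a discrete query answer. The noise PMF used here is an arbitrary placeholder chosen for illustration, not the MILP-optimal PMF.

```python
import random

# Sketch of the additive noise mechanism over a finite discrete domain:
# the DP answer is the true query answer plus noise, modulo n + 1.
def dp_answer(q, f, rng=random):
    """Return (q + eta) mod (n+1), with eta drawn from the PMF f over {0, ..., n}."""
    n = len(f) - 1
    eta = rng.choices(range(n + 1), weights=f, k=1)[0]
    return (q + eta) % (n + 1)

# Example with n = 8 and a placeholder noise PMF concentrated at 0.
f = [0.60, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02]
print(dp_answer(3, f))  # some value in {0, ..., 8}
```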