1 Introduction

Telegram is a chat platform that in January 2021 reportedly had 500 M monthly users [60]. It provides a host of multimedia and chat features, such as one-on-one chats, public and private group chats for up to 200,000 users, as well as public channels with an unlimited number of subscribers. Prior works establish the popularity of Telegram among higher-risk users such as activists [25] and participants of protests [1]. In particular, it is reported in [1, 25] that these groups of users shun Signal in favour of Telegram, partly due to the absence of some key features, but mostly due to Signal’s reliance on phone numbers as contact handles.

This heavy usage contrasts with the scant attention paid to Telegram’s bespoke cryptographic design—MTProto—by the cryptographic community. To date, only four works treat Telegram. In [34], an attack against the IND-CCA security of MTProto 1.0 was reported, in response to which the protocol was updated. In [54], a replay attack based on improper validation in the Android client was reported. Similarly, [39] reports input validation bugs in Telegram’s Windows Phone client. Recently, in [46] MTProto 2.0 (the current version) was proven secure in a symbolic model, but assuming ideal building blocks and abstracting away all implementation/primitive details. In short, the security that Telegram offers is not well understood.

Telegram uses its MTProto “record layer”—offering protection based on symmetric cryptographic techniques—for two different types of chats. By default, messages are encrypted and authenticated between a client and a server, but not end-to-end encrypted: such chats are referred to as cloud chats. Here Telegram’s MTProto protocol plays the same role that TLS plays in, for example, Facebook Messenger. In addition, Telegram offers optional end-to-end encryption for one-on-one chats which are referred to as secret chats (these are tunnelled over cloud chats). So far, the focus in the cryptographic literature has been on secret chats [34, 39] as opposed to cloud chats. In contrast, in [1] it is established that the one-on-one chats played only a minor role for the protest participants interviewed in the study; significant activity was reportedly coordinated using group chats secured by the MTProto protocol between Telegram clients and the Telegram servers. For this reason, we focus here on cloud chats. Given the similarities between the cryptography used in secret and cloud chats, our positive results can be modified to apply to the case of secret chats (but we omit any detailed analysis).

For completeness, note that follow-up work [3] has in the meantime also studied the MTProto “handshake” in a suitable multi-stage key exchange model.

1.1 Contributions

We provide an in-depth study of how Telegram uses symmetric cryptography inside MTProto for cloud chats. We make four distinct contributions: our security model for secure channels, the formal specification of our variant of MTProto, our attacks on the original protocol, and our security proofs for the formal specification of MTProto.

1.1.1 Security model

Starting from the observation that MTProto entangles the keys of the two channel directions, in Sect. 3 we develop a bidirectional security model for two-party secure channels that allows an adversary full control over generating and delivering ciphertexts from/to either party (client or server). The model assumes that the two parties start with a shared key and use stateful algorithms. Our security definitions come in two flavours, one capturing confidentiality and the other integrity. We also consider a combined security notion and its relationship to the individual notions. Our formalisation is broad enough to consider a variety of different styles of secure channels—for example, allowing channels where messages can be delivered out of order within some bounds, or where messages can be dropped (neither of which we consider appropriate for secure messaging). This caters for situations where the secure channel operates over an unreliable transport protocol, but where the channel is designed to recover from accidental errors in message delivery as well as from certain permitted adversarial behaviours.

This is done technically by introducing the concept of support functions, inspired by the support predicates recently introduced by [28] but extending them to cater for a wider range of situations. Here the core idea is that a support function operates on the transcript of messages and ciphertexts sent and received (in both directions) and its output is used to decide whether an adversarial behaviour—say, reordering or dropping messages—counts as a “win” in the security games. It is also used to define a suitable correctness notion with respect to expected behaviours of the channel.

As a final feature, our secure channel definitions allow the adversary complete control over all randomness used by the two parties, since we can achieve security against such a strong adversary in the stateful setting. This decision reflects a concern about Telegram clients expressed by Telegram developers [61].

Fig. 1

Illustration of the symmetric cryptography used in MTProto 2.0. The input \( ak \) denotes the shared secret key resulting from the key exchange, while the input \(p\) represents the (encoded) plaintext payload

1.1.2 Formal specification of MTProto

In Sect. 4, we provide a detailed formal specification of Telegram’s symmetric encryption; Fig. 1 illustrates its main components. Our specification is computational and does not abstract away the building blocks used in Telegram. This in itself is a non-trivial task as no formal specification exists and behaviour can only be derived from official (but incomplete) documentation and from dynamic analysis of Telegram’s implementation; moreover, different clients do not have the same behaviour.

Formally, we define an MTProto-based bidirectional channel \(\textsf{MTP}\text {-}\textsf{CH} \) as a composition of multiple cryptographic primitives. This allows us to recover a variant of the real-world MTProto protocol by instantiating the primitives with specific constructions, and to study whether each of them satisfies the security notions required to achieve the desired security of \(\textsf{MTP}\text {-}\textsf{CH} \); this modularity significantly simplifies the analysis. However, we emphasise that our goal is to be descriptive, not prescriptive, i.e. we do not suggest alternative instantiations of \(\textsf{MTP}\text {-}\textsf{CH} \).

To arrive at our specification, we had to make several decisions on what behaviour to model and where to draw the line of abstraction. Notably, there are various behaviours exhibited by (official) Telegram implementations that lead to attacks.

In particular, we verified in practice that current implementations allow an attacker on the network to reorder messages from a client to the server, with the transcript on the client being updated later to reflect the attacker-altered server’s view. We stress, though, that this trivial yet practical attack is not inherent in MTProto and can be avoided by updating the processing of message metadata in Telegram’s servers. The consequences of such an attack can be quite severe, as we discuss further in Sect. 4.2.

Further, if a message is not acknowledged within a certain time in MTProto, it is resent using the same metadata and with fresh random padding. While this appears to be a useful feature and a mitigation against message drops, it would actually enable an attack in our formal model if such retransmissions were included in the specification. In particular, an adversary who also has control over the randomness can break stateful IND-CPA security with three encryption queries, while an attacker without that control could do so with about \(2^{64}\) encryption queries. We use these more theoretical attacks to motivate our decision not to allow re-encryption with fixed metadata in our formal specification of MTProto, i.e. we insist that the state is evolving.

1.1.3 Proof of security

We then prove in Sect. 5 that our slight variant of MTProto achieves channel confidentiality and integrity in our model, under certain assumptions on the components used in its construction. As described in Sect. 1.3, Telegram has implemented our proposed alterations, so that there can be some assurances about MTProto as currently deployed.

We use code-based game-hopping proofs in which the analysis is modularised into a sequence of small steps that can be individually verified. As well as providing all details of the proofs, we also give high-level intuitions. Significant complexity arises in the proofs from two sources: the entanglement of keys used in the two channel directions, and the detailed nature of the specification of MTProto that we use.

We eschew an asymptotic approach in favour of concrete security analysis. This results in security theorems that quantitatively relate the confidentiality and integrity of MTProto as a secure channel to the security of its underlying cryptographic components. Our main security results, Theorems 1 and 2 and Corollaries 1 and 2, provide confidentiality and integrity bounds containing terms equivalent to \(\approx q/2^{64}\) where \(q\) is the number of \(\textsc {Send}\) queries an attacker makes. We discuss this further in Sect. 5.

However, our security proofs rely on several assumptions about cryptographic primitives that, while plausible, have not previously been considered in the literature. In more detail, due to the way Telegram makes use of \(\textsf{SHA}-\textsf{256}\) as a MAC algorithm and as a KDF, we have to rely on the novel assumption that the block cipher \(\textsf{SHACAL}-\textsf{2}\) underlying the \(\textsf{SHA}-\textsf{256}\) compression function is a leakage-resilient PRF under related-key attacks, where “leakage-resilient” means that the adversary can choose a part of the key. Our proofs rely on two distinct variants of such an assumption. In Appendix F, we show that these assumptions hold in the ideal cipher model, but further cryptanalysis is needed to validate them for \(\textsf{SHACAL}-\textsf{2}\). For similar reasons, we also require a dual-PRF assumption on \(\textsf{SHACAL}-\textsf{2}\). We stress that such assumptions are likely necessary for our or any other computational security proof for MTProto, due to the specifics of how MTProto uses \(\textsf{SHA}-\textsf{256}\) and how it constructs keys and tags from public inputs and overlapping key bits of a master secret. Given the importance of Telegram, these assumptions provide new, significant cryptanalysis targets and motivate further research on related-key attacks.

Besides using \(\textsf{SHA}-\textsf{256}\) as a MAC algorithm and a KDF, MTProto also uses \(\textsf{SHA}-\textsf{1}\) to compute a key identifier. This does not lead to length-extension attacks because in each use case either the input is required to have a fixed length, or the output is truncated. The latter technique was previously studied as ChopMD [23] and employed to build AMAC [10]. But rather than applying these results to show that the design of the MAC algorithm prevents forgeries, our proofs rely on the observation that even if length-extension attacks were possible, they would still not break the security of the overall scheme. This is because the plaintext encoding format of MTProto mandates the presence of certain metadata in the first block of the encrypted payload.

1.1.4 Attacks

We present further implementation attacks against Telegram in Sections 6 and 7. These attacks highlight the limits of our formal modelling and the fragility of MTProto implementations. The first of these, a timing attack against Telegram’s use of IGE mode encryption, can be avoided by careful implementation, but we found multiple vulnerable clients. The attack takes inspiration from an attack on SSH [5]. It exploits the fact that Telegram encrypts a length field and checks the integrity of plaintexts rather than ciphertexts. If this process is implemented without taking care to avoid a timing side channel, it can be turned into an attack recovering up to 32 bits of plaintext. We give examples from the official Desktop, Android and iOS Telegram clients, each exhibiting a different timing side channel. However, we stress that the conditions of this attack are difficult to meet in practice. In particular, to recover bits from a plaintext message block \(m_{i}\) we assume knowledge of message block \(m_{i-1}\) (a relatively mild assumption) and, critically, of message block \(m_{1}\), which contains two 64-bit random values negotiated between the client and the server. Thus, confidentiality hinges on the secrecy of two random strings—a salt and an id. Notably, these fields were not designated for this purpose in the Telegram documentation.

In order to recover \(m_{1}\) and thereby enable our plaintext-recovery attack, in Section 7 we chain it with another attack, this one on the server-side implementation of Telegram’s key exchange protocol. This attack exploits how Telegram servers process RSA ciphertexts; while the exploited behaviour was confirmed by the Telegram developers, we did not verify it experimentally. The attack uses a combination of lattice reduction and Bleichenbacher-like techniques [19]. It in fact breaks server authentication—allowing a MiTM attack—assuming the attack can be completed before a session times out. But, more germanely, it also allows us to recover the id field. This essentially reduces the overall security of Telegram to guessing the 64-bit salt field. Details can be found in Section 7. We stress, though, that even if all the assumptions we make in Section 7 are met, our exploit chain (Sect. 6, Section 7)—while considerably cheaper than breaking the underlying \(\textsf{AES}-\textsf{256}\) encryption—is far from practical. Yet it demonstrates the fragility of MTProto, which could be avoided—along with the reliance on unstudied assumptions—by using standard authenticated encryption or, indeed, just TLS.

We conclude with a broader discussion of Telegram security and with our recommendations in Sect. 8.

1.2 Publication history

This is the full version of the paper published at IEEE S&P 2022 [4]. The proofs referred to in [4, Section V] are contained in full here and can be found in Appendices E and F and in Sect. 5 (in particular Sections 5.5 and 5.6). We have also expanded the content of several other sections as follows: Section 3 defining bidirectional channels, originally [4, Section III], was expanded with more context and illustrating examples. Section 6.1 on the timing attack, originally [4, Section VI], contains the code samples for all affected Telegram clients. Section 7 on the key exchange attack, originally [4, Appendix A], is significantly expanded and contains an overview of the key exchange protocol as well as the attack in detail. This work also contains several new appendices: Appendices A to C expand and help to position our new channels framework, while Appendices D and G give more details about the Telegram protocol and the implementation of our attacks.

1.3 Disclosure

We notified Telegram’s developers about the vulnerabilities we found in MTProto on 16 April 2021. They acknowledged receipt soon after, and confirmed the behaviours we describe on 8 June 2021. They awarded a bug bounty for the timing side channel and for the overall analysis. We were informed by the Telegram developers that they do not issue security or bugfix releases except for immediate post-release crash fixes. The development team also informed us that they did not wish to issue security advisories at the time of patching, nor to commit to release dates for specific fixes. Therefore, the fixes were rolled out as part of regular Telegram updates. The Telegram developers informed us that as of version 7.8.1 for Android, 7.8.3 for iOS and 2.8.8 for Telegram Desktop, all vulnerabilities reported here were addressed. When we write “the current version of MTProto” or “current implementations”, we refer to the versions prior to those version numbers, i.e. the versions we analysed.

2 Preliminaries

2.1 Notational conventions

2.1.1 Basic notation

Let \({{\mathbb {N}}}=\{1, 2, \ldots \}\). For \(i\in {{\mathbb {N}}}\) let [i] be the set \(\{1, \ldots , i\}\). We denote the empty string by \(\varepsilon \), the empty set by \(\emptyset \), and the empty list by \([]\). We let \(x_1\leftarrow x_2 \leftarrow v\) denote assigning the value v to both \(x_1\) and \(x_2\). Let \(x\in \{0,1\}^*\) be any string; then |x| denotes its bit length, x[i] denotes its i-th bit for \(0 \le i < \left| x\right| \), and \(x[a: b]=x[a]\ldots x[b-1]\) for \(0\le a < b \le |x|\). For any \(x\in \{0,1\}^*\) and \(\ell \in {{\mathbb {N}}}\) such that \(|x| \le {\ell }\), we write \(\langle x \rangle _{\ell }\) to denote the bit-string of length \(\ell \) that is built by padding x with leading zeros. For any two strings \(x, y \in \{0,1\}^*\), \(x~\Vert ~y\) denotes their concatenation. If \({X}\) is a finite set, we let \(x \leftarrow \$~{X}\) denote picking an element of \({X}\) uniformly at random and assigning it to x. If \(\textsf{T}\) is a table, \(\textsf{T}[i]\) denotes the element of the table that is indexed by i. If \(\textsf{tr}\) is a list, then \(\textsf{tr}[i]\) denotes the element of this list that is indexed by i, where the index is 0-based; further, \(\textsf{tr} ~\Vert ~x\) denotes appending the element x to \(\textsf{tr}\). We let \(\bot \not \in \{0,1\}^*\) be an error code that indicates rejection, and we may use a second, distinct error code where needed. Uninitialised integers are assumed to be set to 0, Booleans to \(\texttt {false}\), strings to \(\varepsilon \), sets to \(\emptyset \), and lists to \([]\). Each element of a table is assumed to be initialised to \(\bot \), indicating that it is empty. We use int64 as a shorthand for a 64-bit integer data type. We use 0x to prefix a hexadecimal string in big-endian order. All variables are represented in big-endian order unless specified otherwise.
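For concreteness, the string conventions above can be mirrored in a few lines of Python over bit-strings represented as strings of '0'/'1' characters (the helper names are ours, not the paper's):

```python
def pad_left(x: str, ell: int) -> str:
    """<x>_ell: pad bit-string x with leading zeros to length ell (requires |x| <= ell)."""
    assert set(x) <= {"0", "1"} and len(x) <= ell
    return "0" * (ell - len(x)) + x

def slice_bits(x: str, a: int, b: int) -> str:
    """x[a:b] = x[a] ... x[b-1], with 0-based bit indexing as in the paper."""
    assert 0 <= a < b <= len(x)
    return x[a:b]

assert pad_left("101", 8) == "00000101"
assert slice_bits("10110", 1, 4) == "011"
```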

2.1.2 Algorithms and adversaries

Algorithms may be randomised unless otherwise indicated. Running time is worst case. If A is an algorithm, \(y \leftarrow A(x_1,\ldots ;r)\) denotes running A with random coins r on inputs \(x_1,\ldots \) and assigning the output to y. We let \(y \leftarrow \$~A(x_1,\ldots )\) be the result of picking r at random and letting \(y \leftarrow A(x_1,\ldots ;r)\). We let \([A(x_1,\ldots )]\) denote the set of all possible outputs of A when invoked with inputs \(x_1,\ldots \). The instruction \({\textbf{abort}}(x_1,\dots )\) is used to immediately halt the algorithm with output \((x_1,\dots )\). Adversaries are algorithms. Besides using \(\bot \) as an error code, we also let oracles explicitly return \(\bot \) if they would otherwise have terminated with no output. We require that adversaries never pass \(\bot \) as input to their oracles. If any of the inputs taken by an adversary A is \(\bot \), then all of its outputs are \(\bot \).

2.1.3 Security games and reductions

We use the code-based game-playing framework of [17]. (See Fig. 3 for an example.) \(\Pr [\textrm{G}]\) denotes the probability that game \(\textrm{G}\) returns \(\texttt {true}\). Variables in each game are shared with its oracles. In the security reductions, we omit specifying the running times of the constructed adversaries when they are roughly the same as the running time of the initial adversary. Let \(\textrm{G}_{\mathcal {D}}\) be any security game defining a decision-based problem that requires an adversary \(\mathcal {D}\) to guess a challenge bit d; let \(d'\) denote the output of \(\mathcal {D}\), and let game \(\textrm{G}_{\mathcal {D}}\) return \(\texttt {true}\) iff \(d' = d\). Depending on the context, we interchangeably use the two equivalent advantage definitions for such games: \(\textsf{Adv}^{\textsf{}}_{}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}_{\mathcal {D}}] - 1\), and \(\textsf{Adv}^{\textsf{}}_{}(\mathcal {D}) = \Pr \left[ \,d' = 1\,|\,d = 1\, \right] - \Pr \left[ \,d' = 1\,|\,d = 0\, \right] \). As part of our reductions, the intermediary games (e.g. Fig. 39) use the following colour-coding: one colour for equivalent but expanded code, and another for the code added for the transitions between games; the adversaries constructed for the transitions (e.g. Fig. 36) use highlighting to mark the changes in the code of the simulated reduction games.

2.2 Standard definitions

2.2.1 Fundamental Lemma of Game Playing

In our game-hopping proofs, we frequently make use of the Fundamental Lemma of Game Playing [17]. Suppose that the games \(\textrm{G}_i\) and \(\textrm{G}_{i+1}\) are identical until the flag \(\textsf{bad}\) is set. Then, we have

$$\begin{aligned} \Pr [\textrm{G}_i] - \Pr [\textrm{G}_{i+1}] \le \Pr [\textsf{bad}^{\textrm{G}_i}] = \Pr [\textsf{bad}^{\textrm{G}_{i+1}}], \end{aligned}$$

where \(\Pr [\textsf{bad}^\textrm{G}]\) denotes the probability of setting the flag \(\textsf{bad}\) in game \(\textrm{G}\).

2.2.2 Collision-resistant functions

Let \(f:\mathcal {D}_{f}\rightarrow \mathcal {R}_{f}\) be a function. Consider game \(\textrm{G}^{\textsf{cr}}\) of Fig. 2, defined for \(f\) and an adversary \(\mathcal {F}\). The advantage of \(\mathcal {F}\) in breaking the \(\textrm{CR}\)-security of \(f\) is defined as \(\textsf{Adv}^{\textsf{cr}}_{f}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{cr}}_{f, \mathcal {F}}]\). To win the game, adversary \(\mathcal {F}\) has to find two distinct inputs \(x_0, x_1 \in \mathcal {D}_{f}\) such that \(f(x_0) = f(x_1)\). Note that \(f\) is unkeyed, so there exists a trivial adversary \(\mathcal {F}\) with \(\textsf{Adv}^{\textsf{cr}}_{f}(\mathcal {F}) = 1\) whenever \(f\) is not injective. We will use this notion in a constructive way, to build a specific collision-resistance adversary \(\mathcal {F}\) (for \(f= \textsf{SHA}-\textsf{256}\) with a truncated output) in a security reduction.

Fig. 2

Collision resistance of function \(f\)
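To illustrate how such a constructive CR adversary can operate, here is a toy birthday-search collision finder for SHA-256 with a truncated output. The 16-bit truncation is our choice so that the search finishes instantly; the function name `f` and the search strategy are illustrative, not the paper's reduction.

```python
import hashlib
from itertools import count

def f(x: bytes, out_bits: int = 16) -> bytes:
    """Unkeyed function f: SHA-256 truncated to out_bits bits (a toy parameter)."""
    return hashlib.sha256(x).digest()[: out_bits // 8]

def find_collision(out_bits: int = 16):
    """Birthday adversary F: ~2^(out_bits/2) evaluations in expectation.
    Returns distinct x0 != x1 with f(x0) == f(x1), so Adv^cr_f(F) = 1."""
    seen = {}
    for i in count():
        x = i.to_bytes(8, "big")
        y = f(x, out_bits)
        if y in seen:
            return seen[y], x  # distinct inputs, same truncated digest
        seen[y] = x

x0, x1 = find_collision()
assert x0 != x1 and f(x0) == f(x1)
```

Since \(f\) here is unkeyed and far from injective, such an adversary trivially exists; the point of the notion is its constructive use inside a reduction.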

2.2.3 Function families

A family of functions \(\textsf{F}\) specifies a deterministic algorithm \(\textsf{F}.\textsf{Ev}\), a key set \(\textsf{F}.\textsf{KS}\), an input set \(\textsf{F}.\textsf{IN}\) and an output length \(\textsf{F}.\textsf{ol}\in {{\mathbb {N}}}\). \(\textsf{F}.\textsf{Ev}\) takes a function key \(\textit{fk}\in \textsf{F}.\textsf{KS}\) and an input \(x\in \textsf{F}.\textsf{IN}\) to return an output \(y\in \{0,1\}^{\textsf{F}.\textsf{ol}}\). We write \(y \leftarrow \textsf{F}.\textsf{Ev}(\textit{fk}, x)\). The key length of \(\textsf{F}\) is \(\textsf{F}.\textsf{kl}\in {{\mathbb {N}}}\) if \(\textsf{F}.\textsf{KS} = \{0,1\}^{\textsf{F}.\textsf{kl}}\).

2.2.4 Block ciphers

Let \(\textsf{E}\) be a function family. We say that \(\textsf{E}\) is a block cipher if \(\textsf{E}.\textsf{IN} = \{0,1\}^{\textsf{E}.\textsf{ol}}\), and if \(\textsf{E}\) specifies (in addition to \(\textsf{E}.\textsf{Ev}\)) an inverse algorithm \(\textsf{E}.\textsf{Inv}:\{0,1\}^{\textsf{E}.\textsf{ol}} \rightarrow \textsf{E}.\textsf{IN}\) such that \(\textsf{E}.\textsf{Inv}(\textit{ek}, \textsf{E}.\textsf{Ev}(\textit{ek}, x)) = x\) for all \(\textit{ek}\in \textsf{E}.\textsf{KS}\) and all \(x\in \textsf{E}.\textsf{IN}\). We refer to \(\textsf{E}.\textsf{ol}\) as the block length of \(\textsf{E}\). Our pictures and attacks use \(E_K\) and \(E_{K}^{-1}\) as a shorthand for \(\textsf{E}.\textsf{Ev}(K, \cdot )\) and \(\textsf{E}.\textsf{Inv}(K, \cdot )\), respectively.

2.2.5 One-time PRF security of function family for multiple keys

Consider game \(\textrm{G}^\textsf{otprf}_{\textsf{F}, \mathcal {D}}\) of Fig. 3, defined for a function family \(\textsf{F}\) and an adversary \(\mathcal {D}\). The advantage of \(\mathcal {D}\) in breaking the \(\textrm{OTPRF}\)-security of \(\textsf{F}\) is defined as \(\textsf{Adv}^{\textsf{otprf}}_{\textsf{F}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{otprf}_{\textsf{F}, \mathcal {D}}] - 1\). The game samples a uniformly random challenge bit b and runs adversary \(\mathcal {D}\), providing it with access to oracle \(\textsc {RoR}\). The oracle takes \(x\in \textsf{F}.\textsf{IN}\) as input, and the adversary is allowed to query the oracle arbitrarily many times. Each time \(\textsc {RoR}\) is queried on any x, it samples a uniformly random key \(\textit{fk}\) from \(\textsf{F}.\textsf{KS}\) and returns either \(\textsf{F}.\textsf{Ev}(\textit{fk}, x)\) (if \(b = 1\)) or a uniformly random element from \(\{0,1\}^{\textsf{F}.\textsf{ol}}\) (if \(b = 0\)). \(\mathcal {D}\) wins if it returns a bit \(b'\) that is equal to the challenge bit.

Fig. 3

One-time PRF security of function family \(\textsf{F}\) for multiple keys
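The game of Fig. 3 can be sketched in Python as follows. The instantiation of the family \(\textsf{F}\) with HMAC-SHA256 is our illustrative choice (not something the paper prescribes); the essential feature is that every \(\textsc {RoR}\) query draws a fresh uniform key.

```python
import hmac, hashlib, secrets

class OTPRFGame:
    """One-time PRF game for multiple keys: each RoR query samples a fresh key."""
    def __init__(self, ev, kl: int, ol: int):
        self.ev, self.kl, self.ol = ev, kl, ol
        self.b = secrets.randbelow(2)  # challenge bit
    def ror(self, x: bytes) -> bytes:
        fk = secrets.token_bytes(self.kl)            # fresh uniform key per query
        if self.b == 1:
            return self.ev(fk, x)                    # real: F.Ev(fk, x)
        return secrets.token_bytes(self.ol)          # random: uniform in {0,1}^F.ol
    def finalize(self, b_prime: int) -> bool:
        return b_prime == self.b                     # game returns true iff b' = b

# Example family F (our choice): F.Ev = HMAC-SHA256, kl = ol = 32 bytes.
ev = lambda fk, x: hmac.new(fk, x, hashlib.sha256).digest()
game = OTPRFGame(ev, kl=32, ol=32)
out = game.ror(b"hello")
assert len(out) == 32  # both branches return F.ol bits
```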

2.2.6 Symmetric encryption schemes

A symmetric encryption scheme \(\textsf{SE}\) specifies algorithms \(\mathsf {\textsf{SE}.Enc}\) and \(\mathsf {\textsf{SE}.Dec}\), where \(\mathsf {\textsf{SE}.Dec}\) is deterministic. Associated to \(\textsf{SE}\) is a key length \(\mathsf {\textsf{SE}.kl}\in {{\mathbb {N}}}\), a message space \(\mathsf {\textsf{SE}.MS}\subseteq \{0,1\}^* \setminus \{\varepsilon \}\), and a ciphertext length function \(\mathsf {\textsf{SE}.cl}:{{\mathbb {N}}}\rightarrow {{\mathbb {N}}}\). The encryption algorithm \(\mathsf {\textsf{SE}.Enc}\) takes a key \(k\in \{0,1\}^{\mathsf {\textsf{SE}.kl}}\) and a message \(m\in \mathsf {\textsf{SE}.MS}\) to return a ciphertext \(c\in \{0,1\}^{\mathsf {\textsf{SE}.cl}(\left| m\right| )}\). We write \(c \leftarrow \$~\mathsf {\textsf{SE}.Enc}(k, m)\). The decryption algorithm \(\mathsf {\textsf{SE}.Dec}\) takes kc to return message \(m \in \mathsf {\textsf{SE}.MS}\cup \{\bot \}\), where \(\bot \) denotes incorrect decryption. We write \(m \leftarrow \mathsf {\textsf{SE}.Dec}(k, c)\). Decryption correctness requires that \(\mathsf {\textsf{SE}.Dec}(k, c) = m\) for all \(k\in \{0,1\}^{\mathsf {\textsf{SE}.kl}}\), all \(m\in \mathsf {\textsf{SE}.MS}\), and all \(c\in [\mathsf {\textsf{SE}.Enc}(k, m)]\). We say that \(\textsf{SE}\) is deterministic if \(\mathsf {\textsf{SE}.Enc}\) is deterministic.

Fig. 4

One-time real-or-random indistinguishability of deterministic symmetric encryption scheme \(\textsf{SE}\)

2.2.7 One-time indistinguishability of SE

Consider game \(\textrm{G}^{\mathsf {otind\$}}\) of Fig. 4, defined for a deterministic symmetric encryption scheme \(\textsf{SE}\) and an adversary \(\mathcal {D}\). We define the advantage of \(\mathcal {D}\) in breaking the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\) as \(\textsf{Adv}^{\mathsf {otind\$}}_{\textsf{SE}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^{\mathsf {otind\$}}_{\textsf{SE}, \mathcal {D}}] - 1\). The game proceeds analogously to the \(\textrm{OTPRF}\) game, with encryption under a fresh random key playing the role of \(\textsf{F}.\textsf{Ev}\).

2.2.8 CBC block cipher mode of operation

Let \(\textsf{E}\) be a block cipher. Define the Cipher Block Chaining (CBC) mode of operation as a deterministic symmetric encryption scheme \(\textsf{SE}= \textsf{CBC}[\textsf{E}]\) shown in Fig. 5, where the key length is \(\mathsf {\textsf{SE}.kl}= \textsf{E}.\textsf{kl} + \textsf{E}.\textsf{ol}\), the message space \(\mathsf {\textsf{SE}.MS}= \bigcup _{t\in {{\mathbb {N}}}} \{0,1\}^{\textsf{E}.\textsf{ol}\cdot t}\) consists of messages whose lengths are multiples of the block length, and the ciphertext length function \(\mathsf {\textsf{SE}.cl}\) is the identity function. Note that Fig. 5 gives a somewhat non-standard definition for CBC, as it includes the IV (\(c_0\)) as part of the key material. However, in this work, we are only interested in one-time security of \(\textsf{SE}\), so keys and IVs are generated together, and the IV is not included as part of the ciphertext.
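To make the mode's structure concrete, here is a minimal Python sketch of \(\textsf{CBC}[\textsf{E}]\) with the IV treated as key material, as above. The block cipher `E` is a toy stand-in of our own (XOR with a key-derived pad; a valid permutation per key but offering no security), and the 32-byte cipher key length is our choice.

```python
import hashlib, secrets

BL = 16  # block length in bytes (E.ol / 8)

def E(ek: bytes, x: bytes) -> bytes:
    """Toy block cipher: XOR with a key-derived pad. A permutation per key,
    but NOT secure; a stand-in only to exhibit the mode's structure."""
    pad = hashlib.sha256(ek).digest()[:BL]
    return bytes(a ^ b for a, b in zip(x, pad))

E_inv = E  # XOR with a fixed pad is its own inverse

def cbc_enc(key: bytes, m: bytes) -> bytes:
    """CBC[E]: key = ek || IV; the IV (c_0) is key material, not part of
    the ciphertext. |m| must be a multiple of the block length."""
    ek, c_prev = key[:-BL], key[-BL:]
    out = []
    for i in range(0, len(m), BL):
        c_prev = E(ek, bytes(a ^ b for a, b in zip(m[i:i + BL], c_prev)))
        out.append(c_prev)
    return b"".join(out)

def cbc_dec(key: bytes, c: bytes) -> bytes:
    ek, c_prev = key[:-BL], key[-BL:]
    out = []
    for i in range(0, len(c), BL):
        ci = c[i:i + BL]
        out.append(bytes(a ^ b for a, b in zip(E_inv(ek, ci), c_prev)))
        c_prev = ci
    return b"".join(out)

key = secrets.token_bytes(32 + BL)  # ek (32 bytes, our choice) || IV
m = secrets.token_bytes(4 * BL)
assert cbc_dec(key, cbc_enc(key, m)) == m
assert len(cbc_enc(key, m)) == len(m)  # SE.cl is the identity function
```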

Fig. 5

Constructions of deterministic symmetric encryption schemes \(\textsf{CBC}[\textsf{E}]\) and \(\textsf{IGE}[\textsf{E}]\) from block cipher \(\textsf{E}\). Consider \(t\) as the number of blocks of m (or c), i.e. \(m = m_1 ~\Vert ~\ldots ~\Vert ~m_t\)

2.2.9 IGE block cipher mode of operation

Let \(\textsf{E}\) be a block cipher. Define the Infinite Garble Extension (IGE) mode of operation as \(\textsf{SE}= \textsf{IGE}[\textsf{E}]\) as in Fig. 5, with parameters as in the CBC mode except for key length \(\mathsf {\textsf{SE}.kl}= \textsf{E}.\textsf{kl} + 2 \cdot \textsf{E}.\textsf{ol}\) (since IGE has two IV blocks which we again include as part of the key). We depict IGE decryption in Fig. 6 as we rely on this in Sect. 6.

IGE was first defined in [22], which claims it has infinite error propagation and thus can provide integrity. This claim was disproved in an attack on Free-MAC [36], which has the same specification as IGE. [36] shows that given a plaintext–ciphertext pair it is possible to construct another ciphertext that will correctly decrypt to a plaintext such that only two of its blocks differ from the original plaintext, i.e. the “errors” introduced in the ciphertext do not propagate forever. IGE also appears as a special case of the Accumulated Block Chaining (ABC) mode [38]. A chosen-plaintext attack on ABC that relied on IV reuse between encryptions was described in [11].

Fig. 6

IGE mode decryption, where \(c_0 = IV _c\) and \(m_0 = IV _m\) are the initial values so decryption can be expressed as \(m_{i} = E_{K}^{-1}(c_{i} \oplus m_{i-1}) \oplus c_{i-1}\)
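The IGE equations can likewise be sketched in Python. As with the CBC sketch, `E` is a toy self-inverse stand-in of our own (not a secure cipher), and the key layout `ek || IV_m || IV_c` is our illustrative choice for packing the two IV blocks into the key material.

```python
import hashlib, secrets

BL = 16  # block length in bytes

def E(ek: bytes, x: bytes) -> bytes:
    """Toy XOR block cipher (self-inverse); a stand-in, NOT secure."""
    pad = hashlib.sha256(ek).digest()[:BL]
    return bytes(a ^ b for a, b in zip(x, pad))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def ige_enc(key: bytes, m: bytes) -> bytes:
    """IGE[E] with key = ek || IV_m || IV_c (two IV blocks as key material):
    c_i = E(ek, m_i xor c_{i-1}) xor m_{i-1}, with m_0 = IV_m, c_0 = IV_c."""
    ek, m_prev, c_prev = key[:-2 * BL], key[-2 * BL:-BL], key[-BL:]
    out = []
    for i in range(0, len(m), BL):
        mi = m[i:i + BL]
        ci = xor(E(ek, xor(mi, c_prev)), m_prev)
        out.append(ci)
        m_prev, c_prev = mi, ci
    return b"".join(out)

def ige_dec(key: bytes, c: bytes) -> bytes:
    """m_i = E^{-1}(c_i xor m_{i-1}) xor c_{i-1}, cf. Fig. 6."""
    ek, m_prev, c_prev = key[:-2 * BL], key[-2 * BL:-BL], key[-BL:]
    out = []
    for i in range(0, len(c), BL):
        ci = c[i:i + BL]
        mi = xor(E(ek, xor(ci, m_prev)), c_prev)  # E is self-inverse here
        out.append(mi)
        m_prev, c_prev = mi, ci
    return b"".join(out)

key = secrets.token_bytes(32 + 2 * BL)
m = secrets.token_bytes(4 * BL)
assert ige_dec(key, ige_enc(key, m)) == m
```

Note how, unlike CBC, both the previous plaintext and the previous ciphertext block feed into each step, which is what the "infinite garble extension" name refers to.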

2.2.10 MD transform

Figure 7 defines the Merkle–Damgård transform as a function family \(\textsf{MD}[h ]\) for a given compression function \(h :\{0,1\}^\ell \times \{0,1\}^{\ell '} \rightarrow \{0,1\}^\ell \), with \(\textsf{MD}.\textsf{IN} = \bigcup _{t\in {{\mathbb {N}}}} \{0,1\}^{{\ell '} \cdot t}\), \(\textsf{MD}.\textsf{KS} = \{0,1\}^\ell \) and \(\textsf{MD}.\textsf{ol} = \ell \).

Fig. 7

Left pane: Construction of MD transform \(\textsf{MD}= \textsf{MD}[h ]\) from compression function \(h :\{0,1\}^\ell \times \{0,1\}^{\ell '} \rightarrow \{0,1\}^\ell \). Right pane: \(\textsf{SHA}-\textsf{pad}\) pads \(\textsf{SHA}-\textsf{1}\) or \(\textsf{SHA}-\textsf{256}\) input x to a length that is a multiple of 512 bits
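A Python sketch of the two panes of Fig. 7 follows. `sha_pad` implements the standard padding for byte-aligned inputs (append a 1 bit, zeros, then the 64-bit big-endian bit length); since `hashlib` does not expose the SHA-256 compression function, a toy compression of our own stands in for \(h\) in the chaining loop.

```python
import hashlib, struct

def sha_pad(x: bytes) -> bytes:
    """SHA-pad for SHA-1/SHA-256 (byte-aligned inputs): pad x to a length
    that is a multiple of 512 bits (64 bytes)."""
    bit_len = 8 * len(x)
    pad = b"\x80" + b"\x00" * ((55 - len(x)) % 64) + struct.pack(">Q", bit_len)
    return x + pad

def md(h, iv: bytes, x: bytes) -> bytes:
    """MD[h].Ev(iv, x): iterate compression function h over 64-byte blocks."""
    assert len(x) % 64 == 0
    state = iv
    for i in range(0, len(x), 64):
        state = h(state, x[i:i + 64])
    return state

# Toy compression function (NOT the real h_256, which hashlib does not expose):
h_toy = lambda state, block: hashlib.sha256(state + block).digest()

padded = sha_pad(b"abc")
assert len(padded) % 64 == 0 and padded[3] == 0x80  # first pad byte sets the 1 bit
digest = md(h_toy, b"\x00" * 32, padded)
assert len(digest) == 32
```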

2.2.11 \(\textsf{SHA}-\textsf{1}\) and \(\textsf{SHA}-\textsf{256}\)

Let \(\textsf{SHA}-\textsf{1}: \{0,1\}^* \rightarrow \{0,1\}^{160}\) and \(\textsf{SHA}-\textsf{256}: \{0,1\}^* \rightarrow \{0,1\}^{256}\) be the hash functions as defined in [47]. We refer to their compression functions as \(h _{160}: \{0,1\}^{160} \times \{0,1\}^{512} \rightarrow \{0,1\}^{160}\) and \(h _{256}: \{0,1\}^{256} \times \{0,1\}^{512} \rightarrow \{0,1\}^{256}\), and to their initial states as \(\textsf{IV}_{160}\) and \(\textsf{IV}_{256}\). We can write

$$\begin{aligned} \begin{aligned} \textsf{SHA}-\textsf{1}(x)&= \textsf{MD}[h _{160}].\textsf{Ev}(\textsf{IV}_{160}, \textsf{SHA}-\textsf{pad}(x)), \text { and} \\ \textsf{SHA}-\textsf{256}(x)&= \textsf{MD}[h _{256}].\textsf{Ev}(\textsf{IV}_{256}, \textsf{SHA}-\textsf{pad}(x)) \end{aligned} \end{aligned}$$

where \(\textsf{SHA}-\textsf{pad}\) is defined in Fig. 7.

2.2.12 \(\textsf{SHACAL}-\textsf{1}\) and \(\textsf{SHACAL}-\textsf{2}\)

Let \(\;\hat{+}\;\) be an addition operator over 32-bit words, meaning for any \(x,y\in \bigcup _{t\in {{\mathbb {N}}}}\{0,1\}^{32\cdot t}\) with \(\left| x\right| =\left| y\right| \) the instruction \(z \leftarrow x \;\hat{+}\;y\) splits x and y into 32-bit words and independently adds together words at the same positions, each modulo \(2^{32}\); it then computes z by concatenating together the resulting 32-bit words. Let \(\textsf{SHACAL}-\textsf{1}\) [30] be the block cipher defined by \(\textsf{SHACAL}-\textsf{1}.\textsf{kl} = 512\), \(\textsf{SHACAL}-\textsf{1}.\textsf{ol} = 160\) such that \(h _{160}(k, x) = k \;\hat{+}\;\textsf{SHACAL}-\textsf{1}.\textsf{Ev}(x, k)\). Similarly, let \(\textsf{SHACAL}-\textsf{2}\) be the block cipher defined by \(\textsf{SHACAL}-\textsf{2}.\textsf{kl} = 512\), \(\textsf{SHACAL}-\textsf{2}.\textsf{ol} = 256\) such that \(h _{256}(k, x) = k \;\hat{+}\;\textsf{SHACAL}-\textsf{2}.\textsf{Ev}(x, k)\). See Fig. 8.
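The \(\;\hat{+}\;\) operator can be made concrete in a few lines of Python (big-endian words, matching the paper's conventions; the function name is ours):

```python
def hat_add(x: bytes, y: bytes) -> bytes:
    """Word-wise addition: split x and y into 32-bit big-endian words and add
    corresponding words modulo 2^32, with no carry between words."""
    assert len(x) == len(y) and len(x) % 4 == 0
    out = bytearray()
    for i in range(0, len(x), 4):
        w = (int.from_bytes(x[i:i + 4], "big") +
             int.from_bytes(y[i:i + 4], "big")) % 2**32
        out += w.to_bytes(4, "big")
    return bytes(out)

# 0xFFFFFFFF + 0x00000001 wraps to 0 within its word; the neighbouring
# word is added independently (no carry propagation across words).
a = bytes.fromhex("ffffffff" "00000002")
b = bytes.fromhex("00000001" "00000003")
assert hat_add(a, b) == bytes.fromhex("00000000" "00000005")
```

This feed-forward of the chaining value k via \(\;\hat{+}\;\) is what turns the \(\textsf{SHACAL}\) block ciphers into the SHA compression functions.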

Fig. 8

\(\textsf{SHA}-\textsf{256}\) compression function \(h _{256}{}\) and its underlying block cipher \(\textsf{SHACAL}-\textsf{2}{}\)

3 Bidirectional channels

3.1 Our formal model in the context of prior work

3.1.1 The choice of a cryptographic primitive

We model the symmetric part of Telegram’s MTProto protocol as a bidirectional cryptographic channel. A channel provides a method for two users to exchange messages, and it is called bidirectional [43] when each user can both send and receive messages. A unidirectional channel provides an interface between two users where only a single user can send messages, and only the opposite user can receive them. Two unidirectional channels can be composed to build a bidirectional channel, but some care needs to be taken to establish what level of security is inherited by the resulting channel [43]. A symmetric encryption scheme can be thought of as a special case of a unidirectional channel; once its encryption and decryption algorithms are modelled as being stateful, it can achieve security notions stronger than unforgeability [15, 16].

MTProto uses distinct but related secret keys to send messages in the two directions of the channel, so modelling it as a unidirectional channel would not be sufficient: such an analysis could miss bad interactions between the two directions.

3.1.2 The choice of a security model

Cryptographic security models normally require that channels provide in-order delivery of all messages. In the unidirectional setting, this means that the receiver should only accept messages in the same order as they were dispatched by the sender. In particular, the channel must prevent all attempts to forge, replay, reorder or drop messages.Footnote 5 In the bidirectional setting, in-order delivery is required to hold separately in each direction of communication.Footnote 6
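For intuition, in-order delivery in a single direction is typically enforced by comparing an authenticated sequence number against a local receive counter. The following is a minimal Python sketch of this idea (ours, not MTProto’s mechanism):

```python
def make_inorder_receiver():
    """Accept (seq, m) pairs only in strictly increasing order starting from 0,
    thereby rejecting replays, reorderings and deliveries after a drop."""
    expected = 0
    def recv(seq: int, m: str):
        nonlocal expected
        if seq != expected:
            return None      # replay (seq < expected) or reorder/drop (seq > expected)
        expected += 1
        return m
    return recv
```

In a real channel the sequence number would be bound to the ciphertext by the authentication mechanism, so that an attacker cannot simply rewrite it.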

The current version of MTProto 2.0 does not enforce in-order message delivery. It determines whether a successfully decrypted ciphertext should be accepted based on a complex set of rules; in particular, these rules happen to allow message reordering, as we describe in Sect. 4.2. We consider this a vulnerability, so in Sect. 4.4 we define a slight variant of MTProto 2.0 that enforces in-order delivery. Our security analysis in Sect. 5 is then provided with respect to the fixed version of the protocol. Nevertheless, we set out to choose a formal model for channels that could also potentially be used to analyse the current version of MTProto 2.0. In particular, we chose a model that can express both in-order delivery and the message delivery rules that are used in the current version.Footnote 7

No prior work on bidirectional channels defines correctness and security notions that could be used to capture message delivery rules of varied strengths. In the unidirectional setting, [20, 40] each define a hierarchy of multiple security notions where the weakest notion requires only unforgeability and the strongest requires in-order delivery. [28, 51] define abstract definitional frameworks for unidirectional channels with fully parametrisable security notions. In this work we extend the robust channel framework of [27, 28], lifting it to the bidirectional setting.

3.1.3 Extending the robust channel framework

The robust channel framework [28] defines unidirectional correctness and security notions with respect to an arbitrary support predicate. When a ciphertext is delivered to the receiver, the corresponding notion uses the support predicate to determine whether the channel is expected to accept this ciphertext or to reject it, i.e. whether this ciphertext is currently supported. For example, the notion of correctness in [28] requires that a channel accepts and correctly decrypts all supported ciphertexts, whereas their notion of integrity requires that a channel rejects all ciphertexts that are not supported. The correctness and security games in [28] maintain a sequence of ciphertexts that were sent by the sender, and a sequence of ciphertexts that were received and accepted by the receiver. A support predicate takes both sequences as input and it can use them to decide on whether an incoming ciphertext is supported. For completeness, we provide the core definitions of [28] in Appendix C.2.

We lift the robust channel framework [28] to the bidirectional setting, and we significantly extend it in other ways. Most importantly, our framework uses more information to determine whether an incoming ciphertext is supported. In particular, we define our correctness and security games to maintain a support transcript for each of the two users; this extends the idea of using sequences of sent and received ciphertexts in [28]. A user’s support transcript represents a sequence of events, each entry describing an attempt to send or to receive a message. More precisely, each entry can be thought of as describing one of the following events (stated in terms of some specific plaintext m and/or ciphertext c): “sent c that encrypts m”, “failed to send m”, “received c, accepted it, and decrypted it as m”, “received c and rejected it”. In our framework, the support transcripts are used by a support function; it extends the concept of the support predicate from [28]. Given the support transcripts of both users as input, a support function in our framework is meant to prescribe the exact behaviour of a channel when a new ciphertext is delivered to either user. Namely, a support function either determines that the incoming ciphertext must be rejected, or it determines that the incoming ciphertext must be accepted and a specific plaintext value must be obtained upon decrypting this ciphertext. For example, our notion of correctness is similar in spirit to that of [28], requiring that a channel accepts and correctly decrypts each ciphertext that is not rejected by the support function. The core difference between our correctness notion and that of [28] is in how these definitions determine whether a specific ciphertext was decrypted “correctly”. In our framework, the output of a support function prescribes that a specific plaintext value must be obtained, whereas in [28] the correctness game builds a lookup table to determine that value.

The above example provides an intuition that by defining our support transcripts to contain plaintext messages, we obtain simpler correctness and security definitions when compared to [28]. But one could also see this as a trade-off between different parts of the formalism, because some complexity that is removed from the correctness and security games might simply be relegated to the step of specifying and analysing a support function. In order to better understand how our framework relates to the robust channel framework, in Appendix C we provide a thorough comparison between the unidirectional variants of our definitions and those of [28].

3.1.4 Relation to secure messaging models

A recent line of work uses channels to study the best achievable security of instant messaging between two users. A limited, unidirectional case was first considered by [18]; follow-up work uses bidirectional channels [7, 21, 33, 35]. The focus is on achieving strong forward security and post-compromise security guarantees in the presence of an attacker that can compromise secret states of the users. With the exception of [7], all of this work models channels that are required to provide in-order message delivery. In contrast, the immediate decryption-aware channel of [7] effectively allows message drops but mandates that the dropped messages can later be delivered and retroactively assigned to their correct positions in the communication transcript. Any of these bidirectional models except [7] could be simplified (to not require advanced security properties) and used for a formal analysis of our MTProto-based channel from Sect. 4.4. None of these models would be able to capture the correctness and security properties of MTProto 2.0 as it is currently implemented.

3.2 Syntax of channels

We refer to the two users of a channel as \(\mathcal {I}\) and \(\mathcal {R}\). These will map to the client and the server in the setting of MTProto. We use \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) as a variable to represent an arbitrary user and \(\overline{\textit{u}}\) to represent the other user, meaning \(\overline{\textit{u}}\) denotes the sole element of \(\{\mathcal {I},\mathcal {R}\}\setminus \{\textit{u}\}\). We use \(\textit{st}_\textit{u}\) to represent the internal state of user \(\textit{u}\). A channel uses an initialisation algorithm to abstract away the key agreement; this matches the main focus of our work—to study the symmetric encryption of MTProto.

Fig. 9

Syntax of the constituent algorithms of channel \(\textsf{CH}\)

Definition 1

A channel \(\textsf{CH}\) specifies algorithms \(\mathsf {\textsf{CH}.Init}\), \(\mathsf {\textsf{CH}.Send}\) and \(\mathsf {\textsf{CH}.Recv}\), where \(\mathsf {\textsf{CH}.Recv}\) is deterministic. The syntax used for the algorithms of \(\textsf{CH}\) is given in Fig. 9. Associated to \(\textsf{CH}\) is a plaintext space \(\mathsf {\textsf{CH}.MS}\subseteq \{0,1\}^* \setminus \{\varepsilon \}\) and a randomness space \(\mathsf {\textsf{CH}.SendRS}\) of \(\mathsf {\textsf{CH}.Send}\). The initialisation algorithm \(\mathsf {\textsf{CH}.Init}\) returns \(\mathcal {I}\)’s and \(\mathcal {R}\)’s initial states \(\textit{st}_{\mathcal {I}}\) and \(\textit{st}_{\mathcal {R}}\). The sending algorithm \(\mathsf {\textsf{CH}.Send}\) takes \(\textit{st}_{\textit{u}}\) for some \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\), a plaintext \(m\in \mathsf {\textsf{CH}.MS}\), and auxiliary information \(\textit{aux}\) to return the updated state \(\textit{st}_{\textit{u}}\) and a ciphertext c, where \(c = \bot \) may be used to indicate a failure to send. We may surface random coins \(r\in \mathsf {\textsf{CH}.SendRS}\) as an additional input to \(\mathsf {\textsf{CH}.Send}\). The receiving algorithm \(\mathsf {\textsf{CH}.Recv}\) takes \(\textit{st}_{\textit{u}}\), c and auxiliary information \(\textit{aux}\) to return the updated state \(\textit{st}_{\textit{u}}\) and a plaintext \(m\in \mathsf {\textsf{CH}.MS}\cup \{\bot \}\), where \(\bot \) indicates a failure to recover a plaintext.

Our channel definition reflects some unusual choices that are necessary to model the MTProto protocol. The abstract auxiliary information field \(\textit{aux}\) will be used to associate timestamps to each sent and received message.Footnote 8 In this work, we do not use the \(\textit{aux}\) field to model associated data that would need to be authenticated, but our definitions in principle allow it to be used that way. Also note that the sending algorithm \(\mathsf {\textsf{CH}.Send}\) is randomised, even though a stateful channel in general does not need randomness to achieve basic security notions. We use randomness only to faithfully model MTProto, which uses it to determine the length and contents of message padding. Our correctness and security notions will let an attacker choose arbitrary random coins, so we surface them as an optional input to the sending algorithm \(\mathsf {\textsf{CH}.Send}\).
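The syntax of Definition 1 can be transcribed as a small interface (a Python sketch with names of our choosing; states are opaque values, and `None` models the failure symbol \(\bot \)):

```python
from typing import Any, Optional, Tuple

class Channel:
    """Syntax of a bidirectional channel CH (Definition 1).

    init returns the initial states of users I and R. send and recv take a
    user's state and return the updated state together with a ciphertext
    (None modelling a failure to send) or a plaintext (None modelling a
    failure to recover a plaintext). recv must be deterministic; send may
    additionally take explicit random coins r.
    """

    def init(self) -> Tuple[Any, Any]:
        raise NotImplementedError              # -> (st_I, st_R)

    def send(self, st: Any, m: bytes, aux: bytes, r: bytes = b"") -> Tuple[Any, Optional[bytes]]:
        raise NotImplementedError              # -> (st_u, c), c possibly None

    def recv(self, st: Any, c: bytes, aux: bytes) -> Tuple[Any, Optional[bytes]]:
        raise NotImplementedError              # -> (st_u, m), m possibly None
```

Returning the updated state explicitly, rather than mutating it in place, mirrors how the games of Sect. 3.4 thread the users’ states through oracle calls.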

3.3 Support transcripts and functions

In this section, we extend the definitional framework for robust channels from [28]. In Sect. 3.1, we outlined the core differences between the two frameworks, and in Appendix C we provide a detailed comparison between them.

3.3.1 Support transcripts

We define a support transcript to represent the communication record of a single user. Its entries are ordered chronologically, and each describes an attempt to send or to receive a plaintext. A support transcript \(\textsf{tr}_{\textit{u}}\) of user \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) contains two types of entries: \((\textsf{sent}, m, \textsf{label}, \textit{aux})\) and \((\textsf{recv}, m, \textsf{label}, \textit{aux})\) for an event of sending or receiving some plaintext m, respectively. In either case, \(\textsf{label}\) is a support label whose purpose is to distinguish between different network messages each encrypting or encoding a specific plaintext m, and \(\textit{aux}\) is auxiliary information such as the timestamp at the moment of sending or receiving the network message. Depending on the level of abstraction, our model uses ciphertexts or message encodings as support labels.Footnote 9

Definition 2

A support transcript \(\textsf{tr}_{\textit{u}}\) for user \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) is a list of entries of the form \((\textsf{op}, m, \textsf{label}, \textit{aux})\), where \(\textsf{op}\in \{\textsf{sent}, \textsf{recv}\}\). An entry with \(\textsf{op}= \textsf{sent}\) indicates that user \(\textit{u}\) attempted to send a network message that encrypts or encodes plaintext m with auxiliary information \(\textit{aux}\). An entry with \(\textsf{op}= \textsf{recv}\) indicates that user \(\textit{u}\) received a network message with auxiliary information \(\textit{aux}\) and used it to recover plaintext m. In either case, the network message is identified by its support label \(\textsf{label}\).

A support transcript is not intended to surface the implementation details of the primitive that is used for communication. This is reflected in our abstract treatment of the support labels: an outside observer with no knowledge of the internal states of the two communicating users might not be able to interpret the (possibly encrypted) network messages that are being exchanged. So our framework treats each network message as a mere label that can be observed to be sent by a user in response to some plaintext input. One might subsequently observe the same label being taken as input by the opposite user, resulting in some plaintext output. If the scheme used for the two-user communication guarantees that all such labels are unique, an observer might be able to use the equality of exchanged labels across both support transcripts to determine whether a message replay, reordering or drop occurred. The MTProto-based scheme that we study in this paper produces distinct ciphertexts, and our framework uses ciphertexts as support labels when analysing a channel; this will allow us to rely on equality patterns that arise between them. In this work, we use no information about support labels beyond their equality patterns.

Support transcripts can include entries of the form \((\textsf{recv}, m, \textsf{label}, \textit{aux})\) with the plaintext \(m = \bot \) to indicate that the received network message was rejected. Support transcripts can also include entries of the form \((\textsf{sent}, m, \textsf{label}, \textit{aux})\) with the support label value \(\textsf{label}= \bot \), e.g. to indicate that a network message encrypting the plaintext m could not be sent over a terminated channel. Our support transcripts are therefore suitable for two-user communication primitives that implement a wide range of possible behaviours in the event of an error, from terminating after the first failure to full recovery.

Fig. 10

Construction of sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) from symmetric encryption scheme \(\textsf{SE}\). This channel ignores the auxiliary information \(\textit{aux}\)

Fig. 11

Communication between users \(\mathcal {I}\) and \(\mathcal {R}\) over the sample channel \(\textsf{CH}\) defined in Fig. 10. The resulting communication records of \(\mathcal {I}\) and \(\mathcal {R}\) are represented by support transcripts \(\textsf{tr}_{\mathcal {I}}\) and \(\textsf{tr}_{\mathcal {R}}\), respectively. The transcripts contradict each other due to adversarial behaviour on the network

We now provide the construction of a sample channel, along with an example of how communication over this channel can be captured using support transcripts. We will use this channel and its support transcripts to showcase more examples throughout this section. Let \(\textsf{SE}\) be an arbitrary symmetric encryption scheme that provides integrity and confidentiality (i.e. it provides authenticated encryption). Consider a sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) as defined in Fig. 10. In addition to the security assurances inherited from \(\textsf{SE}\), the channel \(\textsf{CH}\) is only designed to prevent forgeries that could occur by mirroring a ciphertext back to its sender. Figure 11 provides a step-by-step example of communication between users \(\mathcal {I}\) and \(\mathcal {R}\) over \(\textsf{CH}\). It shows \(\mathcal {I}\)’s and \(\mathcal {R}\)’s support transcripts at the end of the communication between them, where the channel’s ciphertexts are used as labels. Note that the ciphertext \(c_{\mathcal {I}, 2}\) was dropped and the ciphertext \(c_{\mathcal {I}, 0}\) was replayed in its place. As a result, each user’s transcript shows that the other user endorsed crimes.
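Figure 10’s pseudocode is not reproduced here, but the following Python sketch captures one natural reading of \(\textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\): both users share an \(\textsf{SE}\) key, \(\mathsf {\textsf{CH}.Send}\) prepends the sender’s identity to the plaintext, and \(\mathsf {\textsf{CH}.Recv}\) rejects any ciphertext whose decrypted identity tag matches the receiver’s own identity, which blocks ciphertexts mirrored back to their sender. The toy encrypt-then-MAC scheme standing in for \(\textsf{SE}\), and all names, are ours and for illustration only.

```python
import hmac
import os
from typing import Optional

def se_enc(key: bytes, m: bytes) -> bytes:
    """Toy authenticated encryption: HMAC-SHA256 keystream, then a MAC over
    the ciphertext. Stands in for an arbitrary AE-secure scheme SE."""
    nonce = os.urandom(16)
    stream = b"".join(hmac.digest(key, nonce + i.to_bytes(4, "big"), "sha256")
                      for i in range((len(m) + 31) // 32))
    ct = bytes(a ^ b for a, b in zip(m, stream))
    tag = hmac.digest(key, b"mac" + nonce + ct, "sha256")
    return nonce + ct + tag

def se_dec(key: bytes, c: bytes) -> Optional[bytes]:
    if len(c) < 48:
        return None
    nonce, ct, tag = c[:16], c[16:-32], c[-32:]
    if not hmac.compare_digest(tag, hmac.digest(key, b"mac" + nonce + ct, "sha256")):
        return None                              # reject forgeries
    stream = b"".join(hmac.digest(key, nonce + i.to_bytes(4, "big"), "sha256")
                      for i in range((len(ct) + 31) // 32))
    return bytes(a ^ b for a, b in zip(ct, stream))

class SampleChannel:
    """Sketch of SAMPLE-CH[SE]: one shared SE key for both directions; Send
    prepends the sender's identity so Recv can reject reflected ciphertexts.
    The aux field is ignored, as in Fig. 10."""

    def init(self):
        key = os.urandom(32)
        return ("I", key), ("R", key)            # st_I, st_R

    def send(self, st, m: bytes, aux: bytes = b""):
        user, key = st
        return st, se_enc(key, user.encode() + m)

    def recv(self, st, c: bytes, aux: bytes = b""):
        user, key = st
        pt = se_dec(key, c)
        if pt is None or pt[:1] == user.encode():
            return st, None                      # forgery or mirrored ciphertext
        return st, pt[1:]
```

As the text notes, this channel inherits its integrity and confidentiality from \(\textsf{SE}\) and otherwise prevents only reflection; in particular, nothing here stops the replay of \(c_{\mathcal {I}, 0}\) shown in Fig. 11.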

3.3.2 Support functions

We now define the notion of a support function. We use a support function to prescribe the exact input–output behaviour of a receiver at any point in a two-user communication process (i.e. we use it to specify the expected behaviour of a channel’s decryption algorithm or that of a message encoding scheme’s decoding algorithm, the latter primitive defined in Sect. 3.5). More specifically, a support function \(\textsf{supp}\) determines whether a user \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) should accept an incoming network message—that is associated with a support label \(\textsf{label}\)—from the opposite user \(\overline{\textit{u}}\), based on the support transcripts \(\textsf{tr}_{\textit{u}}, \textsf{tr}_{\overline{\textit{u}}}\) of both users. If the network message should be accepted, then \(\textsf{supp}\) must return a plaintext \(m^*\) to indicate that \(\textit{u}\) is expected to recover \(m^*\) as a result of accepting it; otherwise, \(\textsf{supp}\) must return \(\bot \) to indicate that the network message must be rejected. We also let \(\textsf{supp}\) take the auxiliary information \(\textit{aux}\) as input so that timestamps can be captured in our definitions.

Definition 3

A support function \(\textsf{supp}\) is an efficiently computable deterministic function, written \(\textsf{supp}(\textit{u}, \textsf{tr}_{\textit{u}}, \textsf{tr}_{\overline{\textit{u}}}, \textsf{label}, \textit{aux}) \rightarrow m^*\), where \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\), \(\textsf{tr}_{\textit{u}}\), \(\textsf{tr}_{\overline{\textit{u}}}\) are support transcripts for users \(\textit{u}\) and \(\overline{\textit{u}}\), respectively, \(\textsf{label}\) is any label that identifies the network message, and \(\textit{aux}\) is auxiliary information associated with the network message; \(m^*\) is then the plaintext message that should be recovered by \(\textit{u}\).

In Sect. 3.4, we define the notions of channel correctness, integrity, and indistinguishability. Our correctness and integrity notions jointly require that the channel’s receiving algorithm works exactly as prescribed by a specific support function. More precisely, both notions require that the channel’s receiving algorithm consistently returns the same output as that returned by the support function, but each notion is defined with respect to an adversary that has different capabilities. In the correctness game, the adversary gets the channel’s state as input and is only allowed to query the receiving algorithm on supported ciphertexts (i.e. those that are not rejected by the support function). In the integrity game, the adversary does not get any secrets as input and is allowed to query the receiving algorithm on all possible inputs, including attempted ciphertext forgeries or any gibberish inputs that aim to corrupt the channel’s state. This definitional approach is similar in spirit to how correctness and integrity are defined for basic cryptographic primitives. For example, for a symmetric encryption scheme one often considers the notions of decryption correctness and ciphertext integrity, where the former should hold even when the adversary knows the secret key, whereas the latter requires the adversary to produce ciphertext forgeries without knowing the key. In comparison, a channel is a stateful primitive so its correctness and integrity conditions can be significantly more complex, depending on how it should treat forgeries, replays, reordering, and drops. A support function allows us to capture these conditions in a modular way. Finally, the notion of indistinguishability that we define for a channel requires that the output of the channel’s sending algorithm leaks no information about the encrypted plaintext; this security notion makes no use of a support function.

Fig. 12

Sample support functions \(\textsf {SAMPLE\text {-}SUPP}_0\) and \(\textsf {SAMPLE\text {-}SUPP}_1\). Support function \(\textsf {SAMPLE\text {-}SUPP}_1\) includes the boxed code, and support function \(\textsf {SAMPLE\text {-}SUPP}_0\) does not include it. Both support functions allow arbitrary auxiliary information, never checking the values of \(\textit{aux}\) and \(\textit{aux}'\)

Consider a sample support function \(\textsf {SAMPLE\text {-}SUPP}_0\) in Fig. 12. It does not contain the boxed code. The support function prohibits forgeries by returning \(\bot \) if the opposite user’s support transcript \(\textsf{tr}_{\overline{\textit{u}}}\) does not contain an entry indicating that \(\overline{\textit{u}}\) previously sent a network message associated with the support label \(\textsf{label}\). If a forgery is not detected, then the support function finds and returns a plaintext m such that \((\textsf{sent}, m, \textsf{label}, \textit{aux}')\) belongs to \(\textsf{tr}_{\overline{\textit{u}}}\) with any \(\textit{aux}'\). For any symmetric encryption scheme \(\textsf{SE}\) that provides authenticated encryption, recall algorithms \(\mathsf {\textsf{CH}.Init}\) and \(\mathsf {\textsf{CH}.Send}\) of the sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) defined in Fig. 10; let us treat ciphertexts produced by \(\mathsf {\textsf{CH}.Send}\) as support labels. Then, the algorithm \(\mathsf {\textsf{CH}.Recv}\) from Fig. 10 implements the functionality that is prescribed by \(\textsf {SAMPLE\text {-}SUPP}_0\): it rejects forgeries and otherwise recovers and returns the originally encrypted plaintext. Note that \(\textsf {SAMPLE\text {-}SUPP}_0\) takes the first plaintext m that it finds associated to \(\textsf{label}\) in \(\textsf{tr}_{\overline{\textit{u}}}\), without checking whether any other plaintext values are also associated to \(\textsf{label}\). This does not produce ambiguity when used with algorithms \(\mathsf {\textsf{CH}.Init}\) and \(\mathsf {\textsf{CH}.Send}\); implicit in our example is that \(\textsf{SE}\) provides decryption correctness, and therefore, two distinct plaintexts cannot be encrypted into the same ciphertext (and hence be mapped to the same support label). This illustrates that a support function may appear ambiguous in isolation, but when considered alongside a channel whose properties rule out such ambiguity, its behaviour may be well-defined.

Consider another sample support function \(\textsf {SAMPLE\text {-}SUPP}_1\) as defined in Fig. 12. In addition to the code from \(\textsf {SAMPLE\text {-}SUPP}_0\), this support function also contains the boxed code. The added code is designed to prevent replays by rejecting any network message associated with a support label \(\textsf{label}\) that is already present in one of the entries of the receiver’s support transcript \(\textsf{tr}_{\textit{u}}\). For example, consider the following intermediate support transcripts of users \(\mathcal {I}\) and \(\mathcal {R}\) that could have arisen at some point during the communication displayed in Fig. 11:

$$\begin{aligned} \begin{aligned} \textsf{tr}_{\mathcal {I}, 3} = \big [&(\textsf{sent}, \text {``I say yes to''}, c_{\mathcal {I}, 0}, \varepsilon ), (\textsf{sent}, \text {``all the pizza''}, c_{\mathcal {I}, 1}, \varepsilon ), \\ &(\textsf{sent}, \text {``I say no to''}, c_{\mathcal {I}, 2}, \varepsilon ) \big ] \\ \textsf{tr}_{\mathcal {R}, 2} = \big [&(\textsf{recv}, \text {``I say yes to''}, c_{\mathcal {I}, 0}, \varepsilon ), (\textsf{recv}, \text {``all the pizza''}, c_{\mathcal {I}, 1}, \varepsilon ) \big ] \end{aligned} \end{aligned}$$

These support transcripts represent the moment when \(\mathcal {I}\) has already sent 3 network messages, but so far \(\mathcal {R}\) has only received 2 of them. Following Fig. 11, let us assume that a replay attack happens next and \(\mathcal {R}\) receives a network message containing the ciphertext \(c_{\mathcal {I}, 0}\) with auxiliary information \(\textit{aux}= \varepsilon \). According to \(\textsf {SAMPLE\text {-}SUPP}_0\), this network message should be accepted (and should decrypt to \(m^*= \text {``I say yes to''}\)), but according to \(\textsf {SAMPLE\text {-}SUPP}_1\) this network message should be rejected:

$$\begin{aligned} \begin{aligned} \textsf {SAMPLE\text {-}SUPP}_0(\mathcal {R}, \textsf{tr}_{\mathcal {R}, 2}, \textsf{tr}_{\mathcal {I}, 3}, c_{\mathcal {I}, 0}, \varepsilon )&= \text {``I say yes to''} \\ \textsf {SAMPLE\text {-}SUPP}_1(\mathcal {R}, \textsf{tr}_{\mathcal {R}, 2}, \textsf{tr}_{\mathcal {I}, 3}, c_{\mathcal {I}, 0}, \varepsilon )&= \bot \end{aligned} \end{aligned}$$

Note that the algorithm \(\mathsf {\textsf{CH}.Recv}\) from Fig. 10 can be changed to simply reject duplicate ciphertexts in order to accommodate the specification of \(\textsf {SAMPLE\text {-}SUPP}_1\), without having to change algorithms \(\mathsf {\textsf{CH}.Init}\) and \(\mathsf {\textsf{CH}.Send}\). That would result in a contrived channel where the same plaintext can be encrypted and sent multiple times, but only the first of the resulting ciphertexts is allowed to be received. A more appropriate change would also concatenate a distinct counter to each plaintext processed by \(\mathsf {\textsf{CH}.Send}\), so that the same plaintext can be sent and received many times while still preventing replay attacks by a third party.
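The replay example above can be executed directly (a Python sketch; both support functions are transcriptions of the logic described for Fig. 12, with transcript entries represented as tuples \((\textsf{op}, m, \textsf{label}, \textit{aux})\) and \(\bot \) rendered as `None`):

```python
def sample_supp0(u, tr_u, tr_other, label, aux):
    """Reject forgeries: accept a label only if the opposite user's transcript
    records sending it, and then prescribe the first plaintext sent under it."""
    for (op, m, lbl, _aux) in tr_other:
        if op == "sent" and lbl == label:
            return m                 # aux values are never checked
    return None                      # bottom: the network message must be rejected

def sample_supp1(u, tr_u, tr_other, label, aux):
    """The boxed code: additionally reject replays, i.e. any label already
    present in the receiver's own transcript."""
    for (op, _m, lbl, _aux) in tr_u:
        if op == "recv" and lbl == label:
            return None
    return sample_supp0(u, tr_u, tr_other, label, aux)

# Intermediate transcripts from the text: I has sent three messages, R has
# received two of them, and the ciphertext c_I0 is now replayed to R.
tr_I3 = [("sent", "I say yes to", "c_I0", ""),
         ("sent", "all the pizza", "c_I1", ""),
         ("sent", "I say no to", "c_I2", "")]
tr_R2 = [("recv", "I say yes to", "c_I0", ""),
         ("recv", "all the pizza", "c_I1", "")]

assert sample_supp0("R", tr_R2, tr_I3, "c_I0", "") == "I say yes to"
assert sample_supp1("R", tr_R2, tr_I3, "c_I0", "") is None
```

The two assertions reproduce the displayed outputs: \(\textsf {SAMPLE\text {-}SUPP}_0\) prescribes accepting the replayed ciphertext, while \(\textsf {SAMPLE\text {-}SUPP}_1\) prescribes rejecting it.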

We now provide some observations about the power of support functions. This is irrelevant for the purpose of analysing MTProto, but is useful to highlight the strengths and limitations of our framework in general:

  • A support function does not take as input any information about the internal state of the primitive that is used for communication (i.e. that of a channel or a message encoding scheme). But a communication primitive might use its internal state to interpret incoming network messages in a non-trivial way. For example, in some channels the same ciphertext (in our framework associated with the same support label) could be repeatedly decrypted to a different plaintext depending on some shared secret that is being synchronously evolved by both users. A support function might not be able to capture a receiver’s behaviour in cases like this. Support functions are best suited for communication where the knowledge that “user \(\textit{u}\) created a network message \(\xi \) to send a plaintext m” uniquely determines that the opposite user \(\overline{\textit{u}}\) can only recover m from \(\xi \) (or otherwise produce the error symbol \(\bot \)).

  • Due to having access to user support transcripts, a support function can prescribe a receiver’s behaviour that is not achievable by any implementation. For example, if two channel ciphertexts \(c_0\), \(c_1\) were sent by the user \(\textit{u}\) prior to any of them being received by the user \(\overline{\textit{u}}\), then a support function can require \(\overline{\textit{u}}\) to recover both underlying plaintexts from the first ciphertext it receives. This is impossible if each ciphertext encrypted an independently sampled and uniformly random value.

  • A support function prescribes a receiver’s behaviour with respect to a pair of existing support transcripts. But our framework does not have a similar way to state complex requirements regarding a sender’s behaviour. For example, our framework can require a channel user’s receiving algorithm to perpetually return \(\bot \) once the channel is considered closed (e.g. due to repeated errors while processing incoming ciphertexts), but it cannot require the same user’s sending algorithm to subsequently return \(\bot \) in response to all attempts to send new plaintexts.

In Sect. 5.3, we define the support function \(\textsf{supp}\text {-}\textsf{ord}\) with respect to which we will analyse the security of MTProto 2.0. In Appendix A, we formalise two correctness-style properties of a support function, but we do not mandate that they must always be met. Both properties were also considered in [28]. The integrity of a support function requires that it always returns \(\bot \) if the queried support label \(\textsf{label}\) does not appear in the opposite user’s support transcript \(\textsf{tr}_{\overline{\textit{u}}}\). The order correctness of a support function requires that it enforces in-order delivery for each direction between the two users separately, assuming that each network message is associated with a distinct support label.

3.4 Correctness and security of channels

In Sect. 3.3, we provided a high-level intuition for how we define channel correctness and security notions; here we formalise them. In all of the notions, we allow the adversary to control the randomness used by the channel’s sending algorithm \(\mathsf {\textsf{CH}.Send}\). Channels are stateful, so they can achieve strong notions of security even when the adversary controls the randomness used for encryption.

Fig. 13

Correctness of channel \(\textsf{CH}\); integrity of channel \(\textsf{CH}\). Both notions are defined with respect to support function \(\textsf{supp}\)

3.4.1 Correctness

Consider the correctness game \(\textrm{G}^{\textsf{corr}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}}\) in Fig. 13, defined for a channel \(\textsf{CH}\), a support function \(\textsf{supp}\) and an adversary \(\mathcal {F}\). The advantage of \(\mathcal {F}\) in breaking the correctness of \(\textsf{CH}\) with respect to \(\textsf{supp}\) is defined as \(\textsf{Adv}^{\textsf{corr}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{corr}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}}]\). The game starts by calling the algorithm \(\mathsf {\textsf{CH}.Init}\) to initialise users \(\mathcal {I}\) and \(\mathcal {R}\), and the adversary is given their initial states. The adversary \(\mathcal {F}\) gets access to a sending oracle \(\textsc {Send}\) and to a receiving oracle \(\textsc {Recv}\). Calling \(\textsc {Send}(\textit{u}, m, \textit{aux}, r)\) encrypts the plaintext m with auxiliary data \(\textit{aux}\) and randomness \(r\) from the user \(\textit{u}\) to the other user \(\overline{\textit{u}}\); the resulting tuple \((\textsf{sent},m,c,\textit{aux})\) is added to the sender’s transcript \(\textsf{tr}_{\textit{u}}\). Oracle \(\textsc {Recv}\) can only be called on ciphertexts that should not produce a decryption error according to the behaviour prescribed by the support function \(\textsf{supp}\) (when queried on the current support transcripts), meaning \(\textsc {Recv}\) immediately exits with \(\bot \) when \(\textsf{supp}\) returns \(m^*= \bot \). Calling \(\textsc {Recv}(\textit{u}, c, \textit{aux})\) thus recovers the plaintext \(m^*\) from the support function, decrypts the queried ciphertext c into plaintext m and adds \((\textsf{recv},m,c,\textit{aux})\) to the receiver’s transcript \(\textsf{tr}_{\textit{u}}\); the game verifies that the decrypted plaintext m is equal to \(m^*\). If the adversary can cause the channel to output a different m, then the adversary wins. 
This game captures the minimal requirement one would expect from a communication channel: that it succeeds in decrypting incoming ciphertexts in accordance with its specification, with only limited interference possible from an adversary. In particular, the adversary is not allowed to test whether the channel appropriately identifies and handles errors.

Note that the \(\textsc {Recv}\) oracle always returns \(\bot \), but \(\mathcal {F}\) can use the support function to compute the value m on its own as long as the condition \(m = m^*\) has not yet been false.Footnote 10 Based on the same condition, \(\mathcal {F}\) can also use the support function to distinguish whether \(\bot \) was returned because \(m^*= \bot \) or because the end of the code of \(\textsc {Recv}\) was reached (i.e. its last instruction “Return \(\bot \)” was evaluated).
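The structure of the correctness game can be mirrored in code. The following compact Python sketch is our own rendering (names and calling conventions are ours, not the paper's formal notation): CH is any channel object with Init/Send/Recv, supp maps the support transcripts and a queried (u, c, aux) to the expected plaintext \(m^*\), and None models \(\bot \).

```python
# Sketch of the correctness game G^corr (hypothetical interface).

def corr_game(CH, supp, adversary):
    st = dict(zip("IR", CH.Init()))   # per-user channel states
    tr = {"I": [], "R": []}           # per-user support transcripts
    win = False

    def Send(u, m, aux, r):
        st[u], c = CH.Send(st[u], m, aux, r)
        tr[u].append(("sent", m, c, aux))
        return c

    def Recv(u, c, aux):
        nonlocal win
        m_star = supp(tr, u, c, aux)
        if m_star is None:            # supp says ⊥: oracle exits immediately
            return None
        st[u], m = CH.Recv(st[u], c, aux)
        tr[u].append(("recv", m, c, aux))
        if m != m_star:               # decryption deviates from the spec
            win = True
        return None                   # Recv always returns ⊥

    adversary(st["I"], st["R"], Send, Recv)   # F gets the initial states
    return win
```

A channel that decrypts exactly as its support function prescribes never sets the win flag, matching \(\textsf{Adv}^{\textsf{corr}} = 0\) for the sample channel discussed next.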

Consider the sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) from Fig. 10 for any symmetric encryption scheme \(\textsf{SE}\) that has decryption correctness. Then, \(\textsf{CH}\) provides correctness with respect to either sample support function \(\textsf{supp}\in \{\textsf {SAMPLE\text {-}SUPP}_0, \textsf {SAMPLE\text {-}SUPP}_1\}\) from Fig. 12. In particular, for all adversaries \(\mathcal {F}\) we have \(\textsf{Adv}^{\textsf{corr}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}) = 0\).

3.4.2 Integrity

Consider the integrity game \(\textrm{G}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}}\) in Fig. 13, defined for a channel \(\textsf{CH}\), a support function \(\textsf{supp}\) and an adversary \(\mathcal {F}\). The advantage of \(\mathcal {F}\) in breaking the \(\textrm{INT}\)-security of \(\textsf{CH}\) with respect to \(\textsf{supp}\) is defined as \(\textsf{Adv}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}}]\). We define the integrity game in a very similar way to the correctness game above, but with two important distinctions. First, in the integrity game the adversary \(\mathcal {F}\) no longer gets the initial states of users \(\mathcal {I}\) and \(\mathcal {R}\) as input. Second, the receiving oracle \(\textsc {Recv}\) now allows all inputs from the adversary \(\mathcal {F}\), including those that are meant to be rejected according to the support function \(\textsf{supp}\). These changes reflect the intuition that the adversary \(\mathcal {F}\) is now also allowed to win by producing an input such that the channel’s receiving algorithm returns \(m \ne \bot \) while the support function returned \(m^*= \bot \), which is essentially a forgery. The adversary does not get the channel’s initial states as input because that could trivialise its goal of producing a forgery.

According to the examples discussed in Sect. 3.3, the sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) from Fig. 10 provides integrity with respect to the sample support function \(\textsf {SAMPLE\text {-}SUPP}_0\) from Fig. 12 if \(\textsf{SE}\) provides authenticated encryption. Here it is in fact sufficient for \(\textsf{SE}\) to only provide ciphertext integrity, without any assurances about the confidentiality of encrypted data. In contrast, no properties of \(\textsf{SE}\) would be sufficient for \(\textsf{CH}\) to provide integrity with respect to the sample support function \(\textsf {SAMPLE\text {-}SUPP}_1\) from Fig. 12; the construction of \(\textsf{SAMPLE}\text {-}\textsf{CH}\) itself would need to be changed to prevent replay attacks like the one displayed in Fig. 11.

Prior work on symmetric encryption formalises the intuition that a decryption oracle is useless to an adversary if all of its decryption queries can be simulated based on the live transcript of its encryption queries. This is captured as PA1 in [8] (where “PA” stands for plaintext awareness) and as decryption simulatability in [24]. An important distinction is that our definition of integrity requires \(\mathsf {\textsf{CH}.Recv}\) to behave exactly as prescribed by a specific support function, whereas the goal of [8, 24] is to draw implications from the existence of any algorithm that can simulate \(\mathsf {\textsf{CH}.Recv}\).

Fig. 14

Indistinguishability of channel \(\textsf{CH}\)

3.4.3 Confidentiality

Consider the indistinguishability game \(\textrm{G}^{\textsf{ind}}_{\textsf{CH}, \mathcal {D}}\) in Fig. 14, defined for a channel \(\textsf{CH}\) and an adversary \(\mathcal {D}\). The advantage of \(\mathcal {D}\) in breaking the \(\textrm{IND}\)-security of \(\textsf{CH}\) is defined as \(\textsf{Adv}^{\textsf{ind}}_{\textsf{CH}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^{\textsf{ind}}_{\textsf{CH}, \mathcal {D}}] - 1\). The game samples a challenge bit b, and the adversary is required to guess it in order to win. The adversary \(\mathcal {D}\) is provided with access to a challenge oracle \(\textsc {Ch}\) and a receiving oracle \(\textsc {Recv}\). The adversary can query the challenge oracle \(\textsc {Ch}\) on inputs \(\textit{u}, m_0, m_1, \textit{aux}, r\) to obtain a ciphertext encrypting plaintext \(m_b\) with random coins \(r\) from user \(\textit{u}\) to user \(\overline{\textit{u}}\), with auxiliary information \(\textit{aux}\). Here the two plaintexts \(m_0\), \(m_1\) are required to have the same length. The adversary can query the receiving oracle \(\textsc {Recv}\) on inputs \(\textit{u}, c, \textit{aux}\) to make the user \(\textit{u}\) decrypt the incoming ciphertext c from the user \(\overline{\textit{u}}\) with auxiliary information \(\textit{aux}\). The goal of this query is to update the receiving user’s state \(\textit{st}_{\textit{u}}\); this is important because the updated state is then used to compute future outputs of queries to the challenge oracle \(\textsc {Ch}\) when user \(\textit{u}\) is the sender. The receiving oracle always discards the decrypted plaintext m and returns \(\bot \). Note that if the channel \(\textsf{CH}\) has integrity with respect to any support function \(\textsf{supp}\), then the indistinguishability adversary \(\mathcal {D}\) can itself use \(\textsf{supp}\) to compute all outputs of the receiving oracle \(\textsc {Recv}\) for either choice of the challenge bit b (i.e. at every step, \(\mathcal {D}\) knows that \(\textsf{supp}\) returns one of two possible plaintexts).
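The oracle interface of the indistinguishability game admits a similarly minimal sketch (names are ours): the challenge oracle rejects unequal-length pairs and encrypts \(m_b\), while the receiving oracle only updates state and discards the plaintext.

```python
# Sketch of the IND game (hypothetical interface).
import secrets

def ind_game(CH, adversary):
    b = secrets.randbits(1)                  # challenge bit
    st = dict(zip("IR", CH.Init()))

    def Ch(u, m0, m1, aux, r):
        if len(m0) != len(m1):               # equal-length restriction
            return None
        st[u], c = CH.Send(st[u], (m0, m1)[b], aux, r)
        return c

    def Recv(u, c, aux):
        st[u], _ = CH.Recv(st[u], c, aux)    # state update only
        return None                          # plaintext is discarded

    return adversary(Ch, Recv) == b          # adversary wins iff it guesses b
```

Running this sketch with a toy "identity encryption" channel shows why IND-CPA security of the underlying scheme is needed: an adversary that can read \(m_b\) off the ciphertext guesses b with probability 1.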

Consider the sample channel \(\textsf{CH}= \textsf{SAMPLE}\text {-}\textsf{CH}[\textsf{SE}]\) from Fig. 10 for any symmetric encryption scheme \(\textsf{SE}\) that is IND-CPA secure. Then \(\textsf{CH}\) provides indistinguishability.

3.4.4 Authenticated encryption

In Appendix B, we define the authenticated encryption security of a channel, which simultaneously captures the integrity and indistinguishability notions from above. We define the joint notion in the all-in-one style of [50, 53]. We prove that our two separate security notions together are equivalent to the authenticated encryption security. This serves as a sanity check for our definitional choices.

3.5 Message encoding schemes

We advocate for a modular approach when building cryptographic channels. At its core, a channel can be expected to have a mechanism that handles the process of encoding plaintexts into payloads and decoding payloads back into plaintexts. Such a mechanism might need to maintain counters that store the number of previously encoded and decoded messages. It might add padding to plaintexts, while possibly encoding their original lengths. It might also embed other metadata into the produced payloads. We formalise it as a separate primitive called a message encoding scheme. Then, a cryptographic channel can be built by composing a message encoding scheme with appropriate cryptographic primitives that would provide integrity and confidentiality for the encoded plaintexts.

We now formally define a message encoding scheme. The modular approach suggested above leads us to define syntax for message encoding that is similar to that of a cryptographic channel. In particular, a message encoding scheme needs to have stateful encoding and decoding algorithms. Auxiliary information can be used to relay and verify metadata such as timestamps. Note that our definition uses randomness in the encoding algorithm because it is necessary when modelling Telegram (i.e. because in MTProto 2.0 the length of padding used for payloads is randomised).

Fig. 15

Syntax of message encoding scheme \(\textsf{ME}\)

Definition 4

A message encoding scheme \(\textsf{ME}\) specifies algorithms \(\mathsf {\textsf{ME}.Init}\), \(\mathsf {\textsf{ME}.Encode}\) and \(\mathsf {\textsf{ME}.Decode}\), where \(\mathsf {\textsf{ME}.Decode}\) is deterministic. Associated to \(\textsf{ME}\) is a message space \(\mathsf {\textsf{ME}.MS}\subseteq \{0,1\}^*\setminus \{\varepsilon \}\), a payload space \(\mathsf {\textsf{ME}.Out}\), a randomness space \(\mathsf {\textsf{ME}.EncRS}\) of \(\mathsf {\textsf{ME}.Encode}\), and a payload length function \(\mathsf {\textsf{ME}.pl}:{{\mathbb {N}}}\times \mathsf {\textsf{ME}.EncRS}\rightarrow {{\mathbb {N}}}\). The initialisation algorithm \(\mathsf {\textsf{ME}.Init}\) returns \(\mathcal {I}\)’s and \(\mathcal {R}\)’s initial states \(\textit{st}_{\mathcal {I}}\) and \(\textit{st}_{\mathcal {R}}\). The encoding algorithm \(\mathsf {\textsf{ME}.Encode}\) takes \(\textit{st}_{\textit{u}}\) for \(u\in \{\mathcal {I},\mathcal {R}\}\), a message \(m\in \mathsf {\textsf{ME}.MS}\), and auxiliary information \(\textit{aux}\) to return the updated state \(\textit{st}_{\textit{u}}\) and a payload \(p\in \mathsf {\textsf{ME}.Out}\).Footnote 11 We may surface random coins \(\nu \in \mathsf {\textsf{ME}.EncRS}\) as an additional input to \(\mathsf {\textsf{ME}.Encode}\); then a message m should be encoded into a payload \(p\) of length \(\left| p\right| =\mathsf {\textsf{ME}.pl}(\left| m\right| , \nu )\). The decoding algorithm \(\mathsf {\textsf{ME}.Decode}\) takes \(\textit{st}_{\textit{u}}, p\), and auxiliary information \(\textit{aux}\) to return the updated state \(\textit{st}_{\textit{u}}\) and a message \(m\in \mathsf {\textsf{ME}.MS}\cup \{\bot \}\). The syntax used for the algorithms of \(\textsf{ME}\) is given in Fig. 15.
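Definition 4 can be mirrored by a toy scheme in Python. The instantiation below is our own, purely illustrative: it prefixes each message with an 8-byte counter, and it omits the randomness space and padding for brevity.

```python
# Toy message encoding scheme (our own illustration of the ME syntax):
# payload = 8-byte send counter || message; Decode is deterministic and
# returns None (modelling ⊥) on an out-of-order counter.

class CounterME:
    @staticmethod
    def Init():
        # Each user's state holds its send and receive counters.
        return {"snd": 0, "rcv": 0}, {"snd": 0, "rcv": 0}

    @staticmethod
    def Encode(st, m: bytes, aux: bytes):
        p = st["snd"].to_bytes(8, "big") + m
        st["snd"] += 1
        return st, p

    @staticmethod
    def Decode(st, p: bytes, aux: bytes):
        ctr, m = int.from_bytes(p[:8], "big"), p[8:]
        if ctr != st["rcv"]:        # replayed or reordered payload
            return st, None
        st["rcv"] += 1
        return st, m
```

Note how replay protection here comes purely from the encoding layer, matching the intuition that the encoding scheme supplies the properties a support function demands beyond raw integrity.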

We now define two properties of a message encoding scheme: encoding correctness and encoding integrity. We formalise each property with respect to a support function, in a similar way to how we formalised correctness and integrity for a channel in Sect. 3.4. The encoding correctness and integrity notions both roughly require that the decoding algorithm of a message encoding scheme always returns messages that are consistent with the support function. The two notions differ in that the encoding correctness only requires the outputs to be consistent until the first error occurs (i.e. until the support function returns \(\bot \)), whereas the encoding integrity also requires the decoding algorithm to recover from errors and keep returning consistent outputs throughout. We formalise both notions in the setting where the message encoding scheme is being run over an authenticated channel. This reflects the intuition that the message encoding scheme does not have to provide any cryptographic properties, but it is expected to be composed with a primitive that guarantees the integrity of communication. In contrast, the message encoding scheme itself is responsible for providing all properties that are required by a support function and are not implied by integrity. This may include preventing the replay, reordering and dropping of messages.

Fig. 16

Encoding correctness and encoding integrity of message encoding scheme \(\textsf{ME}\) with respect to support function \(\textsf{supp}\). Game \(\textrm{G}^{\textsf{ecorr}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}\) includes the boxed code and game \(\textrm{G}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}\) does not

We use the games in Fig. 16 to formalise the encoding correctness and integrity notions of a message encoding scheme \(\textsf{ME}\) with respect to a support function \(\textsf{supp}\). The advantage of an adversary \(\mathcal {F}\) in breaking the encoding correctness of \(\textsf{ME}\) with respect to \(\textsf{supp}\) is defined as \(\textsf{Adv}^{\textsf{ecorr}}_{\textsf{ME}, \textsf{supp}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{ecorr}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}]\). The advantage of an adversary \(\mathcal {F}\) in breaking the encoding integrity (\(\textrm{EINT}\)-security) of \(\textsf{ME}\) with respect to \(\textsf{supp}\) is defined as \(\textsf{Adv}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}]\). The encoding correctness game \(\textrm{G}^{\textsf{ecorr}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}\) contains the boxed code, while the encoding integrity game \(\textrm{G}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}}\) does not. The encoding correctness requires that \(\textsf{ME}\) manages to “correctly” decode all payloads that are deemed to be admissible by the support function \(\textsf{supp}\), while the inadmissible payloads are ignored by the game; here the support function itself is used to determine what constitutes a “correct” decoding. The encoding integrity requires that \(\textsf{ME}\) rejects inadmissible payloads while maintaining its baseline correctness; this in particular means that the processing of inadmissible payloads should not corrupt the state of \(\textsf{ME}\) in unexpected ways. As a result of processing inadmissible payloads, the receiver’s transcript will contain \((\textsf{recv}, \bot , p, \textit{aux})\)-type entries. The support function \(\textsf{supp}\) might process various conditions involving these entries (e.g. depending on the number of errors that occurred), and the encoding scheme \(\textsf{ME}\) still has to provide outputs that are consistent with \(\textsf{supp}\).

The two core differences from the corresponding channel notions in Sect. 3.4 are as follows. First, the message encoding scheme is meant to be run within an integrity-protected communication channel, so the \(\textsc {Recv}\) oracle in both games now starts by checking that the queried payload \(p\) was returned by a prior call to the opposite user’s \(\textsc {Send}\) oracle (in response to some message m and auxiliary information \(\textit{aux}\)). Second, the message encoding is not meant to serve any cryptographic purpose, so the initial states \(\textit{st}_{\textsf{ME}, \mathcal {I}}, \textit{st}_{\textsf{ME}, \mathcal {R}}\) should not contain any secret information and are given as inputs to adversary \(\mathcal {F}\) in both games. This means that the encoding integrity is a strictly stronger notion than the encoding correctness, and the latter has limited value.Footnote 12

In Sect. 5.3, we define three more properties of message encoding that will be necessary for our security analysis of MTProto 2.0. None of these properties are defined with respect to a support function. Our modular approach of building a channel from a message encoding scheme serves to limit the number of places where we need to consider the specifics of a support function: the integrity proof (in Sect. 5.6) of the channel that we study is reduced to the encoding integrity of the underlying message encoding scheme, and the latter is then proved in Appendix E.5.

4 MTProto 2.0 specification

In this section, we describe our modelling of the MTProto 2.0 record protocol as a bidirectional channel. First, in Sect. 4.1 we give an informal description of MTProto based on Telegram documentation and client implementations. Next, in Sect. 4.2 we outline attacks that motivate protocol changes required to achieve security. We list further modelling issues and points where we depart from Telegram documentation in Sect. 4.3. We conclude with Sect. 4.4 where we give our formal specification for a fixed version of the protocol.

4.1 Telegram description

We studied MTProto 2.0 as described in the online documentation [62] and as implemented in the official desktopFootnote 13 and AndroidFootnote 14 clients. We focus on cloud chats, i.e. chats that are only encrypted at the transport layer between the clients and Telegram servers. The end-to-end encrypted secret chats are implemented on top of this transport layer and only available for one-on-one chats. Figures 17 and 18 give a visual summary of the following description.

Fig. 17

Parsing \(\textsf {auth}\_\textsf {key} \) in MTProto 2.0. User \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\) derives a \(\textsf{KDF}\) key \(\textit{kk}_\textit{u}= (\textit{kk}_{\textit{u}, 0}, \textit{kk}_{\textit{u}, 1})\) and a \(\textsf{MAC}\) key \(\textit{mk}_\textit{u}\)

Fig. 18

Overview of message processing in MTProto 2.0

4.1.1 Key exchange

A Telegram client must first establish a symmetric 2048-bit auth_key with the server via a version of the Diffie–Hellman key exchange. We defer the details of the key exchange to Sect. 7. In practice, this key exchange first results in a permanent auth_key for each of the Telegram data centres the client connects to. Thereafter, the client runs a new key exchange on a daily basis to establish a temporary auth_key that is used instead of the permanent one.

4.1.2 “Record protocol”

Messages are protected as follows.

  1.

    API calls are expressed as functions in the TL schema [57].

  2.

    The API requests and responses are serialised according to the type language (TL) [59] and embedded in the msg_data field of a payload \(p\), shown in Table 1. The first two 128-bit blocks of \(p\) have a fixed structure and contain various metadata. The maximum length of msg_data is \(2^{24}\) bytes.

  3.

    The payload is encrypted using \(\textsf{AES}-\textsf{256}\) in IGE mode. The ciphertext c is a part of an MTProto ciphertext \(\textsf {auth}\_\textsf {key}\_\textsf {id} ~\Vert ~\textsf {msg}\_\textsf {key} ~\Vert ~c\), where (recalling that z[a : b] denotes bits a to \(b-1\), inclusive, of string z):

    $$\begin{aligned} \textsf {auth}\_\textsf {key}\_\textsf {id}&:=\textsf{SHA}-\textsf{1}[\textsf {auth}\_\textsf {key} ][96:160]\\ \textsf {msg}\_\textsf {key}&:=\textsf{SHA}-\textsf{256}[\textsf {auth}\_\textsf {key} {[704+x:960+x]} ~\Vert ~p ][64:192]\\ c&:=\mathsf {IGE[AES}-\mathsf {256]}.\textsf{Enc}(\textsf{key} ~\Vert ~\textsf{iv}, p) \end{aligned}$$

    Here, the first two fields form an external header. The \(\mathsf {IGE[AES}-\mathsf {256]}\) keys and IVs are computed via:

    $$\begin{aligned} A&:=\textsf{SHA}-\textsf{256}[\textsf {msg}\_\textsf {key} ~\Vert ~\textsf {auth}\_\textsf {key} {[x:288 + x]}]\\ B&:=\textsf{SHA}-\textsf{256}[\textsf {auth}\_\textsf {key} {[320 + x:608 + x]} ~\Vert ~\textsf {msg}\_\textsf {key} ]\\ \textsf{key}&:=A[0:64] ~\Vert ~B[64:192] ~\Vert ~A[192:256]\\ \textsf{iv}&:=B[0:64] ~\Vert ~A[64:192] ~\Vert ~B[192:256] \end{aligned}$$

    In the above steps, \(x=0\) for messages from the client and \(x=64\) from the server. Telegram clients use the BoringSSL implementation [29] of IGE, which has 2-block IVs.

  4.

    MTProto ciphertexts are encapsulated in a “transport protocol”. The MTProto documentation defines multiple such protocols [55], but the default is the abridged format that prefixes the stream with a fixed value of 0xefefefef and afterwards wraps each MTProto ciphertext \(c_{\textsf{MTP}}\) in a transport packet as:

    • \(\textsf {length} ~\Vert ~c_{\textsf{MTP}}\) where 1-byte length contains the \(c_{\textsf{MTP}}\) length divided by 4, if the resulting packet length is \(< 127\), or

    • \(\texttt {0x7f} ~\Vert ~\textsf {length} ~\Vert ~c_{\textsf{MTP}}\) where length is encoded in 3 bytes.

  5.

    All the resulting packets are obfuscated by default using \(\textsf{AES}-\textsf{128}\) in CTR mode. The key and IV are transmitted at the beginning of the stream, so the obfuscation provides no cryptographic protection and we ignore it henceforth.Footnote 15

  6.

    Communication is over TCP (port 443) or HTTP. Clients attempt to choose the best available connection. There is support for TLS in the client code, but it does not seem to be used.
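The abridged transport framing of step 4 can be sketched as follows. This is a minimal sketch assuming the format described above; the helper name is ours.

```python
# Sketch of MTProto's abridged transport framing (helper name is ours).
# The MTProto ciphertext length is always a multiple of 4, and the
# 1-byte short form holds the length in 4-byte words.

def abridged_frame(c_mtp: bytes) -> bytes:
    assert len(c_mtp) % 4 == 0
    words = len(c_mtp) // 4
    if words < 0x7f:                          # short form: 1-byte length
        return bytes([words]) + c_mtp
    # long form: 0x7f marker then 3-byte little-endian word count
    return b"\x7f" + words.to_bytes(3, "little") + c_mtp
```

For example, a 8-byte ciphertext is framed as the length byte 0x02 followed by the ciphertext, while a 1024-byte ciphertext (256 words) takes the long form.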

In combination, these operations mean that MTProto 2.0 at its core uses a “stateful Encrypt & MAC” construction. Here the MAC tag \(\textsf {msg}\_\textsf {key} \) is computed using \(\textsf{SHA}-\textsf{256}\) with a prepended key derived from (certain bits of) auth_key . The key and IV for IGE mode are derived on a per-message basis using a KDF based on \(\textsf{SHA}-\textsf{256}\), using certain bits of auth_key as the KDF key and the \(\textsf {msg}\_\textsf {key} \) as a diversifier. Note that the bit ranges of auth_key used by the client and the server to derive keys in both operations overlap with one another. Any formal security analysis needs to take this into account.
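The bit ranges in step 3 are all byte-aligned, so the msg_key computation and the per-message KDF can be rendered directly in Python. This is our own illustrative sketch of the description above, not a vetted implementation; function names are ours.

```python
# Sketch of MTProto 2.0 msg_key and per-message key/IV derivation.
# Byte offsets are the bit ranges from the text divided by 8.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def derive(auth_key: bytes, payload: bytes, from_server: bool):
    assert len(auth_key) == 256              # 2048-bit auth_key
    x = 8 if from_server else 0              # x = 64 bits = 8 bytes
    # msg_key = SHA-256(auth_key[704+x : 960+x bits] || p)[64:192 bits]
    msg_key = sha256(auth_key[88 + x:120 + x] + payload)[8:24]
    A = sha256(msg_key + auth_key[x:36 + x])
    B = sha256(auth_key[40 + x:76 + x] + msg_key)
    key = A[0:8] + B[8:24] + A[24:32]        # 256-bit AES key
    iv  = B[0:8] + A[8:24] + B[24:32]        # 2-block IGE IV
    return msg_key, key, iv
```

The \(x\) offset makes the client and server key schedules differ for the same auth_key and payload, while the overlapping auth_key ranges are exactly what the related-key analysis mentioned above must account for.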

Table 1 MTProto payload format

4.2 Attacks against MTProto metadata validation

We describe adversarial behaviours that are permitted in current Telegram implementations and that mostly depend on how clients and servers validate metadata information in the payload (especially the second 128-bit block containing msg_id , msg_seq_no and msg_length ).

We consider a network attacker that sits between the client and the Telegram servers, attempting to manipulate the conversation transcript. We distinguish between two cases: when the client is the sender of a message and when it is the receiver. By message, we mean any msg_data exchanged via MTProto, but we pay particular attention to when it contains a chat message.

4.2.1 Message reordering

By reordering we mean that an adversary can swap messages sent by one party so that they are processed in the wrong order by the receiving party. Preventing such attacks is a basic property that one would expect in a secure messaging protocol. The MTProto documentation mentions reordering attacks as something to protect against in secret chats but does not discuss it for cloud chats [65]. The implementation of cloud chats provides some protection, but not fully:

  • When the client is the receiver, the order of displayed chat messages is determined by the date and time values within the TL message object (which are set by the server), so adversarial reordering of packets has no effect on the order of chat messages as seen by the client. On mobile clients, messages are also delivered via push notification systems, which are typically secured with TLS. Note that service messages of MTProto typically do not have such a timestamp so reordering is theoretically possible, but it is unclear whether it would affect the client’s state since such messages tend to be responses to particular requests or notices of errors, which are not expected to arrive in a given order.

  • When the client is the sender, the order of chat messages can be manipulated because the server sets the date and time value for the Telegram user to whom the message was addressed based on when the server itself receives the message, and because the server will accept a message with a lower msg_id than that of a previous message as long as its msg_seq_no is also lower than that of a previous message. The server does not take the timestamp implicit within msg_id into account except to check whether it is at most 300 s in the past or 30 s in the future, so within this time interval reordering is possible. A message outside of this time interval is not ignored, but a request for time synchronisation is triggered, after receipt of which the client sends the message again with a fresh msg_id. So an attacker can also simply delay a chosen message to cause messages to be accepted out of order. In Telegram, the rotation of the server_salt every 30 to 60 min may be an obstacle to carrying out this attack in longer time intervals.
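The server-side freshness window described above admits a simple sketch. The helper name is ours; it relies on the documented fact that an MTProto msg_id carries a Unix timestamp in its upper 32 bits (msg_id is approximately unixtime times \(2^{32}\)).

```python
# Sketch of the server's msg_id timestamp check (helper name is ours):
# accept only msg_ids whose implicit timestamp is at most `past` seconds
# old or `future` seconds ahead.
import time

def msg_id_in_window(msg_id, now=None, past=300, future=30):
    if now is None:
        now = time.time()
    ts = msg_id >> 32                 # timestamp implicit in msg_id
    return now - past <= ts <= now + future
```

Within this 330-second window the check imposes no ordering, which is exactly the slack the reordering attack exploits.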

We verified that reordering between a sending client and a receiving server is possible in practice using unmodified Android clients (v6.2.0) and a malicious WiFi access point running a TCP proxy [42] with custom rules to suppress and later release certain packets. Suppose an attacker sits between Alice and a server, and Alice is in a chat with Bob. The attacker can reorder messages that Alice is sending, so the server receives them in the wrong order and forwards them in the wrong order to Bob. While Alice’s client will initially display her sent messages in the order she sent them, once it fetches history from the server it will update to display the modified order that will match that of Bob.

Note that such reordering attacks are not possible against e.g. Signal or MTProto’s closest “competitor” TLS. TLS-like protocols over UDP such as DTLS [48] or QUIC [32] either leave it to the application to handle packet reordering (reordering is possible against DTLS) or have built-in mechanisms to handle these (reordering is not possible against QUIC).

Other types of reordering. A stronger form of reordering resistance can also be required from a protocol if one considers the order in the transcript as a whole, so that the order of sent messages with respect to received messages has to be preserved. This is sometimes referred to as global transcript in the literature [74] and is generally considered to be more complex to achieve. In particular, the following is possible in both Telegram and e.g. Signal. Alice sends a message “Let’s commit all the crimes”. Then, simultaneously both Alice and Bob send a message. Alice: “Just kidding”; Bob: “Okay”. Depending on the order in which these messages arrive, the transcript on either side might be (Alice: “Let’s commit all the crimes”, Alice: “Just kidding”, Bob: “Okay”) or (Alice: “Let’s commit all the crimes”, Bob: “Okay”, Alice: “Just kidding”). That is, the transcript will have Bob acknowledging a joke or criminal activity. Note that in the context of group messaging, there is another related but weaker property: the notion of causality preservation [26]. However, when restricted to the two-party case, this property becomes equivalent to in-order delivery (as exhibited by the support function \(\textsf{supp}\text {-}\textsf{ord}\) defined in Fig. 32).

4.2.2 Message drops

MTProto makes it possible to silently drop a message both when the client is the senderFootnote 16 and when it is the receiver, but this is difficult to exploit in practice. Clients and the server attempt to resend messages for which they did not get acknowledgements. Such messages have the same msg_ids but are enclosed in a fresh ciphertext with random padding, so the attacker must be able to distinguish the repeated encryptions to continue dropping the same payload. This is possible, for example, with the desktop client as sender, since padding length is predictable based on the message length [69]. When the client is a receiver, other message delivery mechanisms such as batching of messages inside a container or API calls like messages.getHistory make it hard for an attacker to identify repeated encryptions. In the latter case, MTProto does not prevent message drops, but there is likely no practical attack.

4.2.3 Re-encryption

If a message is not acknowledged within a certain time in MTProto, it is re-encrypted using the same msg_id and with fresh random padding. While this appears to be a useful feature and a mitigation against message drops, it breaks the expected guarantees provided by a secure channel.

The issue can be illustrated by considering a local passive adversary that captures a transcript \((c_{\mathcal {I}, 0}, c_{\mathcal {R}}, c_{\mathcal {I}, 1})\) of messages exchanged between the client and the server, where \(c_{\mathcal {I}, 0}, c_{\mathcal {I}, 1}\) were sent by the client and \(c_{\mathcal {R}}\) was sent by the server. This adversary should not be able to find any distinguishing information about the plaintexts by studying the transcript; this is a very basic security guarantee of the channel, covered under the IND-CPA setting that we also formalise in Sect. 3.4. However, re-encryptions in MTProto are distinguishable: by examining the ciphertexts, the adversary can determine whether \(c_{\mathcal {R}}\) encrypts an automatically generated acknowledgement, or a new message from the server.Footnote 17

In more detail, re-encryption means the same partial state in the form of msg_id and msg_seq_no is used for two different encryptions. A reuse of a complete state would mean the ciphertexts \(c_{\mathcal {I}, 0}, c_{\mathcal {I}, 1}\) contain the same \(\textsf {msg}\_\textsf {key} \), and further that \(c_{\mathcal {I}, 0}^{(2)} = c_{\mathcal {I},1}^{(2)}\), i.e. that the 2nd blocks of the respective ciphertexts match. These conditions are easy to check for the adversary. In a model where the adversary controls the randomness in the protocol (as in Sect. 3.4), three encryption queries would be sufficient to perform the attack. However, in practice there is one part of the state that does change upon re-encryption and that is the padding, which is also part of the input used to compute \(\textsf {msg}\_\textsf {key} \). This means that to trigger the distinguishing condition, we must rely on collisions in \(\textsf {msg}\_\textsf {key} \). Since msg_key is computed via \(\textsf{SHA}-\textsf{256}\) truncated to 128 bits and the birthday bound applies, we expect a collision with constant probability after \(3 \cdot 2^{64}\) encryption queries. This makes the attack mainly of theoretical interest.
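To make the estimate explicit: modelling each re-encryption’s \(\textsf {msg}\_\textsf {key} \) as an independent uniform 128-bit string (an idealisation of truncated \(\textsf{SHA}-\textsf{256}\)), the birthday bound gives

$$\begin{aligned} \Pr [\exists \, i \ne j :\textsf {msg}\_\textsf {key} _i = \textsf {msg}\_\textsf {key} _j] \approx \genfrac(){0.0pt}1{q}{2} \cdot 2^{-128} = \frac{q(q-1)}{2^{129}} \end{aligned}$$

for q re-encryptions of the same message, which reaches constant probability around \(q \approx 2^{64}\), consistent with the \(3 \cdot 2^{64}\) encryption queries quoted above.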

To allow a security proof to go through, the cleanest solution is to remove the re-encryption capability from the specification of the channel, and leave the implementation of such a feature to the application layer. If a message resend facility is needed, it can be done transparently to and independently of the channel operation, i.e. each resending would take place using an updated, unique state of the channel.

4.3 Modelling differences

In general, we would like our formal specification of MTProto 2.0 to stay as close as possible to the real protocol, so that when we prove statements about the specification, we obtain meaningful assurances about the security of the real protocol. However, as the previous section demonstrates, the current protocol has flaws. These prevent meaningful security analysis and can be removed by making small changes to the protocol’s handling of metadata. Further, the protocol has certain features that make it less amenable to formal analysis. Here we describe the modelling decisions we took that depart from the current version of MTProto 2.0 and justify each change.

4.3.1 Under-specification and inconsistencies

There is no authoritative specification of the protocol. The Telegram documentation often differs from the implementations and leaves room for multiple interpretations; thus, the clients are not consistent with each other.Footnote 18 Where possible, we chose a sensible “default” choice from the observed set of possibilities, but we stress that it is in general impossible to create a formal specification of MTProto that would be valid for all current implementations. For instance, the documentation defines server_salt as “A (random) 64-bit number periodically (say, every 24 h) changed (separately for each session) at the request of the server” [63]. In practice, the clients receive salts that change every hour and whose validity periods overlap with each other.Footnote 19 For client differences, consider padding generation: on desktop [69], a given message length will always result in the same padding length, whereas on Android [67], the padding length is randomised.

4.3.2 Application layer

Similarly, there is no clear separation between the cryptographic protocol of MTProto and the application data processing (expressed using the TL schema). However, to reason succinctly about the protocol we require a certain level of abstraction. In concrete terms, this means that we consider the msg_data field as “the message”, without interpreting its contents and in particular without modelling TL constructors. However, this separation does not exist in implementations of MTProto—for instance, message encoding behaves differently for some constructors (e.g. container messages)—and so our specification does not capture these details.

4.3.3 Client/server roles

The client and the server are not treated as equals in MTProto. For instance, the server is trusted to timestamp TL messages for history purposes, while the clients are not; this is why our reordering attacks only work in the client-to-server direction. The client chooses the session_id , while the server generates the server_salt . The server accepts any session_id given in the first message and then expects that value, while the client checks the session_id but may accept any server_salt given. Clients do not check the msg_seq_no field. The protocol implements elaborate measures to synchronise “bad” client time with server time, including checks on the timestamp within msg_id as well as the salt, special service messages [56], and the resending of messages with regenerated headers. Since much of this behaviour is not critical for security, we model both parties of the protocol as equals. Expanding our specification with this behaviour should be possible without affecting most of the proofs.

4.3.4 Key exchange

We are concerned with the symmetric part of the protocol, and thus assume that the shared auth_key is a uniformly random string rather than of the form \(g^{ab} \bmod p\) resulting from the actual key exchange.

4.3.5 Bit mixing

MTProto uses specific bit ranges of auth_key as KDF and MAC inputs. These ranges do not overlap for different primitives (i.e. the KDF key inputs are wholly distinct from the MAC key inputs), and we model auth_key as a random value, so without loss of generality our specification generates the KDF and MAC key inputs as separate random values. The key input ranges for the client and the server do overlap for KDF and MAC separately, however, so we model this in the form of related-key-deriving functions.
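To picture the overlap, consider the byte ranges of auth_key that feed the KDF and the MAC. The offsets below follow our reading of the public protocol description, with a direction offset of x = 0 for one user and x = 8 for the other; treat them as illustrative:

```python
import secrets

AUTH_KEY = secrets.token_bytes(256)  # a uniformly random 2048-bit auth_key

def kdf_key_material(x: int) -> bytes:
    """KDF key input: bytes [x, x+36) and [40+x, 40+x+36) of auth_key
    (two 288-bit halves, matching MTP-KDF.KS)."""
    return AUTH_KEY[x:x + 36] + AUTH_KEY[40 + x:40 + x + 36]

def mac_key_material(x: int) -> bytes:
    """MAC key input: bytes [88+x, 88+x+32) of auth_key (256 bits,
    matching MTP-MAC.KS)."""
    return AUTH_KEY[88 + x:88 + x + 32]

client, server = 0, 8  # direction offsets x for the two users
# The two KDF ranges overlap (e.g. bytes [8, 36) are shared) but are not
# equal, which is the related-key relation captured by phi_KDF; the KDF
# ranges end before the MAC ranges begin, so KDF and MAC keys are disjoint.
```

This makes the two claims in the text visible: distinct primitives never share key bits, while the two directions of the same primitive do.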

Further, the KDF intermixes specific bit ranges of the outputs of two \(\textsf{SHA}-\textsf{256}\) calls to derive the encryption keys and IVs. We argue that this is unnecessary—the intermixed KDF output is indistinguishable from random (the usual security requirement of a key derivation function) if and only if the concatenation of the two \(\textsf{SHA}-\textsf{256}\) outputs is indistinguishable from random. Hence, in our specification the KDF just returns the concatenation.
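In code, the simplified KDF of our specification is just the following (with kk0, kk1 the two 288-bit halves of a user's KDF key, as in Definition 9 below; the deployed protocol additionally permutes byte ranges of the two digests into key and IV):

```python
import hashlib

def mtp_kdf(kk0: bytes, kk1: bytes, msg_key: bytes) -> bytes:
    """Simplified KDF: the concatenation of the two SHA-256 outputs,
    without the bit intermixing. The 512-bit result is later split
    into an AES-256 key and IV."""
    a = hashlib.sha256(msg_key + kk0).digest()
    b = hashlib.sha256(kk1 + msg_key).digest()
    return a + b
```

Since the intermixing is a fixed public permutation of these 512 bits, the concatenation is pseudorandom exactly when the intermixed output is, which is the argument made above.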

4.3.6 Order

Given that MTProto operates over reliable transport channels, it is not necessary to allow messages to arrive out of order. Our specification imposes stricter validation on metadata upon decryption: a single sequence number is checked by both sides, and only the next expected value is accepted. Enforcing strict ordering also automatically rules out message replay and drop attacks, which the implementation of MTProto as studied avoided in some cases only due to application-level processing.
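The strict-ordering rule amounts to the following check on the receiving side (a minimal sketch with a hypothetical helper class, not the paper's pseudocode):

```python
class InOrderReceiver:
    """Accept only the next expected sequence number; anything else
    (a replay, a reordered message, or a delivery that skips over a
    dropped message) is rejected as invalid."""

    def __init__(self) -> None:
        self.expected = 0

    def accept(self, seq_no: int) -> bool:
        if seq_no != self.expected:
            return False  # replayed, reordered or skipped
        self.expected += 1
        return True
```

Because a replayed sequence number is below the counter and a post-drop delivery is above it, this single comparison rules out forgery-free replays, reordering and silent drops at once.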

4.3.7 Re-encryption

Because of the attacks in Sect. 4.2, we insist in our formalisation that all sent messages include a fresh value in the header. This is achieved via a stateful secure channel definition in which either a client or server sequence number is incremented on each call to the \(\mathsf {\textsf{CH}.Send}\) oracle.

4.3.8 Message encoding

Some of the previous points outline changes to message encoding. We simplify the scheme, keeping to the format of Table 1 but not modelling diverging behaviours upon decoding. The implemented MTProto message encoding scheme behaves differently depending on whether the user is a client or a server, but each of them checks a 64-bit value in the first plaintext block, session_id and server_salt , respectively. To prove security of the channel, it is enough that there is a single such value that both parties check, and it does not need to be randomised, so we specify a constant \(\textsf {session}\_\textsf {id}\) and we leave the salt as an empty field. We also merge the msg_id and msg_seq_no fields into a single sequence number field of corresponding size, reflecting that a simple counter suffices in place of the original fields. Note that though we only prove security with respect to this particular message encoding scheme, our approach to specification is flexible and can accommodate more complex message encoding schemes.

4.4 MTProto-based channel

Our specification of the MTProto channel is given in Definition 5 and Fig. 19. The users \(\mathcal {I}\) and \(\mathcal {R}\) represent the client and the server. We abstract the individual keyed primitives into function families and instantiate each primitive or function later in this section.

Definition 5

Let \(\textsf{ME}\) be a message encoding scheme. Let \(\textsf{HASH}\) be a function family such that \(\{0,1\}^{992} \subseteq \textsf{HASH}.\textsf{IN}\). Let \(\textsf{MAC}\) be a function family such that \(\mathsf {\textsf{ME}.Out}\subseteq \textsf{MAC}.\textsf{IN}\). Let \(\textsf{KDF}\) be a function family such that \(\{0,1\}^{\textsf{MAC}.\textsf{ol}} \subseteq \textsf{KDF}.\textsf{IN}\). Let \(\phi _{\textsf{MAC}} : \{0,1\}^{320} \rightarrow \textsf{MAC}.\textsf{KS} \times \textsf{MAC}.\textsf{KS}\) and \(\phi _{\textsf{KDF}} :\{0,1\}^{672} \rightarrow \textsf{KDF}.\textsf{KS} \times \textsf{KDF}.\textsf{KS}\). Let \(\textsf{SE}\) be a deterministic symmetric encryption scheme with \(\mathsf {\textsf{SE}.kl}= \textsf{KDF}.\textsf{ol}\) and \(\mathsf {\textsf{SE}.MS}= \mathsf {\textsf{ME}.Out}\). Then, \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{HASH}, \textsf{MAC}, \textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{SE}]\) is the channel as defined in Fig. 19, with \(\mathsf {\textsf{CH}.MS}= \mathsf {\textsf{ME}.MS}\) and \(\mathsf {\textsf{CH}.SendRS}= \mathsf {\textsf{ME}.EncRS}\).

Fig. 19

Construction of MTProto-based channel \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{HASH}, \textsf{MAC}, \textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{SE}]\) from message encoding scheme \(\textsf{ME}\), function families \(\textsf{HASH}\), \(\textsf{MAC}\) and \(\textsf{KDF}\), related-key-deriving functions \(\phi _{\textsf{MAC}}\) and \(\phi _{\textsf{KDF}}\), and from deterministic symmetric encryption scheme \(\textsf{SE}\)

\(\mathsf {\textsf{CH}.Init}\) generates the keys for both users and initialises the message encoding scheme. Note that \(\textsf{auth}{\_}\textsf{key}\) as described in Sect. 4.1 does not appear in the code in Fig. 19, since each part of \(\textsf{auth}{\_}\textsf{key}\) that is used for keying the primitives can be generated independently. These parts are denoted by \(\textit{hk}\), \(\textit{kk}\) and \(\textit{mk}\). The function \(\phi _{\textsf{KDF}}\) (resp. \(\phi _{\textsf{MAC}}\)) is then used to derive the (related) keys for each user from \(\textit{kk}\) (resp. \(\textit{mk}\)).

\(\mathsf {\textsf{CH}.Send}\) proceeds by first using \(\textsf{ME}\) to encode a message m into a payload \(p\). The \(\textsf{MAC}\) is computed on this payload to produce a \(\textsf{msg}{\_}\textsf{key}\), and the \(\textsf{KDF}\) is called on the \(\textsf{msg}{\_}\textsf{key}\) to compute the key and IV for symmetric encryption \(\textsf{SE}\), here abstracted as k. The payload is encrypted with \(\textsf{SE}\) using this key material, and the resulting ciphertext is called \(c_{\textit{se}}\). The \(\textsf{CH}\) ciphertext c consists of \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\), \(\textsf{msg}{\_}\textsf{key}\) and the symmetric ciphertext \(c_{\textit{se}}\).

\(\mathsf {\textsf{CH}.Recv}\) reverses these steps: it first computes k from the \(\textsf{msg}{\_}\textsf{key}\) parsed from c, then decrypts \(c_{\textit{se}}\) to the payload \(p\), and recomputes the \(\textsf{MAC}\) of \(p\) to check whether it equals \(\textsf{msg}{\_}\textsf{key}\). If not, it returns \(\bot \) (without changing the state) to signify failure. If the check passes, it uses \(\textsf{ME}\) to decode the payload into a message m. It is important that the \(\textsf{MAC}\) check is performed before \(\mathsf {\textsf{ME}.Decode}\) is called; otherwise the channel is open to attacks, as we show in Sect. 6.
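Abstracting the primitives as callables, the two algorithms can be sketched as follows; the parameter names are ours, and auth_key_id validation and state handling are elided (a structural sketch of Fig. 19, not its exact pseudocode):

```python
import hmac

def ch_send(mac, kdf, enc, encode, auth_key_id: bytes, m: bytes) -> bytes:
    p = encode(m)                    # ME.Encode: header + message + padding
    msg_key = mac(p)                 # 128-bit MAC over the full payload
    k = kdf(msg_key)                 # per-message key material for SE
    c_se = enc(k, p)                 # deterministic symmetric encryption
    return auth_key_id + msg_key + c_se

def ch_recv(mac, kdf, dec, decode, c: bytes):
    msg_key, c_se = c[8:24], c[24:]  # skip the 64-bit auth_key_id
    k = kdf(msg_key)
    p = dec(k, c_se)
    if not hmac.compare_digest(mac(p), msg_key):
        return None                  # reject before ME.Decode ever runs
    return decode(p)
```

Note the order in `ch_recv`: decryption is unavoidable before the MAC can be recomputed, but decoding only happens after the MAC check succeeds.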

The message encoding scheme \(\textsf{MTP}\text {-}\textsf{ME}\) is specified in Definition 6 and Fig. 20. It is a simplified scheme for in-order message delivery without replays (see Appendix D for the actual MTProto scheme that permits reordering as outlined in Sect. 4.2).

Definition 6

Let \(\textsf {session}\_\textsf {id}\in \{0,1\}^{64}\) and let \(\textsf {pb}, \textsf{bl}\in {{\mathbb {N}}}\). Denote by \(\textsf{ME}\) \(=\) \(\textsf{MTP}\text {-}\textsf{ME}[\textsf {session}\_\textsf {id}\), \(\textsf {pb} \), \(\textsf{bl}]\) the message encoding scheme given in Fig. 20, with \(\mathsf {\textsf{ME}.MS}= \bigcup _{i = 1}^{2^{24}} \{0,1\}^{8\cdot i}\), \(\mathsf {\textsf{ME}.Out}= \bigcup _{i \in {{\mathbb {N}}}} \{0,1\}^{\textsf{bl}\cdot i}\) and \(\mathsf {\textsf{ME}.pl}(\ell , \nu ) = 256 + \ell + \left| \textsf{GenPadding} (\ell ; \nu )\right| \).

Fig. 20

Construction of simplified message encoding scheme for in-order message delivery \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}[\textsf {session}\_\textsf {id}, \textsf {pb}, \textsf{bl}]\) for session identifier \(\textsf {session}\_\textsf {id}\), maximum padding length (in full blocks) \(\textsf {pb} \), and output block length \(\textsf{bl}\)

As justified in Sect. 4.3, \(\textsf{MTP}\text {-}\textsf{ME}\) follows the header format of Table 1, but it does not use the \(\textsf {server}\_\textsf {salt} \) field (we define \(\textsf {salt} \) as filled with zeros to preserve the field order) and we merge the 64-bit \(\textsf {msg}\_\textsf {id} \) and 32-bit \(\textsf {msg}\_\textsf {seq}\_\textsf {no} \) fields into a single \(96\)-bit \(\textsf {seq}\_\textsf {no} \) field. Note that the internal counters of \(\textsf{MTP}\text {-}\textsf{ME}\) wrap around when \(\textsf {seq}\_\textsf {no} \) “overflows” modulo \(2^{96}\), and an attacker can start replaying old payloads as soon as this happens. So when proving the encoding integrity of \(\textsf{MTP}\text {-}\textsf{ME}\) in Appendix E.5 with respect to a support function that prohibits replays, we will consider adversaries that make at most \(2^{96}\) message encoding queries.
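The resulting payload layout can be sketched as follows; the field widths are as above (the 256-bit header of Definition 6), while the byte order, the constant session_id value and the caller-supplied padding string are illustrative:

```python
def encode_payload(seq_no: int, msg: bytes, padding: bytes) -> bytes:
    """Simplified MTP-ME payload: 64-bit zeroed salt, 64-bit constant
    session_id, 96-bit seq_no (merged msg_id/msg_seq_no), 32-bit
    message length, then the message and its padding."""
    salt = b'\x00' * 8        # server_salt field kept but zeroed
    session_id = b'\x00' * 8  # a fixed constant, as justified above
    header = (salt + session_id
              + seq_no.to_bytes(12, 'big')
              + len(msg).to_bytes(4, 'big'))
    return header + msg + padding
```

The header is 64 + 64 + 96 + 32 = 256 bits, matching \(\mathsf {\textsf{ME}.pl}(\ell , \nu ) = 256 + \ell + \left| \textsf{GenPadding} (\ell ; \nu )\right| \).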

The following \(\textsf{SHA}-\textsf{1}\) and \(\textsf{SHA}-\textsf{256}\)-based function families capture the MTProto primitives that are used to derive \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\), the message key \(\textsf{msg}{\_}\textsf{key}\) and the symmetric encryption key k.

Definition 7

\(\textsf{MTP}\text {-}\textsf{HASH}\) is the function family defined by \( \textsf{MTP}\text {-}\textsf{HASH}.\textsf{KS}\) \(=\) \(\{0,1\}^{1056}\), \(\textsf{MTP}\text {-}\textsf{HASH}.\textsf{IN}\) \(=\) \(\{0,1\}^{992}\), \(\textsf{MTP}\text {-}\textsf{HASH}.\textsf{ol} = 128\) and \(\textsf{MTP}\text {-}\textsf{HASH}.\textsf{Ev}\) given in Fig. 21.

Fig. 21

Construction of function family \(\textsf{MTP}\text {-}\textsf{HASH}\)

Definition 8

\(\textsf{MTP}\text {-}\textsf{MAC}\) is the function family defined by \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{KS}\) \(=\) \(\{0,1\}^{256}\), \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{IN}\) \(=\) \(\{0,1\}^*\), \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{ol}=128\) and \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}\) given in Fig. 22.

Fig. 22

Construction of function family \(\textsf{MTP}\text {-}\textsf{MAC}\)

Definition 9

\(\textsf{MTP}\text {-}\textsf{KDF}\) is the function family defined by \(\textsf{MTP}\text {-}\textsf{KDF}.\textsf{KS}\) \(=\) \(\{0,1\}^{288} \times \{0,1\}^{288}\), \(\textsf{MTP}\text {-}\textsf{KDF}.\textsf{IN}\) \(=\) \(\{0,1\}^{128}\), \(\textsf{MTP}\text {-}\textsf{KDF}.\textsf{ol}\) \(=\) \(2 \cdot \textsf{SHA}-\textsf{256}.\textsf{ol}\) and \(\textsf{MTP}\text {-}\textsf{KDF}.\textsf{Ev}\) given in Fig. 23.

Fig. 23

Construction of function family \(\textsf{MTP}\text {-}\textsf{KDF}\)

Since the keys for \(\textsf{KDF}\) and \(\textsf{MAC}\) in MTProto are not independent for the two users, we have to work in a related-key setting. We are inspired by the RKA framework of [14], but define our related-key-deriving function \(\phi _{\textsf{KDF}}\) (resp. \(\phi _{\textsf{MAC}}\)) to output both keys at once, as a function of \(\textit{kk}\) (resp. \(\textit{mk}\)). See Fig. 24 for precise details of \(\phi _{\textsf{KDF}}\) and \(\phi _{\textsf{MAC}}\).

Fig. 24

Related-key-deriving functions \(\phi _{\textsf{KDF}} :\{0,1\}^{672} \rightarrow \textsf{MTP}\text {-}\textsf{KDF}.\textsf{KS} \times \textsf{MTP}\text {-}\textsf{KDF}.\textsf{KS}\) and \(\phi _{\textsf{MAC}} :\{0,1\}^{320} \rightarrow \textsf{MTP}\text {-}\textsf{MAC}.\textsf{KS} \times \textsf{MTP}\text {-}\textsf{MAC}.\textsf{KS}\)

Finally, we define the deterministic symmetric encryption scheme.

Definition 10

Let \(\textsf{AES}-\textsf{256}\) be the standard AES block cipher with \(\textsf{AES}-\textsf{256}.\textsf{kl} = 256\) and \(\textsf{AES}-\textsf{256}.\textsf{ol}\) \(=\) 128, and let \(\textsf{IGE}\) be the block cipher mode in Fig. 5. Let \(\textsf{MTP}\text {-}\textsf{SE}= \textsf{IGE}[\textsf{AES}-\textsf{256}]\).
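IGE chains both the previous ciphertext block and the previous plaintext block: \(c_i = \textsf{E}_k(m_i \oplus c_{i-1}) \oplus m_{i-1}\), with the IV supplying \(c_0\) and \(m_0\). A generic sketch over an injected block-cipher callable follows (a toy byte-wise permutation stands in for \(\textsf{AES}-\textsf{256}\), which is not in the Python standard library):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ige_encrypt(enc_block, blocks, c0, m0):
    """IGE encryption: c_i = E(m_i xor c_{i-1}) xor m_{i-1}.
    `blocks` is a list of equal-size plaintext blocks."""
    out, c_prev, m_prev = [], c0, m0
    for m in blocks:
        c = xor(enc_block(xor(m, c_prev)), m_prev)
        out.append(c)
        c_prev, m_prev = c, m
    return out

def ige_decrypt(dec_block, blocks, c0, m0):
    """Inverse: m_i = D(c_i xor m_{i-1}) xor c_{i-1}."""
    out, c_prev, m_prev = [], c0, m0
    for c in blocks:
        m = xor(dec_block(xor(c, m_prev)), c_prev)
        out.append(m)
        c_prev, m_prev = c, m
    return out
```

Instantiating `enc_block` with AES-256 encryption under the derived key yields \(\textsf{MTP}\text {-}\textsf{SE}\) as defined above.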

5 Formal security analysis

In this section, we define the security notions that we require to hold for each of the underlying primitives of \(\textsf{MTP}\text {-}\textsf{CH} \) and then use these notions to justify its correctness and prove its security properties.

We start by defining the security notions we require from the standard primitives in Sect. 5.1 (i.e. from the MTProto-based instantiations of \(\textsf{HASH}\), \(\textsf{KDF}\), \(\textsf{MAC}\), \(\textsf{SE}\)); in Sect. 5.2, we then define two novel assumptions about \(\textsf{SHACAL}-\textsf{2}\) that will be used in Appendix E to justify some of the aforementioned security notions. In Sect. 5.3, we define the security notions that will be required from the MTProto-based message encoding scheme; these notions are likewise justified in Appendix E. We prove that channel \(\textsf{MTP}\text {-}\textsf{CH} \) satisfies correctness, indistinguishability and integrity in Sections 5.4, 5.5 and 5.6, respectively. We conclude by providing an interpretation of our formal results in Sect. 5.7.

Our proofs use games and hops between them. In our games, we annotate some lines with comments of the form “\(\textrm{G}_i\)\(\textrm{G}_j\)” to indicate that these lines belong only to games \(\textrm{G}_i\) through \(\textrm{G}_j\) (inclusive). The lines not annotated with such comments are shared by all of the games that are shown in the particular figure.

5.1 Security requirements on standard primitives

5.1.1 \(\textsf{MTP}\text {-}\textsf{HASH}\) is a one-time indistinguishable function family

We require that \(\textsf{MTP}\text {-}\textsf{HASH}\) meets the one-time weak indistinguishability notion (\(\textrm{OTWIND}\)) defined in Fig. 25. The security game \(\textrm{G}^\textsf{otwind}_{\textsf{HASH}, \mathcal {D}}\) in Fig. 25 evaluates the function family \(\textsf{HASH}\) on a challenge input \(x_b\) using a secret uniformly random function key \(\textit{hk}\). Adversary \(\mathcal {D}\) is given \(x_0, x_1\) and the output of \(\textsf{HASH}\); it is required to guess the challenge bit \(b\in \{0,1\}\). The game samples inputs \(x_0, x_1\) uniformly at random rather than allowing \(\mathcal {D}\) to choose them, so this security notion requires \(\textsf{HASH}\) to provide only a weak form of one-time indistinguishability. The advantage of \(\mathcal {D}\) in breaking the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) is defined as \(\textsf{Adv}^{\textsf{otwind}}_{\textsf{HASH}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{otwind}_{\textsf{HASH}, \mathcal {D}}] - 1\). Appendix E.1 provides a formal reduction from the \(\textrm{OTWIND}\)-security of \(\textsf{MTP}\text {-}\textsf{HASH}\) to the one-time PRF security of \(\textsf{SHACAL}-\textsf{1}\) (as defined in Sect. 2.2).

Fig. 25

One-time weak indistinguishability of function family \(\textsf{HASH}\)

5.1.2 \(\textsf{MTP}\text {-}\textsf{KDF}\) is a PRF under related-key attacks

We require that \(\textsf{MTP}\text {-}\textsf{KDF}\) behaves like a pseudorandom function in the RKA setting (\(\textrm{RKPRF}\)) as defined in Fig. 26. The security game \(\textrm{G}^\textsf{rkprf}_{\textsf{KDF}, \phi _{\textsf{KDF}}, \mathcal {D}}\) in Fig. 26 defines a variant of the standard PRF notion allowing the adversary \(\mathcal {D}\) to use its \(\textsc {RoR}\) oracle to evaluate the function family \(\textsf{KDF}\) on either of the two secret, related function keys \(\textit{kk}_\mathcal {I}, \textit{kk}_\mathcal {R}\) (both computed using related-key-deriving function \(\phi _{\textsf{KDF}}\)). The advantage of \(\mathcal {D}\) in breaking the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) with respect to \(\phi _{\textsf{KDF}}\) is defined as \(\textsf{Adv}^{\textsf{rkprf}}_{\textsf{KDF}, \phi _{\textsf{KDF}}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{rkprf}_{\textsf{KDF}, \phi _{\textsf{KDF}}, \mathcal {D}}] - 1\).

Fig. 26

Related-key PRF security of function family \(\textsf{KDF}\) with respect to related-key-deriving function \(\phi _{\textsf{KDF}}\)

In Sect. 5.2, we define a novel security notion for \(\textsf{SHACAL}-\textsf{2}\) that roughly requires it to be a leakage-resilient PRF under related-key attacks; in Appendix E.2, we provide a formal reduction from the \(\textrm{RKPRF}\)-security of \(\textsf{MTP}\text {-}\textsf{KDF}\) to the new security notion. In this context, “leakage resilience” means that the adversary can adaptively choose a part of the \(\textsf{SHACAL}-\textsf{2}\) key. However, we limit the adversary to being able to evaluate \(\textsf{SHACAL}-\textsf{2}\) only on a single known, constant input (which is \(\textsf{IV}_{256}\), the initial state of \(\textsf{SHA}-\textsf{256}\)). The new security notion is formalised as the \(\textrm{LRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) with respect to a pair of related-key-deriving functions \(\phi _{\textsf{KDF}}\) and \(\phi _{\textsf{SHACAL}-\textsf{2}}\) (the latter is defined in Sect. 5.2).

5.1.3 \(\textsf{MTP}\text {-}\textsf{MAC}\) is collision-resistant under RKA

We require that collisions in the outputs of \(\textsf{MTP}\text {-}\textsf{MAC}\) under related keys are hard to find (\(\textrm{RKCR}\)), as defined in Fig. 27. The security game \(\textrm{G}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {F}}\) in Fig. 27 gives the adversary \(\mathcal {F}\) two related function keys \(\textit{mk}_\mathcal {I}, \textit{mk}_\mathcal {R}\) (created by the related-key-deriving function \(\phi _{\textsf{MAC}}\)), and requires it to produce two payloads \(p_0, p_1\) (for either user \(\textit{u}\)) such that there is a collision in the corresponding outputs \(\textsf{msg}{\_}\textsf{key}_0, \textsf{msg}{\_}\textsf{key}_1\) of the function family \(\textsf{MAC}\). The advantage of \(\mathcal {F}\) in breaking the \(\textrm{RKCR}\)-security of \(\textsf{MAC}\) with respect to \(\phi _{\textsf{MAC}}\) is defined as \(\textsf{Adv}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {F}}]\). It is clear by inspection that the \(\textrm{RKCR}\)-security of \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}(\textit{mk}_{\textit{u}}, p) = \textsf{SHA}-\textsf{256}(\textit{mk}_\textit{u}~\Vert ~p){[64:192]}\) (with respect to \(\phi _{\textsf{MAC}}\) from Fig. 24) reduces to the collision resistance of truncated output \(\textsf{SHA}-\textsf{256}\).
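Concretely, with Python's hashlib (bit range \([64:192]\) corresponds to digest bytes 8 through 23, the middle 128 bits):

```python
import hashlib

def mtp_mac(mk: bytes, p: bytes) -> bytes:
    """MTP-MAC: the middle 128 bits of SHA-256(mk || p)."""
    return hashlib.sha256(mk + p).digest()[8:24]
```

Any pair of payloads colliding under `mtp_mac` (for either related key) yields, by prepending the key, a collision in truncated-output \(\textsf{SHA}-\textsf{256}\), which is the reduction claimed above.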

Fig. 27

Related-key collision resistance of function family \(\textsf{MAC}\) with respect to related-key-deriving function \(\phi _{\textsf{MAC}}\)

5.1.4 \(\textsf{MTP}\text {-}\textsf{MAC}\) is a PRF under RKA for unique-prefix inputs

We require that \(\textsf{MTP}\text {-}\textsf{MAC}\) behaves like a pseudorandom function in the RKA setting when it is evaluated on a set of inputs that have unique 256-bit prefixes (\(\textrm{UPRKPRF}\)), as defined in Fig. 28. The security game \(\textrm{G}^\textsf{uprkprf}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {D}}\) in Fig. 28 extends the standard PRF notion to use two related \(\phi _{\textsf{MAC}}\)-derived function keys \(\textit{mk}_\mathcal {I}, \textit{mk}_\mathcal {R}\) for the function family \(\textsf{MAC}\) (similar to the \(\textrm{RKPRF}\)-security notion we defined above), but it also enforces that the adversary \(\mathcal {D}\) cannot query its oracle \(\textsc {RoR}\) on two inputs \((\textit{u}, p_0)\) and \((\textit{u}, p_1)\) for any \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\) such that \(p_0, p_1\) share the same 256-bit prefix. The unique-prefix condition means that the game does not need to maintain a PRF table to achieve output consistency. Note that this security game only allows the oracle \(\textsc {RoR}\) to be called with inputs of length \(\left| p\right| \ge 256\); this is sufficient for our purposes, because in \(\textsf{MTP}\text {-}\textsf{CH} \) the function family \(\textsf{MTP}\text {-}\textsf{MAC}\) is only used with payloads that are longer than 256 bits. The advantage of \(\mathcal {D}\) in breaking the \(\textrm{UPRKPRF}\)-security of \(\textsf{MAC}\) with respect to \(\phi _{\textsf{MAC}}\) is defined as \(\textsf{Adv}^{\textsf{uprkprf}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{uprkprf}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {D}}] - 1\).

Fig. 28

Related-key PRF security of function family \(\textsf{MAC}\) for inputs with unique 256-bit prefixes, with respect to key derivation function \(\phi _{\textsf{MAC}}\)

In Sect. 5.2, we define a novel security notion that requires \(\textsf{SHACAL}-\textsf{2}\) to be a leakage-resilient, related-key PRF when evaluated on a fixed input; in Appendix E.3, we show that the \(\textrm{UPRKPRF}\)-security of \(\textsf{MTP}\text {-}\textsf{MAC}\) reduces to this security notion and to the one-time PRF security (\(\textrm{OTPRF}\)) of the \(\textsf{SHA}-\textsf{256}\) compression function \(h _{256}\). The new security notion is similar to the notion discussed in Sect. 5.1 and defined in Sect. 5.2, in that it only allows the adversary to evaluate \(\textsf{SHACAL}-\textsf{2}\) on the fixed input \(\textsf{IV}_{256}\). However, the underlying security game derives the related \(\textsf{SHACAL}-\textsf{2}\) keys differently, partially based on the function \(\phi _{\textsf{MAC}}\) defined in Fig. 24 (as opposed to \(\phi _{\textsf{KDF}}\)). The new notion is formalised as the \(\textrm{HRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) with respect to \(\phi _{\textsf{MAC}}\).

5.1.5 \(\textsf{MTP}\text {-}\textsf{SE}\) is a one-time indistinguishable SE scheme

For any block cipher \(\textsf{E}\), Appendix E.4 shows that \(\textsf{IGE}[\textsf{E}]\) as used in MTProto is \(\mathrm {OTIND\$}\)-secure (defined in Fig. 4) if \(\textsf{CBC}[\textsf{E}]\) is \(\mathrm {OTIND\$}\)-secure. This enables us to use standard results [13, 49] on \(\textsf{CBC}\) in our analysis of MTProto.

5.2 Novel assumptions about \(\textsf{SHACAL}-\textsf{2}\)

In this section, we define two novel assumptions about \(\textsf{SHACAL}-\textsf{2}\). Both assumptions require \(\textsf{SHACAL}-\textsf{2}\) to be a related-key PRF when evaluated on the fixed input \(\textsf{IV}_{256}\) (i.e. on the initial state of \(\textsf{SHA}-\textsf{256}\)), meaning that the adversary can obtain the values of \(\textsf{SHACAL}-\textsf{2}.\textsf{Ev}(\cdot , \textsf{IV}_{256})\) for a number of different but related keys. We formalise the two assumptions as security notions, called \(\textrm{LRKPRF}\) and \(\textrm{HRKPRF}\), each defined with respect to different related-key-deriving functions; this reflects the fact that these security notions allow the adversary to choose the keys in substantially different ways. The notion of \(\textrm{LRKPRF}\)-security derives the \(\textsf{SHACAL}-\textsf{2}\) keys partially based on the function \(\phi _{\textsf{KDF}}\), whereas the notion of \(\textrm{HRKPRF}\)-security derives \(\textsf{SHACAL}-\textsf{2}\) keys partially based on the function \(\phi _{\textsf{MAC}}\) (both functions are defined in Fig. 24). Both security notions also have different flavours of leakage resilience: (1) the security game defining \(\textrm{LRKPRF}\) allows the adversary to directly choose 128 bits of the 512-bit long \(\textsf{SHACAL}-\textsf{2}\) key, with another 96 bits of this key fixed and known (due to being chosen by the SHA padding function \(\textsf{SHA}-\textsf{pad}\)), and (2) the security game defining \(\textrm{HRKPRF}\) allows the adversary to directly choose 256 bits of the 512-bit long \(\textsf{SHACAL}-\textsf{2}\) key.

We use the notion of \(\textrm{LRKPRF}\)-security to justify the \(\textrm{RKPRF}\)-security of \(\textsf{MTP}\text {-}\textsf{KDF}\) with respect to \(\phi _{\textsf{KDF}}\) (as explained in Sect. 5.1, with the security reduction in Appendix E.2), which is needed in both the \(\textrm{IND}\)-security and the \(\textrm{INT}\)-security proofs of \(\textsf{MTP}\text {-}\textsf{CH} \). We use the notion of \(\textrm{HRKPRF}\)-security to justify the \(\textrm{UPRKPRF}\)-security of \(\textsf{MTP}\text {-}\textsf{MAC}\) with respect to \(\phi _{\textsf{MAC}}\) (as explained in Sect. 5.1, with the security reduction in Appendix E.3), which is needed in the \(\textrm{IND}\)-security proof of \(\textsf{MTP}\text {-}\textsf{CH} \).

We stress that we have to assume properties of \(\textsf{SHACAL}-\textsf{2}\) that have not been studied in the literature. Related-key attacks on reduced-round \(\textsf{SHACAL}-\textsf{2}\) have been considered [37, 41], but they ordinarily work with a known difference relation between unknown keys. In contrast, our \(\textrm{LRKPRF}\)-security notion uses keys that differ by random, unknown parts. Both of our security notions consider keys that are partially chosen or known by the adversary. In Appendix F, we show that both the \(\textrm{LRKPRF}\)-security and the \(\textrm{HRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) hold in the ideal cipher model (i.e. when \(\textsf{SHACAL}-\textsf{2}\) is modelled as the ideal cipher); we provide concrete upper bounds for breaking each of them. However, we cannot rule out the possibility of attacks on \(\textsf{SHACAL}-\textsf{2}\) due to its internal structure in the setting of related-key attacks combined with key leakage. We leave this as an open question.

5.2.1 \(\textsf{SHACAL}-\textsf{2}\) is a PRF with \(\phi _{\textsf{KDF}}\)-based related keys

Our \(\textrm{LRKPRF}\)-security notion for \(\textsf{SHACAL}-\textsf{2}\) is defined with respect to related-key-deriving functions \(\phi _{\textsf{KDF}}\) (from Fig. 24) and \(\phi _{\textsf{SHACAL}-\textsf{2}}\) from Fig. 29. The latter mirrors the design of \(\textsf{MTP}\text {-}\textsf{KDF}\) that (in Definition 9) is defined to return \(\textsf{SHA}-\textsf{256}(\textsf{msg}{\_}\textsf{key} ~\Vert ~\textit{kk}_0) ~\Vert ~\) \(\textsf{SHA}-\textsf{256}(\textit{kk}_1 ~\Vert ~\textsf{msg}{\_}\textsf{key})\) for the target key \(\textit{kk}_\textit{u}= (\textit{kk}_0, \textit{kk}_1)\), except \(\phi _{\textsf{SHACAL}-\textsf{2}}\) only needs to produce the corresponding SHA-padded inputs. We note that \(\textrm{LRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) could instead be defined with respect to a single related-key-deriving function that would merge \(\phi _{\textsf{KDF}}\) and \(\phi _{\textsf{SHACAL}-\textsf{2}}\), which could lead to a cleaner formalisation of \(\textrm{LRKPRF}\)-security; however, we chose to avoid introducing an additional abstraction level here.

Fig. 29

Related-key-deriving function \(\phi _{\textsf{SHACAL}-\textsf{2}}:(\textsf{MTP}\text {-}\textsf{KDF}.\textsf{KS} \times \textsf{MTP}\text {-}\textsf{KDF}.\textsf{KS}) \times \{0,1\}^{128} \rightarrow \{0,1\}^{512}\)

Consider the game \(\textrm{G}^\textsf{lrkprf}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{KDF}}, \phi _{\textsf{SHACAL}-\textsf{2}}, \mathcal {D}}\) in Fig. 30. Adversary \(\mathcal {D}\) is given access to the \(\textsc {RoR}\) oracle that takes \(\textit{u}, i, \textsf{msg}{\_}\textsf{key}\) as input; all inputs to the oracle serve as parameters for the \(\textsf{SHACAL}-\textsf{2}\) key derivation, used to determine the challenge key \(\textit{sk} _i\). The adversary gets back either the output of \(\textsf{SHACAL}-\textsf{2}.\textsf{Ev}(\textit{sk} _i\), \(\textsf{IV}_{256})\) (if \(b=1\)), or a uniformly random value (if \(b=0\)), and is required to guess the challenge bit. The PRF table \(\textsf{T}\) is used to ensure consistency, so that a single random value is sampled and remembered for each set of used key derivation parameters \(\textit{u}, i, \textsf{msg}{\_}\textsf{key}\). The advantage of \(\mathcal {D}\) in breaking the \(\textrm{LRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) with respect to \(\phi _{\textsf{KDF}}\) and \(\phi _{\textsf{SHACAL}-\textsf{2}}\) is defined as \(\textsf{Adv}^{\textsf{lrkprf}}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{KDF}}, \phi _{\textsf{SHACAL}-\textsf{2}}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{lrkprf}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{KDF}}, \phi _{\textsf{SHACAL}-\textsf{2}}, \mathcal {D}}] - 1\).

Fig. 30

Leakage-resilient, related-key PRF security of function family \(\textsf{SHACAL}-\textsf{2}\) on fixed input \(\textsf{IV}_{256}\) with respect to related-key-deriving functions \(\phi _{\textsf{KDF}}\) and \(\phi _{\textsf{SHACAL}-\textsf{2}}\)

5.2.2 \(\textsf{SHACAL}-\textsf{2}\) is a PRF with \(\phi _{\textsf{MAC}}\)-based related keys

Consider the game \(\textrm{G}^\textsf{hrkprf}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{MAC}}, \mathcal {D}}\) in Fig. 31. Adversary \(\mathcal {D}\) is given access to \(\textsc {RoR}\) oracle and is required to choose the 256-bit suffix \(p\) of each challenge key used for evaluating \(\textsf{SHACAL}-\textsf{2}.\textsf{Ev}(\cdot , \textsf{IV}_{256})\). The value of \(\textit{mk}_\textit{u}\) is then used to set the 256-bit prefix of the challenge key, where \(\textit{u}\) is also chosen by the adversary, but the \(\textit{mk}_\mathcal {I}, \textit{mk}_\mathcal {R}\) values themselves are related secrets that are not known to \(\mathcal {D}\). The advantage of \(\mathcal {D}\) in breaking the \(\textrm{HRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) with respect to \(\phi _{\textsf{MAC}}\) is defined as \(\textsf{Adv}^{\textsf{hrkprf}}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{MAC}}}(\mathcal {D}) = 2 \cdot \Pr [\textrm{G}^\textsf{hrkprf}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{MAC}}, \mathcal {D}}] - 1\).

Fig. 31

Leakage-resilient, related-key PRF security of function family \(\textsf{SHACAL}-\textsf{2}\) on fixed input \(\textsf{IV}_{256}\) with respect to related-key-deriving function \(\phi _{\textsf{MAC}}\)

5.3 Security requirements on message encoding

In Sect. 3.5, we defined encoding integrity of a message encoding scheme \(\textsf{ME}\) with respect to any support function \(\textsf{supp}\). We now define the support function \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\) that will be used for our security proofs. We also define three ad hoc notions that must be met by the MTProto-based message encoding scheme \(\textsf{MTP}\text {-}\textsf{ME}\) in order to be compatible with our security proofs.

5.3.1 \(\textsf{MTP}\text {-}\textsf{ME}\) ensures in-order delivery

We require that \(\textsf{MTP}\text {-}\textsf{ME}\) is \(\textrm{EINT}\)-secure (Fig. 16) with respect to the support function \(\textsf{supp}\text {-}\textsf{ord}\) defined in Fig. 32. We define \(\textsf{supp}\text {-}\textsf{ord}\) to enforce in-order delivery for each user’s sent messages (i.e. independently in each direction), thus preventing message forgeries, replays, (unidirectional) reordering and drops.

Fig. 32

Support function \(\textsf{supp}\text {-}\textsf{ord}\) for in-order message delivery

The formalisation of the support function \(\textsf{supp}\text {-}\textsf{ord}\) uses a helper function \(\textsf{find}(\textsf{op}, \textsf{tr}_{}, \textsf{label})\) that searches a transcript \(\textsf{tr}_{}\) for an \(\textsf{op}\)-type entry (where \(\textsf{op}\in \{\textsf{sent}, \textsf{recv}\}\)) containing a target label \(\textsf{label}\). This code relies on the assumption that all support labels are unique, which holds for payloads of \(\textsf{MTP}\text {-}\textsf{ME}\) and for ciphertexts of \(\textsf{MTP}\text {-}\textsf{CH} \) as long as at most \(2^{96}\) plaintexts are sent. The function \(\textsf{find}\) also determines \(N_{\textsf{op}} \), the order number of the target entry among all valid entries (i.e. the number of valid entries in the transcript up to and including the target entry); if the entry is not found, then \(N_{\textsf{op}} \) is set to the number of all valid entries in the transcript. The support function \(\textsf{supp}\text {-}\textsf{ord}\) on inputs \(\textit{u}, \textsf{tr}_{\textit{u}}, \textsf{tr}_{\overline{\textit{u}}}, \textsf{label}\) requires that (i) there is no entry with label \(\textsf{label}\) and a non-\(\bot \) message in the receiver’s transcript \(\textsf{tr}_{\textit{u}}\), (ii) an entry with label \(\textsf{label}\) is found in the sender’s transcript \(\textsf{tr}_{\overline{\textit{u}}}\), and (iii) the number of valid entries in the receiver’s transcript is one fewer than the order number of the entry found in the sender’s transcript, i.e. \(N_{\textsf{sent}} = N_{\textsf{recv}} + 1\). Here condition (i) prevents message replays, condition (ii) prevents message forgeries, and condition (iii) prevents message reordering and drops. As outlined in Sect. 4.2, the message encoding scheme \(\textsf{ME}\) in the version of MTProto we studied (cf. Appendix D) allowed reordering, so it was not \(\textrm{EINT}\)-secure with respect to \(\textsf{supp}\text {-}\textsf{ord}\); instead, we use the simplified message encoding scheme \(\textsf{MTP}\text {-}\textsf{ME}\) (cf. Definition 6) for our formal analysis of MTProto. In Appendix E.5, we show that \(\textsf{Adv}^{\textsf{eint}}_{\textsf{MTP}\text {-}\textsf{ME}, \textsf{supp}\text {-}\textsf{ord}}(\mathcal {F}) = 0\) for any \(\mathcal {F}\) making at most \(2^{96}\) queries to \(\textsc {Send}\).
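The behaviour of \(\textsf{find}\) and \(\textsf{supp}\text {-}\textsf{ord}\) described above can be sketched in Python. This is a simplified rendering of Fig. 32 under our own transcript representation: entries are tuples \((\textsf{op}, \textsf{label}, m)\) with \(\bot \) encoded as None, which is an assumption rather than the paper's exact pseudocode.

```python
# Simplified sketch of the supp-ord support function (cf. Fig. 32).
# A transcript is a list of (op, label, m) entries; m is None for
# invalid entries (m = bot).

def find(op, tr, label):
    """Search tr for a valid op-type entry with the given label.
    Returns (found, n_op): n_op is the order number of the target entry
    among valid entries, or the count of all valid entries if the
    target entry is not found."""
    n = 0
    for (entry_op, entry_label, m) in tr:
        if entry_op == op and m is not None:
            n += 1
            if entry_label == label:
                return True, n
    return False, n

def supp_ord(u, tr_u, tr_ubar, label):
    """Accept delivery iff the label was (i) never validly received
    (no replay), (ii) actually sent (no forgery), and (iii) next in
    order: N_sent = N_recv + 1 (no reordering or drops)."""
    replayed, n_recv = find("recv", tr_u, label)
    sent, n_sent = find("sent", tr_ubar, label)
    return (not replayed) and sent and (n_sent == n_recv + 1)

# Sender has sent two messages; receiver has received none yet:
tr_sender = [("sent", "l1", "m1"), ("sent", "l2", "m2")]
assert supp_ord("u", [], tr_sender, "l1")       # in order: accepted
assert not supp_ord("u", [], tr_sender, "l2")   # skips l1: rejected
```

After the receiver logs `("recv", "l1", "m1")`, the same check accepts `l2` and rejects a replayed `l1`, matching conditions (i)–(iii) above.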

Fig. 33

Prefix uniqueness of message encoding scheme \(\textsf{ME}\)

5.3.2 Prefix uniqueness of \(\textsf{MTP}\text {-}\textsf{ME}\)

We require that payloads produced by \(\textsf{MTP}\text {-}\textsf{ME}\) have distinct prefixes of size 256 bits (independently for each user \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\)), as defined by the security game in Fig. 33. The advantage of an adversary \(\mathcal {F}\) in breaking the \(\textrm{UPREF}\)-security of a message encoding scheme \(\textsf{ME}\) is defined as \(\textsf{Adv}^{\textsf{upref}}_{\textsf{ME}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{upref}}_{\textsf{ME}, \mathcal {F}}]\). Given the fixed prefix size, this notion cannot be satisfied against unbounded adversaries. Our \(\textsf{MTP}\text {-}\textsf{ME}\) scheme ensures unique prefixes using the \(96\)-bit counter \(\textsf {seq}\_\textsf {no} \) that contains the number of messages sent by user \(\textit{u}\), so we have \(\textsf{Adv}^{\textsf{upref}}_{\textsf{MTP}\text {-}\textsf{ME}}(\mathcal {F}) = 0\) for any \(\mathcal {F}\) making at most \(2^{96}\) queries; otherwise, there exists an adversary \(\mathcal {F}\) such that \(\textsf{Adv}^{\textsf{upref}}_{\textsf{MTP}\text {-}\textsf{ME}}(\mathcal {F}) = 1\). Note that \(\textsf{MTP}\text {-}\textsf{ME}\) always produces payloads longer than 256 bits. The MTProto implementation of message encoding we analysed was not \(\textrm{UPREF}\)-secure, since it allowed repeated msg_id values (cf. Sect. 4.2).
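As a toy illustration of why a counter in the prefix yields \(\textrm{UPREF}\): the sketch below uses our own simplified payload layout (a 96-bit counter followed by fixed padding), not the exact \(\textsf{MTP}\text {-}\textsf{ME}\) format, to show that 256-bit prefixes stay collision-free until the counter wraps.

```python
# Toy encoder: a 96-bit seq_no embedded in the 256-bit payload prefix
# guarantees distinct prefixes until 2**96 messages have been sent.
# The field layout here is illustrative, not the real MTP-ME encoding.

class ToyEncoder:
    def __init__(self):
        self.seq_no = 0

    def encode(self, msg: bytes) -> bytes:
        # 256-bit prefix: 96-bit counter || 160 bits of fixed padding.
        prefix = self.seq_no.to_bytes(12, "big") + b"\x00" * 20
        self.seq_no = (self.seq_no + 1) % 2**96  # wraps after 2**96
        return prefix + msg

enc = ToyEncoder()
prefixes = {enc.encode(b"hello")[:32] for _ in range(1000)}
assert len(prefixes) == 1000  # all 256-bit prefixes are distinct
```

Once the counter wraps, the first payload's prefix repeats, which is exactly why the zero-advantage claim above is limited to \(2^{96}\) queries.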

Fig. 34

Encoding robustness of message encoding scheme \(\textsf{ME}\)

5.3.3 Encoding robustness of \(\textsf{MTP}\text {-}\textsf{ME}\)

We require that decoding in \(\textsf{MTP}\text {-}\textsf{ME}\) does not affect its state in a way that would be visible in future encoded payloads, as defined by the security game in Fig. 34. The advantage of an adversary \(\mathcal {D}\) in breaking the \(\textrm{ENCROB}\)-security of a message encoding scheme \(\textsf{ME}\) is defined as \(\textsf{Adv}^{\textsf{encrob}}_{\textsf{ME}}(\mathcal {D}) = 2\cdot \Pr [\textrm{G}^{\textsf{encrob}}_{\textsf{ME}, \mathcal {D}}]-1\). This advantage is trivially zero for both \(\textsf{MTP}\text {-}\textsf{ME}\) and the original MTProto message encoding scheme (cf. Appendix D). Note, however, that this property prevents a message encoding scheme from building payloads that include the number of previously received messages. It is thus incompatible with stronger notions of resistance against reordering attacks, such as the global transcript (cf. Sect. 4.2).
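The intuition behind \(\textrm{ENCROB}\) can be seen in a toy encoding scheme (our own illustration, not \(\textsf{MTP}\text {-}\textsf{ME}\) itself) whose send and receive state are disjoint, so interleaved decode calls leave the stream of encoded payloads unchanged.

```python
# Toy ENCROB illustration: the encoding state (send counter) and the
# decoding state (recv counter) are disjoint, so Decode calls cannot
# influence the payloads produced by future Encode calls.

class ToyME:
    def __init__(self):
        self.sent = 0   # touched only by encode
        self.recvd = 0  # touched only by decode

    def encode(self, m: bytes) -> bytes:
        p = self.sent.to_bytes(12, "big") + m
        self.sent += 1
        return p

    def decode(self, p: bytes) -> bytes:
        self.recvd += 1  # this state update is invisible to encode
        return p[12:]

a, b = ToyME(), ToyME()
out_a = a.encode(b"m")            # no decode calls beforehand
b.decode(b"\x00" * 12 + b"junk")  # adversarial decode before encoding
out_b = b.encode(b"m")
assert out_a == out_b  # identical payloads: ENCROB advantage is 0
```

A scheme whose payloads reported `recvd` would break this equality, which is precisely why \(\textrm{ENCROB}\) rules out global-transcript-style countermeasures as noted above.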

Fig. 35

Unpredictability of deterministic symmetric encryption scheme \(\textsf{SE}\) with respect to message encoding scheme \(\textsf{ME}\)

5.3.4 Combined security of \(\textsf{MTP}\text {-}\textsf{SE}\) and \(\textsf{MTP}\text {-}\textsf{ME}\)

We require that decryption in \(\textsf{MTP}\text {-}\textsf{SE}\) with uniformly random keys has unpredictable outputs with respect to \(\textsf{MTP}\text {-}\textsf{ME}\), as defined in Fig. 35. The security game \(\textrm{G}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}, \mathcal {F}}\) in Fig. 35 gives adversary \(\mathcal {F}\) access to two oracles. For any user \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) and message key \(\textsf{msg}{\_}\textsf{key}\), oracle \(\textsc {Ch}\) decrypts a given ciphertext \(c_{\textit{se}}\) of deterministic symmetric encryption scheme \(\textsf{SE}\) under a uniformly random key \(k\in \{0,1\}^{\mathsf {\textsf{SE}.kl}}\) and then decodes it using the given message encoding state \(\textit{st}_\textsf{ME}\) of message encoding scheme \(\textsf{ME}\), returning no output. The adversary is allowed to choose arbitrary values of \(c_{\textit{se}}\) and \(\textit{st}_\textsf{ME}\); it is allowed to repeatedly query oracle \(\textsc {Ch}\) on inputs that contain the same values for \(\textit{u}, \textsf{msg}{\_}\textsf{key}\) in order to reuse a fixed, secret \(\textsf{SE}\) key k with different choices of \(c_{\textit{se}}\). Oracle \(\textsc {Expose}\) lets \(\mathcal {F}\) learn the \(\textsf{SE}\) key corresponding to the given \(\textit{u}\) and \(\textsf{msg}{\_}\textsf{key}\); the table \(\textsf{S}\) is then used to disallow the adversary from querying \(\textsc {Ch}\) with this pair of \(\textit{u}\) and \(\textsf{msg}{\_}\textsf{key}\) values again. \(\mathcal {F}\) wins if it can cause \(\mathsf {\textsf{ME}.Decode}\) to output a valid \(m \not = \bot \). Note that \(\textsf{msg}{\_}\textsf{key}\) in this game merely serves as a label for the tables, so we allow it to be an arbitrary string \(\textsf{msg}{\_}\textsf{key}\in \{0,1\}^*\). 
The advantage of \(\mathcal {F}\) in breaking the \(\textrm{UNPRED}\)-security of \(\textsf{SE}\) with respect to \(\textsf{ME}\) is defined as \(\textsf{Adv}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}}(\mathcal {F}) = \Pr [\textrm{G}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}, \mathcal {F}}]\). In Appendix E.6, we show that \(\textsf{Adv}^{\textsf{unpred}}_{\textsf{MTP}\text {-}\textsf{SE}, \textsf{MTP}\text {-}\textsf{ME}}(\mathcal {F}) \le {q_{\textsc {Ch}}}/{2^{64}}\) for any \(\mathcal {F}\) making \(q_{\textsc {Ch}}\) queries.
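For intuition about the magnitude of this bound, a small illustrative calculation (the query counts below are our own example values, not from the analysis):

```python
# Illustrative evaluation of the UNPRED bound q_Ch / 2**64.
from fractions import Fraction

def unpred_bound(q_ch: int) -> Fraction:
    """Upper bound on the UNPRED advantage after q_ch Ch queries."""
    return Fraction(q_ch, 2**64)

# Even a very generous query budget leaves the bound tiny:
assert unpred_bound(2**40) == Fraction(1, 2**24)  # about 6e-8
assert float(unpred_bound(2**20)) < 1e-13
```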

5.4 Correctness of \(\textsf{MTP}\text {-}\textsf{CH}\)

We claim that our MTProto-based channel satisfies our correctness definition. Consider any adversary \(\mathcal {F}\) playing in the correctness game \(\textrm{G}^{\textsf{corr}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}}\) (Fig. 13) for channel \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} \) (Fig. 19) and support function \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\) (Fig. 32). Due to the definition of \(\textsf{supp}\text {-}\textsf{ord}\), the \(\textsc {Recv}\) oracle in game \(\textrm{G}^{\textsf{corr}}_{\textsf{MTP}\text {-}\textsf{CH}, \textsf{supp}\text {-}\textsf{ord}, \mathcal {F}}\) rejects all \(\textsf{CH}\) ciphertexts that were not previously returned by the \(\textsc {Send}\) oracle. The encryption and decryption algorithms of channel \(\textsf{MTP}\text {-}\textsf{CH} \) rely in a modular way on the message encoding scheme \(\textsf{MTP}\text {-}\textsf{ME}\), deterministic function families \(\textsf{MTP}\text {-}\textsf{KDF}, \textsf{MTP}\text {-}\textsf{MAC}\), and deterministic symmetric encryption scheme \(\textsf{MTP}\text {-}\textsf{SE}\); the latter provides decryption correctness, so any valid ciphertext processed by oracle \(\textsc {Recv}\) correctly yields the originally encrypted payload \(p\). Thus, we need to show that \(\textsf{MTP}\text {-}\textsf{ME}\) always recovers the expected plaintext m from payload \(p\), meaning m matches the corresponding output of \(\textsf{supp}\text {-}\textsf{ord}\). In Sect. 3.5, we formalised this requirement as the encoding correctness of \(\textsf{MTP}\text {-}\textsf{ME}\) with respect to \(\textsf{supp}\text {-}\textsf{ord}\) and discussed that it is also implied by the encoding integrity of \(\textsf{MTP}\text {-}\textsf{ME}\) with respect to \(\textsf{supp}\text {-}\textsf{ord}\). We prove the latter in Appendix E.5 for adversaries that make at most \(2^{96}\) queries.

5.5 \(\textrm{IND}\)-security of \(\textsf{MTP}\text {-}\textsf{CH}\)

We begin our \(\textrm{IND}\)-security reduction by considering an arbitrary adversary \(\mathcal {D}_{\textrm{IND}}\) playing in the \(\textrm{IND}\)-security game against channel \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} \) (i.e. \(\textrm{G}^{\textsf{ind}}_{\textsf{CH}, \mathcal {D}_{\textrm{IND}}}\) in Fig. 14), and we gradually change this game until we can show that \(\mathcal {D}_{\textrm{IND}}\) can no longer win. To this end, we make three key observations:

(1) Recall that oracle \(\textsc {Recv}\) always returns \(\bot \), and the only functionality of this oracle is to update the state of the receiver’s channel by calling \(\mathsf {\textsf{CH}.Recv}\). We assume that calls to \(\mathsf {\textsf{CH}.Recv}\) never affect the ciphertexts that are returned by future calls to \(\mathsf {\textsf{CH}.Send}\) (more precisely, we use the \(\textrm{ENCROB}\) property of \(\textsf{ME}\), which reasons about payloads rather than ciphertexts). This allows us to completely disregard the \(\textsc {Recv}\) oracle, making it immediately return \(\bot \) without calling \(\mathsf {\textsf{CH}.Recv}\).

(2) We use the \(\textrm{UPRKPRF}\)-security of \(\textsf{MAC}\) to show that the ciphertexts returned by oracle \(\textsc {Ch}\) contain \(\textsf{msg}{\_}\textsf{key}\) values that look uniformly random and are independent of each other. Roughly, this security notion requires that \(\textsf{MAC}\) can only be evaluated on a set of inputs with unique prefixes. To ensure this, we assume that the payloads produced by \(\textsf{ME}\) meet this requirement (as formalised by the \(\textrm{UPREF}\) property of \(\textsf{ME}\)).

(3) In order to prove that oracle \(\textsc {Ch}\) does not leak the challenge bit, it remains to show that the ciphertexts returned by \(\textsc {Ch}\) contain \(c_{\textit{se}}\) values that look uniformly random and independent of each other. This follows from the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\). We invoke the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) to show that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) does not leak any information about the \(\textsf{KDF}\) keys; we then use the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) to show that the keys used for \(\textsf{SE}\) are uniformly random. Finally, we use the birthday bound to argue that the uniformly random values of \(\textsf{msg}{\_}\textsf{key}\) are unlikely to collide, and hence, the keys used for \(\textsf{SE}\) are also one-time.

Formally, we prove the following.

Theorem 1

Let \(\textsf{ME}\), \(\textsf{HASH}\), \(\textsf{MAC}\), \(\textsf{KDF}\), \(\phi _{\textsf{MAC}}\), \(\phi _{\textsf{KDF}}\), \(\textsf{SE}\) be any primitives that meet the requirements stated in Definition 5 of channel \(\textsf{MTP}\text {-}\textsf{CH} \). Let \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{HASH}, \textsf{MAC}, \textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{SE}]\). Let \(\mathcal {D}_{\textrm{IND}}\) be any adversary against the \(\textrm{IND}\)-security of \(\textsf{CH}\), making \(q_{\textsc {Ch}}\) queries to its \(\textsc {Ch}\) oracle. Then, we can build adversaries \(\mathcal {D}_{\textrm{OTWIND}}\), \(\mathcal {D}_{\textrm{RKPRF}}\), \(\mathcal {D}_{\textrm{ENCROB}}\), \(\mathcal {F}_{\textrm{UPREF}}\), \(\mathcal {D}_{\textrm{UPRKPRF}}\), \(\mathcal {D}_{\mathrm {OTIND\$}}\) such that

$$\begin{aligned} \textsf{Adv}^{\textsf{ind}}_{\textsf{CH}}(\mathcal {D}_{\textrm{IND}}) \le 2&\cdot \Big (\textsf{Adv}^{\textsf{otwind}}_{\textsf{HASH}}(\mathcal {D}_{\textrm{OTWIND}}) + \textsf{Adv}^{\textsf{rkprf}}_{\textsf{KDF}, \phi _{\textsf{KDF}}}(\mathcal {D}_{\textrm{RKPRF}}) \\&+ \textsf{Adv}^{\textsf{encrob}}_{\textsf{ME}}(\mathcal {D}_{\textrm{ENCROB}}) + \textsf{Adv}^{\textsf{upref}}_{\textsf{ME}}(\mathcal {F}_{\textrm{UPREF}}) \\&+ \textsf{Adv}^{\textsf{uprkprf}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {D}_{\textrm{UPRKPRF}}) + \frac{q_{\textsc {Ch}}\cdot (q_{\textsc {Ch}}- 1)}{2 \cdot 2^{\textsf{MAC}.\textsf{ol}}} \\&+ \textsf{Adv}^{\mathsf {otind\$}}_{\textsf{SE}}(\mathcal {D}_{\mathrm {OTIND\$}})\Big ). \\ \end{aligned}$$

Proof

This proof uses games \(\textrm{G}_0\)–\(\textrm{G}_3\) in Fig. 39 and \(\textrm{G}_4\)–\(\textrm{G}_8\) in Fig. 40, in which the code added for the transitions between games is highlighted. The adversaries for the transitions between games are referenced throughout the proof. Each constructed adversary simulates one or two subsequent games of the security reduction for adversary \(\mathcal {D}_{\textrm{IND}}\); the highlighted instructions mark the changes in the code of the simulated games.

\({\textbf{G}}_{0}\). Game \(\textrm{G}_0\) is equivalent to game \(\textrm{G}^{\textsf{ind}}_{\textsf{CH}, \mathcal {D}_{\textrm{IND}}}\). It expands the code of algorithms \(\mathsf {\textsf{CH}.Init}\), \(\mathsf {\textsf{CH}.Send}\) and \(\mathsf {\textsf{CH}.Recv}\); the expanded instructions are highlighted. It follows that

$$ \textsf{Adv}^{\textsf{ind}}_{\textsf{CH}}(\mathcal {D}_{\textrm{IND}}) = 2 \cdot \Pr [\textrm{G}_0] - 1. $$

\({{{\textbf{G}}}_{0}\rightarrow {{\textbf{G}}}_{1}}\). Note that the value of \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) depends on the raw \(\textsf{KDF}\) and \(\textsf{MAC}\) keys (i.e. \(\textit{kk}\) and \(\textit{mk}\)), and adversary \(\mathcal {D}_{\textrm{IND}}\) can learn it from any ciphertext returned by oracle \(\textsc {Ch}\). To invoke PRF-style security notions for either primitive in later steps, we appeal to the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) (Fig. 25), which essentially guarantees that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) leaks no information about the \(\textsf{KDF}\) and \(\textsf{MAC}\) keys. Game \(\textrm{G}_1\) is the same as game \(\textrm{G}_0\), except that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\leftarrow \textsf{HASH}.\textsf{Ev}(\textit{hk}, \cdot )\) is evaluated on a uniformly random string \(x\) rather than on \(\textit{kk}~\Vert ~\textit{mk}\). We claim that \(\mathcal {D}_{\textrm{IND}}\) cannot distinguish between these two games.

Fig. 36

Adversary \(\mathcal {D}_{\textrm{OTWIND}}\) against the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) for the transition between games \(\textrm{G}_0\) and \(\textrm{G}_1\)

More formally, given \(\mathcal {D}_{\textrm{IND}}\), in Fig. 36 we define an adversary \(\mathcal {D}_{\textrm{OTWIND}}\) attacking the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) as follows. According to the definition of game \(\textrm{G}^\textsf{otwind}_{\textsf{HASH}, \mathcal {D}_{\textrm{OTWIND}}}\), adversary \(\mathcal {D}_{\textrm{OTWIND}}\) takes \((x_0, x_1, \textsf{auth}{\_}\textsf{key}{\_}\textsf{id})\) as input. We define adversary \(\mathcal {D}_{\textrm{OTWIND}}\) to sample a challenge bit b, to parse \(\textit{kk}~\Vert ~\textit{mk}\leftarrow x_1\), and to subsequently use the obtained values of \(b, \textit{kk}, \textit{mk}, \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) in order to simulate either of the games \(\textrm{G}_0\), \(\textrm{G}_1\) for adversary \(\mathcal {D}_{\textrm{IND}}\) (both games are equivalent from the moment these 4 values are chosen). If \(\mathcal {D}_{\textrm{IND}}\) guesses the challenge bit b, then we let adversary \(\mathcal {D}_{\textrm{OTWIND}}\) return 1; otherwise we let it return 0. Now let d be the challenge bit in game \(\textrm{G}^\textsf{otwind}_{\textsf{HASH}, \mathcal {D}_{\textrm{OTWIND}}}\), and let \(d'\) be the value returned by \(\mathcal {D}_{\textrm{OTWIND}}\). If \(d = 1\) then \(\mathcal {D}_{\textrm{OTWIND}}\) simulates game \(\textrm{G}_0\) for \(\mathcal {D}_{\textrm{IND}}\) (i.e. \(\textit{kk}\) and \(\textit{mk}\) are derived from the input to \(\textsf{HASH}.\textsf{Ev}(\textit{hk}, \cdot )\)), and otherwise it simulates game \(\textrm{G}_1\) (i.e. \(\textit{kk}\) and \(\textit{mk}\) are independent from the input to \(\textsf{HASH}.\textsf{Ev}(\textit{hk}, \cdot )\)). It follows that \(\Pr [\textrm{G}_0] = \Pr \left[ \,d' = 1\,|\,d = 1\, \right] \) and \(\Pr [\textrm{G}_1] = \Pr \left[ \,d' = 1\,|\,d = 0\, \right] \), and hence

$$ \Pr [\textrm{G}_0] - \Pr [\textrm{G}_1] = \textsf{Adv}^{\textsf{otwind}}_{\textsf{HASH}}(\mathcal {D}_{\textrm{OTWIND}}). $$

\({{{\textbf{G}}}_{1}\rightarrow {{\textbf{G}}}_{2}}\). In the transition between games \(\textrm{G}_1\) and \(\textrm{G}_2\) (Fig. 39), we use the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) (Fig. 26) with respect to \(\phi _{\textsf{KDF}}\) in order to replace \(\textsf{KDF}.\textsf{Ev}(\textit{kk}_{\textit{u}}\), \(\textsf{msg}{\_}\textsf{key})\) with a uniformly random value from \(\{0,1\}^{\textsf{KDF}.\textsf{ol}}\) (and for consistency store the latter in \(\textsf{T}[\textit{u}, \textsf{msg}{\_}\textsf{key}]\)). Similarly to the above, in Fig. 37 we build an adversary \(\mathcal {D}_{\textrm{RKPRF}}\) attacking the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) that simulates \(\textrm{G}_1\) or \(\textrm{G}_2\) for adversary \(\mathcal {D}_{\textrm{IND}}\), depending on the challenge bit in game \(\textrm{G}^\textsf{rkprf}_{\textsf{KDF}, \phi _{\textsf{KDF}}, \mathcal {D}_{\textrm{RKPRF}}}\). We have

$$ \Pr [\textrm{G}_1] - \Pr [\textrm{G}_2] = \textsf{Adv}^{\textsf{rkprf}}_{\textsf{KDF}, \phi _{\textsf{KDF}}}(\mathcal {D}_{\textrm{RKPRF}}). $$
Fig. 37

Adversary \(\mathcal {D}_{\textrm{RKPRF}}\) against the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) for the transition between games \(\textrm{G}_1\) and \(\textrm{G}_2\)

Fig. 38

Adversary \(\mathcal {D}_{\textrm{ENCROB}}\) against the \(\textrm{ENCROB}\)-security of \(\textsf{ME}\) for the transition between games \(\textrm{G}_2\) and \(\textrm{G}_3\)

\({{{\textbf{G}}}_{2}\rightarrow {{\textbf{G}}}_{3}}\). We invoke the \(\textrm{ENCROB}\) property of \(\textsf{ME}\) (Fig. 34) to transition from \(\textrm{G}_2\) to \(\textrm{G}_3\) (Fig. 39). This property states that calls to \(\mathsf {\textsf{ME}.Decode}\) do not change \(\textsf{ME}\)’s state in a way that affects the payloads returned by any future calls to \(\mathsf {\textsf{ME}.Encode}\), allowing us to remove the \(\mathsf {\textsf{ME}.Decode}\) call from inside the oracle \(\textsc {Recv}\) in game \(\textrm{G}_3\). In Fig. 38 we build an adversary \(\mathcal {D}_{\textrm{ENCROB}}\) against \(\textrm{ENCROB}\) of \(\textsf{ME}\) that simulates either \(\textrm{G}_2\) or \(\textrm{G}_3\) for \(\mathcal {D}_{\textrm{IND}}\), depending on the challenge bit in game \(\textrm{G}^{\textsf{encrob}}_{\textsf{ME}, \mathcal {D}_{\textrm{ENCROB}}}\), such that

$$ \Pr [\textrm{G}_2] - \Pr [\textrm{G}_3] = \textsf{Adv}^{\textsf{encrob}}_{\textsf{ME}}(\mathcal {D}_{\textrm{ENCROB}}). $$
Fig. 39

Games \(\textrm{G}_0\)–\(\textrm{G}_3\) for the proof of Theorem 1. The code added by expanding the algorithms of \(\textsf{CH}\) in game \(\textrm{G}^{{\textrm{ind}}}_{\textsf{CH}, \mathcal {D}_{{\textrm{IND}}}}\) is highlighted

Fig. 40

Games \(\textrm{G}_4\)–\(\textrm{G}_8\) for the proof of Theorem 1. The highlighted code was rewritten in a way that is functionally equivalent to the corresponding code in \(\textrm{G}_3\)

\({{{\textbf{G}}}_{3}\rightarrow {{\textbf{G}}}_{4}}\). Game \(\textrm{G}_4\) (Fig. 40) differs from \(\textrm{G}_3\) (Fig. 39) in the following ways:

(1) The \(\textsf{KDF}\) keys \(\textit{kk}\), \(\textit{kk}_\mathcal {I}\), \(\textit{kk}_\mathcal {R}\) are no longer used in our reduction games starting from \(\textrm{G}_3\), so they are not included in game \(\textrm{G}_4\) and onwards.

(2) The calls to oracle \(\textsc {Recv}\) in game \(\textrm{G}_3\) no longer change the receiver’s channel state, so game \(\textrm{G}_4\) immediately returns \(\bot \) on every call to \(\textsc {Recv}\).

(3) Game \(\textrm{G}_4\) rewrites, in a functionally equivalent way, the initialisation and usage of values from the PRF table \(\textsf{T}\) inside oracle \(\textsc {Ch}\).

(4) Game \(\textrm{G}_4\) adds a set \(X_{\textit{u}}\), for each \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\), that stores the 256-bit prefixes of payloads produced by calls to the specific user’s \(\textsc {Ch}\) oracle. Every time a new payload \(p\) is generated, the added code inside oracle \(\textsc {Ch}\) checks whether its prefix \(p[0:256]\) is already contained in \(X_{\textit{u}}\), which would mean that a previously seen payload had the same prefix. Then, regardless of whether this condition passes, the new prefix \(p[0:256]\) is added to \(X_{\textit{u}}\). We note that the output of oracle \(\textsc {Ch}\) in game \(\textrm{G}_4\) does not change depending on whether this condition passes or fails.

(5) Game \(\textrm{G}_4\) adds Boolean flags \(\textsf{bad}_0\) and \(\textsf{bad}_1\) that are set to \(\texttt {true}\) when the corresponding conditions inside oracle \(\textsc {Ch}\) are satisfied. These flags do not affect the functionality of the games and will only be used for the formal analysis that we provide below.

Both games are functionally equivalent, so

$$ \Pr [\textrm{G}_4] = \Pr [\textrm{G}_3]. $$

\({{{\textbf{G}}}_{4}\rightarrow {{\textbf{G}}}_{5}}\). The transition from game \(\textrm{G}_4\) to \(\textrm{G}_5\) replaces the value assigned to \(\textsf{msg}{\_}\textsf{key}\) when the newly added unique-prefixes condition is satisfied; the value of \(\textsf{msg}{\_}\textsf{key}\) changes from \(\textsf{MAC}.\textsf{Ev}(\textit{mk}_{\textit{u}}, p)\) to a uniformly random string from \(\{0,1\}^{\textsf{MAC}.\textsf{ol}}\). Games \(\textrm{G}_4\) and \(\textrm{G}_5\) are identical until \(\textsf{bad}_0\) is set. We have

$$ \Pr [\textrm{G}_4] - \Pr [\textrm{G}_5] \le \Pr [\textsf{bad}_0^{\textrm{G}_4}]. $$

The \(\textrm{UPREF}\) property of \(\textsf{ME}\) (Fig. 33) states that it is hard to find two payloads returned by \(\mathsf {\textsf{ME}.Encode}\) such that their 256-bit prefixes are the same; we use this property to upper-bound the probability of setting \(\textsf{bad}_0\) in game \(\textrm{G}_4\). In Fig. 41, we build an adversary \(\mathcal {F}_{\textrm{UPREF}}\) attacking the \(\textrm{UPREF}\) of \(\textsf{ME}\) that simulates game \(\textrm{G}_4\) for adversary \(\mathcal {D}_{\textrm{IND}}\). Every time \(\textsf{bad}_0\) is set in game \(\textrm{G}_4\), this corresponds to adversary \(\mathcal {F}_{\textrm{UPREF}}\) setting flag \(\textsf{win}\) to \(\texttt {true}\) in its own game \(\textrm{G}^{\textsf{upref}}_{\textsf{ME}, \mathcal {F}_{\textrm{UPREF}}}\). It follows that

$$\begin{aligned} \Pr [\textsf{bad}_0^{\textrm{G}_{4}}] \le \textsf{Adv}^{\textsf{upref}}_{\textsf{ME}}(\mathcal {F}_{\textrm{UPREF}}). \end{aligned}$$
Fig. 41

Adversary \(\mathcal {F}_{\textrm{UPREF}}\) against the \(\textrm{UPREF}\)-security of \(\textsf{ME}\) for the transition between games \(\textrm{G}_4\) and \(\textrm{G}_5\)

\({{{\textbf{G}}}_{5}\rightarrow {{\textbf{G}}}_{6}}\). We use the \(\textrm{UPRKPRF}\)-security of \(\textsf{MAC}\) (Fig. 28) with respect to \(\phi _{\textsf{MAC}}\) in order to replace the value of \(\textsf{msg}{\_}\textsf{key}\), changing it from \(\textsf{MAC}.\textsf{Ev}(\textit{mk}_{\textit{u}}, p)\) to a uniformly random value from \(\{0,1\}^{\textsf{MAC}.\textsf{ol}}\) in the transition from \(\textrm{G}_5\) to \(\textrm{G}_6\) (Fig. 40). Note that the notion of \(\textrm{UPRKPRF}\)-security only guarantees indistinguishability from random when \(\textsf{MAC}\) is evaluated on inputs with unique prefixes; games \(\textrm{G}_5, \textrm{G}_6\) ensure that this prerequisite is satisfied by only evaluating \(\textsf{MAC}\) if \(p[0:256] \not \in X_{\textit{u}}\). In Fig. 42, we build an adversary \(\mathcal {D}_{\textrm{UPRKPRF}}\) attacking the \(\textrm{UPRKPRF}\)-security of \(\textsf{MAC}\) that simulates \(\textrm{G}_5\) or \(\textrm{G}_6\) for adversary \(\mathcal {D}_{\textrm{IND}}\), depending on the challenge bit in game \(\textrm{G}^\textsf{uprkprf}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {D}_{\textrm{UPRKPRF}}}\). It follows that

$$ \Pr [\textrm{G}_5] - \Pr [\textrm{G}_6] = \textsf{Adv}^{\textsf{uprkprf}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {D}_{\textrm{UPRKPRF}}). $$
Fig. 42

Adversary \(\mathcal {D}_{\textrm{UPRKPRF}}\) against the \(\textrm{UPRKPRF}\)-security of \(\textsf{MAC}\) for the transition between games \(\textrm{G}_{5}\) and \(\textrm{G}_{6}\)

\({{{\textbf{G}}}_{6}\rightarrow {{\textbf{G}}}_{7}}\). Games \(\textrm{G}_6\) and \(\textrm{G}_7\) are identical until \(\textsf{bad}_1\) is set; as above, we have

$$ \Pr [\textrm{G}_{6}] - \Pr [\textrm{G}_{7}] \le \Pr [\textsf{bad}_1^{\textrm{G}_6}]. $$

The values of \(\textsf{msg}{\_}\textsf{key}\in \{0,1\}^{\textsf{MAC}.\textsf{ol}}\) in game \(\textrm{G}_6\) are sampled uniformly at random and independently across the \(q_{\textsc {Ch}}\) different calls to oracle \(\textsc {Send}\), so we can apply the birthday bound to claim the following:

$$\begin{aligned} \Pr [\textsf{bad}_1^{\textrm{G}_6}] \le \frac{q_{\textsc {Ch}}\cdot (q_{\textsc {Ch}}- 1)}{2 \cdot 2^{\textsf{MAC}.\textsf{ol}}}. \end{aligned}$$
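Since \(\textsf{msg}{\_}\textsf{key}\) in MTProto has \(\textsf{MAC}.\textsf{ol}= 128\) bits, this collision term is tiny for any realistic number of messages; an illustrative computation (the query counts are our own example values):

```python
# Birthday-bound term q*(q-1) / (2 * 2**MAC_ol) for uniformly random
# msg_key values; MAC.ol = 128 in MTProto.
from fractions import Fraction

def collision_bound(q: int, mac_ol: int = 128) -> Fraction:
    """Probability bound on a msg_key collision among q random values."""
    return Fraction(q * (q - 1), 2 * 2**mac_ol)

# Two messages collide with probability exactly 2**-128:
assert collision_bound(2) == Fraction(1, 2**128)
# Even 2**40 sent messages leave the term below 2**-49:
assert collision_bound(2**40) < Fraction(1, 2**49)
```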

\({{{\textbf{G}}}_{7}\rightarrow {{\textbf{G}}}_{8}}\). In the transition from \(\textrm{G}_7\) to \(\textrm{G}_8\) (Fig. 40), we replace the value of ciphertext \(c_{\textit{se}}\), changing it from \(\mathsf {\textsf{SE}.Enc}(k, p)\) to a uniformly random value from \(\{0,1\}^{\mathsf {\textsf{SE}.cl}(\mathsf {\textsf{ME}.pl}{(\left| m_b\right| , r)})}\), by appealing to the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\) (Fig. 4). Recall that \(\mathsf {\textsf{ME}.pl}{(\left| m_b\right| , r)}\) is the length of the payload \(p\) produced by calling \(\mathsf {\textsf{ME}.Encode}\) on any message of length \(\left| m_b\right| \) with random coins \(r\), whereas \(\mathsf {\textsf{SE}.cl}(\cdot )\) maps the payload length to the resulting ciphertext length when encrypted with \(\textsf{SE}\). In Fig. 43, we build an adversary \(\mathcal {D}_{\mathrm {OTIND\$}}\) attacking the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\) that simulates \(\textrm{G}_7\) or \(\textrm{G}_8\) for adversary \(\mathcal {D}_{\textrm{IND}}\), depending on the challenge bit in game \(\textrm{G}^{\mathsf {otind\$}}_{\textsf{SE}, \mathcal {D}_{\mathrm {OTIND\$}}}\). It follows that

$$ \Pr [\textrm{G}_7] - \Pr [\textrm{G}_8] = \textsf{Adv}^{\mathsf {otind\$}}_{\textsf{SE}}(\mathcal {D}_{\mathrm {OTIND\$}}). $$
Fig. 43

Adversary \(\mathcal {D}_{\mathrm {OTIND\$}}\) against the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\) for the transition between games \(\textrm{G}_{7}\) and \(\textrm{G}_{8}\)

Finally, the output of oracle \(\textsc {Ch}\) in game \(\textrm{G}_8\) no longer depends on the challenge bit b, so we have

$$ \Pr [\textrm{G}_8] = \frac{1}{2}. $$

The theorem statement follows. \(\square \)

5.5.1 Proof alternatives

Our security reduction relies on the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) with respect to \(\phi _{\textsf{KDF}}\). We note that it would suffice to instead define and use a related-key weak-PRF notion here. It could be used in the penultimate step of this security reduction: right before appealing to the \(\mathrm {OTIND\$}\)-security of \(\textsf{SE}\).

Further, in this security reduction we consider a generic function family \(\textsf{MAC}\) and rely on it being related-key PRF-secure with respect to unique-prefix inputs. Recall that MTProto uses \(\textsf{MAC}= \textsf{MTP}\text {-}\textsf{MAC}\) such that \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}(\textit{mk}_\textit{u}\), \(p)\) \(=\) \(\textsf{SHA}-\textsf{256}(\textit{mk}_\textit{u}~\Vert ~p){[64:192]}\). It discards half of the \(\textsf{SHA}-\textsf{256}\) output bits, so we could alternatively model it as an instance of Augmented MAC (AMAC) and prove it to be related-key PRF-secure based on [10]. However, using the results from [10] would have required us to show that the \(\textsf{SHA}-\textsf{256}\) compression function is a secure PRF when half of its key is leaked to the adversary. We achieve a simpler and tighter security reduction by relying on the unique-prefix property of \(\textsf{ME}\) that is already guaranteed in MTProto.
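The \(\textsf{MTP}\text {-}\textsf{MAC}\) computation described above, taking the middle 128 bits of the \(\textsf{SHA}-\textsf{256}\) digest, can be sketched with the standard library as follows (the key and payload values are placeholders of our own choosing):

```python
# Sketch of MTP-MAC: msg_key = SHA-256(mk_u || p)[64:192], i.e. the
# middle 128 bits (bytes 8..24) of the 256-bit digest.
import hashlib

def mtp_mac(mk: bytes, payload: bytes) -> bytes:
    digest = hashlib.sha256(mk + payload).digest()
    return digest[8:24]  # bits 64..192 of the output

mk = b"\x01" * 32        # placeholder 256-bit MAC key
tag = mtp_mac(mk, b"payload")
assert len(tag) == 16    # 128-bit msg_key
```

The slice makes explicit that half of the digest bits are discarded, which is what motivates the AMAC-style modelling discussed above.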

5.6 \(\textrm{INT}\)-security of \(\textsf{MTP}\text {-}\textsf{CH}\)

The first half of our integrity proof shows that it is hard to forge ciphertexts; to justify this, we rely on the security properties of the cryptographic primitives used to build the channel \(\textsf{MTP}\text {-}\textsf{CH} \) (i.e. \(\textsf{HASH}\), \(\textsf{KDF}\), \(\textsf{SE}\), and \(\textsf{MAC}\)). Once ciphertext forgery is ruled out, we are guaranteed that \(\textsf{MTP}\text {-}\textsf{CH} \) broadly matches the intuition of an authenticated channel: it prevents an attacker from modifying or creating its own ciphertexts, but still allows the attacker to intercept and subsequently replay, reorder or drop honestly produced ciphertexts. So in the second part of the proof we show that the message encoding scheme \(\textsf{ME}\) appropriately resolves all of the possible adversarial interaction with an authenticated channel; formally, we require that it behaves according to the requirements specified by the support function \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\). Our main result is then:

Theorem 2

Let \(\textsf {session}\_\textsf {id}\in \{0,1\}^{64}\), \(\textsf {pb} \in {{\mathbb {N}}}\), and \(\textsf{bl}= 128\). Let \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}[\textsf {session}\_\textsf {id}, \textsf {pb}, \textsf{bl}]\) be the message encoding scheme as defined in Definition 6. Let \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\) be the deterministic symmetric encryption scheme as defined in Definition 10. Let \(\textsf{HASH}\), \(\textsf{MAC}\), \(\textsf{KDF}\), \(\phi _{\textsf{MAC}}\), \(\phi _{\textsf{KDF}}\) be any primitives that, together with \(\textsf{ME}\) and \(\textsf{SE}\), meet the requirements stated in Definition 5 of channel \(\textsf{MTP}\text {-}\textsf{CH} \). Let \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{HASH}, \textsf{MAC}, \textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{SE}]\). Let \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\) be the support function as defined in Fig. 32. Let \(\mathcal {F}_{\textrm{INT}}\) be any adversary against the \(\textrm{INT}\)-security of \(\textsf{CH}\) with respect to \(\textsf{supp}\). Then, we can build adversaries \(\mathcal {D}_{\textrm{OTWIND}}\), \(\mathcal {D}_{\textrm{RKPRF}}\), \(\mathcal {F}_{\textrm{UNPRED}}\), \(\mathcal {F}_{\textrm{RKCR}}\), \(\mathcal {F}_{\textrm{EINT}}\) such that

$$\begin{aligned} \textsf{Adv}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}_{\textrm{INT}})&\le \textsf{Adv}^{\textsf{otwind}}_{\textsf{HASH}}(\mathcal {D}_{\textrm{OTWIND}}) + \textsf{Adv}^{\textsf{rkprf}}_{\textsf{KDF}, \phi _{\textsf{KDF}}}(\mathcal {D}_{\textrm{RKPRF}})\\&\quad + \textsf{Adv}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}}(\mathcal {F}_{\textrm{UNPRED}}) + \textsf{Adv}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {F}_{\textrm{RKCR}}) \\&\quad + \textsf{Adv}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}}(\mathcal {F}_{\textrm{EINT}}). \end{aligned}$$

Before giving the detailed proof, we provide some discussion of our approach and a high-level overview of the different parts of the proof.

5.6.1 Invisible terms based on correctness of \(\textsf{ME}\), \(\textsf{SE}\), \(\textsf{supp}\)

We state and prove our \(\textrm{INT}\)-security claim for channel \(\textsf{MTP}\text {-}\textsf{CH} \) with respect to fixed choices of MTProto-based constructions \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}\) (Definition 6) and \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\) (Definition 10), and with respect to the support function \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\) that is defined in Fig. 32. Our security reduction relies on six correctness-style properties of these primitives: one for \(\textsf{ME}\), two for \(\textsf{SE}\), three for \(\textsf{supp}\). Each of them can be observed to be always true for the corresponding scheme and hence does not contribute an additional term to the advantage statement in Theorem 2. These properties are also simple enough that we chose not to define them in a game-based style (the one we require from \(\textsf{ME}\) is distinct from, and simpler than, the encoding correctness notion that we defined in Sect. 3.5). Our security reduction nonetheless introduces and justifies a game hop for each of these properties. This necessitates the use of 14 security reduction games to prove Theorem 2, including some that are meant to be equivalent by observation (i.e. the corresponding game transitions do not rely on any correctness or security properties). However, some of the reduction steps require a detailed analysis.

Theorem 2 could be stated in a more general way, fully formalising the aforementioned correctness notions and phrasing our claims with respect to any \(\textsf{SE}\), \(\textsf{ME}\), \(\textsf{supp}\). We lose this generality by instantiating these primitives. Our motivation is twofold. On the one hand, we state our claims in a way that highlights the parts of MTProto (as captured by our specification) that are critical for its security analysis, and avoid spending too much attention on parts of the reduction that can be “taken for granted”. On the other hand, our work studies MTProto, and the abstractions that we use are meant to simplify and aid this analysis. We discourage the reader from treating \(\textsf{MTP}\text {-}\textsf{CH} \) in a prescriptive way, e.g. from trying to instantiate it with different primitives to build a secure channel, since standard, well-studied cryptographic protocols such as TLS already exist.

5.6.2 Proof phase I: Forging a ciphertext is hard

Let \(\mathcal {F}_{\textrm{INT}}\) be an adversary playing in the \(\textrm{INT}\)-security game against channel \(\textsf{MTP}\text {-}\textsf{CH} \). Consider an arbitrary call made by \(\mathcal {F}_{\textrm{INT}}\) to its oracle \(\textsc {Recv}\) on inputs \(\textit{u}, c, \textit{aux}\) such that \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}', \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\). The oracle evaluates \(\mathsf {\textsf{MTP}\text {-}\textsf{CH}.Recv}(\textit{st}_{\textit{u}}, c, \textit{aux})\). Recall that \(\mathsf {\textsf{MTP}\text {-}\textsf{CH}.Recv}\) attempts to verify \(\textsf{msg}{\_}\textsf{key}\) by checking whether \(\textsf{msg}{\_}\textsf{key} = \textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p)\) for an appropriately recovered payload \(p\) (i.e. \(k \leftarrow \textsf{KDF}.\textsf{Ev}(\textit{kk}_{\overline{\textit{u}}}, \textsf{msg}{\_}\textsf{key})\) and \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\)). If this \(\textsf{msg}{\_}\textsf{key}\) verification passes (and if \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}' = \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\)), then \(\mathsf {\textsf{MTP}\text {-}\textsf{CH}.Recv}\) attempts to decode the payload by computing \((\textit{st}_{\textsf{ME}, \textit{u}}, m) \leftarrow \mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\).
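To make the control flow of \(\mathsf {\textsf{MTP}\text {-}\textsf{CH}.Recv}\) concrete, the following Python sketch mirrors the verification order described above. The primitives here (a truncated-hash MAC, a hash-based KDF, an XOR-stream SE) are hypothetical toy stand-ins, not the actual \(\textsf{MTP}\text {-}\textsf{MAC}\), \(\textsf{MTP}\text {-}\textsf{KDF}\) or \(\textsf{MTP}\text {-}\textsf{SE}\) of Definitions 5, 6 and 10:

```python
import hashlib

# Toy stand-ins for MAC, KDF and SE; only the control flow of Recv is mirrored.
def mac_ev(mk: bytes, p: bytes) -> bytes:
    return hashlib.sha256(mk + p).digest()[8:24]          # a 128-bit msg_key

def kdf_ev(kk: bytes, msg_key: bytes) -> bytes:
    return hashlib.sha256(kk + msg_key).digest()          # a 256-bit SE key

def se_enc(k: bytes, p: bytes) -> bytes:                  # toy XOR-stream SE
    stream = hashlib.sha256(k).digest() * (len(p) // 32 + 1)
    return bytes(a ^ b for a, b in zip(p, stream))

se_dec = se_enc  # XOR stream: decryption is the same operation

def recv(mk: bytes, kk: bytes, auth_key_id: bytes, c):
    """Mirror MTP-CH.Recv: derive k, recover p, verify msg_key, then decode."""
    auth_key_id_prime, msg_key, c_se = c
    k = kdf_ev(kk, msg_key)
    p = se_dec(k, c_se)
    if auth_key_id_prime != auth_key_id or mac_ev(mk, p) != msg_key:
        return None        # verification failed; nothing is released
    return p               # in MTP-CH, p would now be passed to ME.Decode

mk, kk, akid = b"m" * 32, b"k" * 32, b"id-" * 4           # hypothetical keys
p = b"payload-block-0000"
msg_key = mac_ev(mk, p)
c = (akid, msg_key, se_enc(kdf_ev(kk, msg_key), p))
assert recv(mk, kk, akid, c) == p                          # honest ciphertext
assert recv(mk, kk, akid, (akid, bytes(16), c[2])) is None  # bad msg_key
```

The point of the sketch is that \(\textsf{msg}{\_}\textsf{key}\) binds the \(\textsf{SE}\) ciphertext to the payload it decrypts to; Cases A and B below analyse exactly this check.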

We consider two cases, and claim the following. (A) If \(\textsf{msg}{\_}\textsf{key}\) was not previously returned by oracle \(\textsc {Send}\) as a part of any ciphertext sent by user \(\overline{\textit{u}}\), then with high probability an evaluation of \(\mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\) would return \(m = \bot \) regardless of whether the \(\textsf{msg}{\_}\textsf{key}\) verification passed or failed; so in this case we are not concerned with assessing the likelihood that the \(\textsf{msg}{\_}\textsf{key}\) verification passes. (B) If \(\textsf{msg}{\_}\textsf{key}\) was previously returned by oracle \(\textsc {Send}\) as a part of some ciphertext \(c' = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}, \textsf{msg}{\_}\textsf{key}, c_{\textit{se}}')\) sent by user \(\overline{\textit{u}}\), and if \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}'\), then with high probability \(c_{\textit{se}}= c_{\textit{se}}'\) (and hence \(c = c'\)) whenever the \(\textsf{msg}{\_}\textsf{key}\) verification passes. We now justify both claims.

5.6.3 Case A. Assume \(\textsf{msg}{\_}\textsf{key}\) is fresh

Our analysis of this case will rely on a property of the symmetric encryption scheme \(\textsf{SE}\) and will require that its key k is chosen uniformly at random. Thus, we begin by invoking the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) and the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) in order to claim that the output of \(\textsf{KDF}.\textsf{Ev}(\textit{kk}_{\overline{\textit{u}}}, \textsf{msg}{\_}\textsf{key})\) is indistinguishable from random; this mirrors the first two steps of the \(\textrm{IND}\)-security reduction of \(\textsf{MTP}\text {-}\textsf{CH} \). We formalise this by requiring that \(\textsf{KDF}.\textsf{Ev}(\textit{kk}_{\overline{\textit{u}}}, \textsf{msg}{\_}\textsf{key})\) is indistinguishable from a uniformly random value stored in the PRF table’s entry \(\textsf{T}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\).

Our analysis of Case A now reduces roughly to the following: we need to show that it is hard to find any \(\textsf{SE}\) ciphertext \(c_{\textit{se}}\) such that its decryption \(p\) under a uniformly random key k has a non-negligible chance of being successfully decoded by \(\mathsf {\textsf{ME}.Decode}\) (i.e. returning \(m\ne \bot \)). As a part of this experiment, the adversary is allowed to query many different values of \(\textsf{msg}{\_}\textsf{key}\) and \(c_{\textit{se}}\) (recall that an \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext contains both). At this point, the \(\textsf{msg}{\_}\textsf{key}\) is only used to select a uniformly random \(\textsf{SE}\) key k from \(\textsf{T}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\), but the adversary can reuse the same key k in combination with many different choices of \(c_{\textit{se}}\). The Case A assumption that \(\textsf{msg}{\_}\textsf{key}\) is “fresh” means that the \(\textsf{msg}{\_}\textsf{key}\) was not seen during previous calls to the \(\textsc {Send}\) oracle, so the adversary has no additional leakage on key k. All of the above is captured by the notion of \(\textsf{SE}\)’s unpredictability (\(\textrm{UNPRED}\)) with respect to \(\textsf{ME}\) (Sect. 5.3).

The \(\textrm{UNPRED}\)-security of \(\textsf{SE}, \textsf{ME}\) can be trivially broken if \(\mathsf {\textsf{ME}.Decode}\) is defined in a way that it successfully decodes every possible payload \(p\in \mathsf {\textsf{ME}.Out}\). It can also be trivially broken for contrived examples of \(\textsf{SE}\) like the one defining \(\forall k\in \{0,1\}^{\mathsf {\textsf{SE}.kl}}, \forall x\in \mathsf {\textsf{SE}.MS}:(\mathsf {\textsf{SE}.Enc}(k, x) = x) \wedge (\mathsf {\textsf{SE}.Dec}(k, x) = x)\), assuming that \(\mathsf {\textsf{ME}.Decode}\) can successfully decode even a single payload \(p\) from \(\mathsf {\textsf{SE}.MS}\). But the more structure \(\mathsf {\textsf{ME}.Decode}\) requires from its input \(p\), and the more “unpredictable” the decryption algorithm \(\mathsf {\textsf{SE}.Dec}(k, \cdot )\) is for a uniformly random k, the harder it is to break the \(\textrm{UNPRED}\)-security of \(\textsf{SE}, \textsf{ME}\). We note that \(\textsf{MTP}\text {-}\textsf{ME}\) requires every \(p\) to contain a constant \(\textsf {session}\_\textsf {id}\in \{0,1\}^{64}\) in the second half of its first 128-bit block, whereas \(\textsf{MTP}\text {-}\textsf{SE}\) implements the IGE block cipher mode of operation. In Appendix E.6, we show that the output \(p\) of \(\mathsf {\textsf{MTP}\text {-}\textsf{SE}.Dec}\) is highly unlikely to contain \(\textsf {session}\_\textsf {id}\) at the necessary position, i.e. if \(\mathcal {F}_{\textrm{INT}}\) makes \(q_{\textsc {Send}}\) queries to its \(\textsc {Send}\) oracle then it can find such \(p\) with probability at most \(q_{\textsc {Send}}/2^{64}\). In Appendix E.6, we also discuss the possibility of improving this bound.
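As a small illustration of why the \(\textsf {session}\_\textsf {id}\) check thwarts Case A, consider the following sketch; the payload layout is simplified to just an 8-byte prefix followed by \(\textsf {session}\_\textsf {id}\) and the body, whereas the real \(\textsf{MTP}\text {-}\textsf{ME}\) header contains further fields and checks:

```python
SESSION_ID = b"\x11" * 8   # a hypothetical fixed 64-bit session_id

def header_check(p: bytes) -> bool:
    """Sketch of the MTP-ME.Decode sanity check: session_id must occupy the
    second half of the payload's first 128-bit block (bytes 8..15)."""
    return len(p) >= 16 and p[8:16] == SESSION_ID

honest = b"\x00" * 8 + SESSION_ID + b"message body"    # prefix || sid || body
forged = b"\x00" * 8 + b"\x22" * 8 + b"message body"
assert header_check(honest)
assert not header_check(forged)

# A payload obtained by decrypting under a fresh uniformly random key behaves
# like a random string, so its 8-byte slot matches SESSION_ID with probability
# 2^-64; over q_Send tries this matches the q_Send / 2^64 bound claimed above.
```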

5.6.4 Case B. Assume \(\textsf{msg}{\_}\textsf{key}\) is reused

In this case, we know that adversary \(\mathcal {F}_{\textrm{INT}}\) previously called its \(\textsc {Send}\) oracle on inputs \(\overline{\textit{u}}, m', \textit{aux}', r'\) for some \(m', \textit{aux}', r'\), and received back a ciphertext \(c' = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}, \textsf{msg}{\_}\textsf{key}', c_{\textit{se}}')\) such that \(\textsf{msg}{\_}\textsf{key}' = \textsf{msg}{\_}\textsf{key}\). Let \(p'\) be the payload that was built and used inside this oracle call. Recall that we are currently considering \(\mathcal {F}_{\textrm{INT}}\)’s ongoing call to its oracle \(\textsc {Recv}\) on inputs \(\textit{u}, c, \textit{aux}\) such that \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}', \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\); we are only interested in the event that the \(\textsf{msg}{\_}\textsf{key}\) verification passed (and that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}'\)), meaning that \(\textsf{msg}{\_}\textsf{key}= \textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p)\) holds for an appropriately recovered \(p\).

It follows that \(\textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p') = \textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p)\). If \(p' \ne p\), then this breaks the \(\textrm{RKCR}\)-security of \(\textsf{MAC}\). Recall that MTProto instantiates \(\textsf{MAC}\) with \(\textsf{MTP}\text {-}\textsf{MAC}\) where \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}(\textit{mk}_\textit{u}, p) = \textsf{SHA}-\textsf{256}(\textit{mk}_\textit{u}~\Vert ~p){[64:192]}\). So this attack against \(\textsf{MAC}\) reduces to breaking some variant of \(\textsf{SHA}-\textsf{256}\)’s collision resistance that restricts the set of allowed inputs but only requires finding a collision in a 128-bit fragment of the output.
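The truncation in \(\textsf{MTP}\text {-}\textsf{MAC}\) can be written out directly: bit range \([64:192]\) of the 256-bit digest corresponds to bytes 8 through 23. A minimal sketch (the key and message values are arbitrary):

```python
import hashlib

def mtp_mac(mk: bytes, p: bytes) -> bytes:
    # msg_key = SHA-256(mk || p)[64:192]: the middle 128 bits of the digest,
    # i.e. bytes 8..23 of the 32-byte SHA-256 output.
    return hashlib.sha256(mk + p).digest()[8:24]

mk, p = b"\xaa" * 32, b"payload"
tag = mtp_mac(mk, p)
assert len(tag) == 16                        # a 128-bit msg_key
assert tag == hashlib.sha256(mk + p).digest()[8:24]
# An RKCR break is p != p' with mtp_mac(mk, p) == mtp_mac(mk, p'): a collision
# in this truncated, prefix-keyed variant of SHA-256.
```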

Based on the above, we obtain \((\textsf{msg}{\_}\textsf{key}', p') = (\textsf{msg}{\_}\textsf{key}, p)\). Let \(k = \textsf{KDF}.\textsf{Ev}(\textit{kk}_{\overline{\textit{u}}}, \textsf{msg}{\_}\textsf{key})\). Note that \(c_{\textit{se}}' \leftarrow \mathsf {\textsf{SE}.Enc}(k, p')\) was computed during the \(\textsc {Send}\) call, and \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\) was computed during the ongoing \(\textsc {Recv}\) call. The equality \(p' = p\) implies \(c_{\textit{se}}' = c_{\textit{se}}\) if \(\textsf{SE}\) guarantees that for any key k, the algorithms of \(\textsf{SE}\) match every message \(p\in \mathsf {\textsf{SE}.MS}\) with a unique ciphertext \(c_{\textit{se}}\). When this condition holds, we say that \(\textsf{SE}\) has unique ciphertexts. We note that \(\textsf{MTP}\text {-}\textsf{SE}\) satisfies this property; it follows that \(c_{\textit{se}}' = c_{\textit{se}}\) and therefore the \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c that was queried to \(\textsc {Recv}\) (for user \(\textit{u}\)) is equal to the ciphertext \(c'\) that was previously returned by \(\textsc {Send}\) (by user \(\overline{\textit{u}}\)). Implicit in this argument is an assumption that \(\textsf{SE}\) has the decryption correctness property; \(\textsf{MTP}\text {-}\textsf{SE}\) satisfies this property as well.
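The unique-ciphertexts property is easy to see on a toy deterministic scheme. The scheme below is a hypothetical one-byte example (not the IGE mode of \(\textsf{MTP}\text {-}\textsf{SE}\)), chosen only because injectivity can be checked exhaustively:

```python
def toy_enc(k: int, p: int) -> int:      # a keyed permutation of {0, ..., 255}
    return (p + k) % 256

def toy_dec(k: int, c: int) -> int:
    return (c - k) % 256

k = 42
# Decryption correctness: Dec(k, Enc(k, p)) = p for every message p.
assert all(toy_dec(k, toy_enc(k, p)) == p for p in range(256))
# Unique ciphertexts: Enc(k, .) is injective, so each message corresponds to
# exactly one ciphertext; hence p = p' forces the ciphertexts to coincide,
# which is the step used in Case B.
assert len({toy_enc(k, p) for p in range(256)}) == 256
```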

5.6.5 Proof phase II: \(\textsf{MTP}\text {-}\textsf{CH} \) acts as an authenticated channel

We can rewrite the claims we stated and justified in the first phase of the proof as follows. When adversary \(\mathcal {F}_{\textrm{INT}}\) queries its oracle \(\textsc {Recv}\) on inputs \(\textit{u}, c, \textit{aux}\), the channel decrypts c to \(m = \bot \) with high probability, unless c was honestly returned in response to \(\mathcal {F}_{\textrm{INT}}\)’s prior call to \(\textsc {Send}(\overline{\textit{u}}, \ldots )\), meaning \(\exists m', \textit{aux}' :(\textsf{sent}, m', c, \textit{aux}') \in \textsf{tr}_{\overline{\textit{u}}}\). Furthermore, we claim that the channel’s state \(\textit{st}_{\textit{u}}\) of user \(\textit{u}\) does not change when \(\mathcal {F}_{\textrm{INT}}\) queries its oracle \(\textsc {Recv}\) on inputs \(\textit{u}, c, \textit{aux}\) that get decrypted to \(m = \bot \). This could only happen in Case A above, assuming that the \(\textsf{msg}{\_}\textsf{key}\) verification succeeds but then the \(\mathsf {\textsf{ME}.Decode}\) call returns \(m = \bot \) and changes the message encoding scheme’s state \(\textit{st}_{\textsf{ME}, \textit{u}}\) of user \(\textit{u}\). We note that \(\textsf{MTP}\text {-}\textsf{ME}\) never updates \(\textit{st}_{\textsf{ME}, \textit{u}}\) when decoding fails, and hence, it satisfies this requirement.
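The state-handling requirement on \(\mathsf {\textsf{ME}.Decode}\) can be phrased as a simple contract. The sketch below uses a hypothetical one-counter state; the actual state of \(\textsf{MTP}\text {-}\textsf{ME}\) is richer, but it obeys the same rule:

```python
def decode(st: int, p, expected):
    """Contract from the proof: on failure, return m = None (i.e. bot) and
    leave the decoding state unchanged, as MTP-ME does."""
    if p != expected:                # stand-in for MTP-ME's payload checks
        return st, None
    return st + 1, p                 # success: the state may advance

st = 7
assert decode(st, "bad", "good") == (7, None)     # failed decode: state kept
assert decode(st, "good", "good") == (8, "good")  # success: state advances
```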

We now know that oracle \(\textsc {Recv}\) accepts only honestly forwarded ciphertexts from the opposite user and that it never changes the channel’s state otherwise. This allows us to rewrite the \(\textrm{INT}\)-security game to ignore all cryptographic algorithms in the \(\textsc {Recv}\) oracle. More specifically, oracle \(\textsc {Recv}\) can use the opposite user’s transcript to check which ciphertexts were produced honestly, and simply reject the ones that are not on this transcript. For each ciphertext c that is on the transcript, the game can maintain a table that maps it to the payload \(p\) that was used to generate it; oracle \(\textsc {Recv}\) can fetch this payload and immediately call \(\mathsf {\textsf{ME}.Decode}\) to decode it.
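The rewritten \(\textsc {Recv}\) oracle thus reduces to a transcript lookup. A sketch of this bookkeeping (the table and variable names are ours, not the game pseudocode's):

```python
sent_payload = {}   # maps each honestly produced ciphertext c to its payload p
transcript = []     # the opposite user's transcript of honest Send calls

def send(c, p):
    sent_payload[c] = p
    transcript.append(("sent", p, c))

def recv(c, decode):
    if c not in sent_payload:        # not honestly produced: reject outright
        return None
    return decode(sent_payload[c])   # fetch p and call ME.Decode directly

send("c1", "p1")
assert recv("c1", str.upper) == "P1"          # honest forward is decoded
assert recv("forged", str.upper) is None      # anything else is rejected
```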

5.6.6 Proof phase III: Interaction between \(\textsf{ME}\) and \(\textsf{supp}\)

By now, we have transformed our \(\textrm{INT}\)-security game to an extent that it roughly captures the requirement that the behaviour of \(\textsf{ME}\) should match that of \(\textsf{supp}\) (i.e. adversary \(\mathcal {F}_{\textrm{INT}}\) wins the game iff the message m recovered by \(\mathsf {\textsf{ME}.Decode}\) inside oracle \(\textsc {Recv}\) is not equal to the corresponding output \(m^*\) of \(\textsf{supp}\)). However, the support function \(\textsf{supp}\) uses the \(\textsf{MTP}\text {-}\textsf{CH} \) encryption c of payload \(p\) as its label, and it is not necessarily clear what information about c can or should be used to define the behaviour of \(\textsf{supp}\). In order to simplify the security game we have arrived at, we rely on three correctness-style notions as follows:

  1. Integrity of a support function requires that the support function returns \(m^*= \bot \) when it is called on a ciphertext that cannot be found in the opposite user’s transcript \(\textsf{tr}_{\overline{\textit{u}}}\).

  2. Robustness of a support function requires that adding failed decryption events (i.e. \(m = \bot \)) to a transcript does not affect the future outputs of \(\textsf{supp}\) on any inputs.

  3. We also rely on a property requiring that a support function uses no information about its labels beyond their equality pattern, separately for either direction of communication (i.e. \(\textit{u}\rightarrow \overline{\textit{u}}\) and \(\overline{\textit{u}}\rightarrow \textit{u}\)).

For the last property, we observe that in our game \(p_0 = p_1\) iff the corresponding \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertexts are also equal. This allows us to switch from using ciphertexts to using payloads as the labels for \(\textsf{supp}\), and simultaneously change the transcripts to also store payloads instead of ciphertexts. Our theorem is stated with respect to \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\), which satisfies all three of the above properties.

The introduced properties of a support function allow us to further simplify the \(\textrm{INT}\)-security game. In particular, they let us remove the corner case that deals with \(\textsc {Recv}\) being queried on an invalid ciphertext (i.e. one that was not honestly forwarded). Finally, this lets us reduce our latest version of the \(\textrm{INT}\)-security game for \(\textsf{MTP}\text {-}\textsf{CH} \) to the encoding integrity (\(\textrm{EINT}\)) property of \(\textsf{ME}, \textsf{supp}\) (see Fig. 16), which is defined to match \(\textsf{ME}\) against \(\textsf{supp}\) in the presence of adversarial behaviour on an authenticated channel that exchanges \(\textsf{ME}\) payloads between two users. In Appendix E.5, we show that this property holds for \(\textsf{MTP}\text {-}\textsf{ME}\) with respect to \(\textsf{supp}\text {-}\textsf{ord}\).

Proof of Theorem 2

This proof uses games \(\textrm{G}_0\)–\(\textrm{G}_2\) in Fig. 44, games \(\textrm{G}_3\)–\(\textrm{G}_{8}\) in Fig. 46 and games \(\textrm{G}_9\)–\(\textrm{G}_{13}\) in Fig. 49. The code added for the transitions between games is highlighted in the figures. The adversaries for transitions between games are provided throughout the proof. The highlighted instructions inside the adversaries mark the changes in the code of the simulated security reduction games.

Fig. 44

Games \(\textrm{G}_0\)–\(\textrm{G}_2\) for the proof of Theorem 2. The code added by expanding the algorithms of \(\textsf{CH}\) in game \(\textrm{G}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}_{\textrm{INT}}}\) is highlighted

Games \(\textrm{G}_0\)–\(\textrm{G}_2\) and the transitions between them (\(\textrm{G}_{0}\rightarrow \textrm{G}_{1}\) based on the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\), and \(\textrm{G}_{1}\rightarrow \textrm{G}_{2}\) based on the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\)) are very similar to the corresponding games and transitions in our \(\textrm{IND}\)-security reduction. We refer to the proof of Theorem 1 for a detailed explanation of both transitions.

Fig. 45

The adversaries for games \(\textrm{G}_0\)–\(\textrm{G}_2\) of the proof of Theorem 2. Each constructed adversary simulates one or two subsequent games of the security reduction for adversary \(\mathcal {F}_{\textrm{INT}}\)

\({\textbf{G}}_{0}\). Game \(\textrm{G}_0\) is equivalent to game \(\textrm{G}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}, \mathcal {F}_{\textrm{INT}}}\). It expands the code of algorithms \(\mathsf {\textsf{CH}.Init}\), \(\mathsf {\textsf{CH}.Send}\) and \(\mathsf {\textsf{CH}.Recv}\). The expanded instructions are highlighted. It follows that

$$ \textsf{Adv}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}_{\textrm{INT}}) = \Pr [\textrm{G}_0]. $$

\({{{\textbf{G}}}_{0}\rightarrow {{\textbf{G}}}_{1}}\). The value of \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) in game \(\textrm{G}_0\) depends on the initial \(\textsf{KDF}\) key \(\textit{kk}\) and \(\textsf{MAC}\) key \(\textit{mk}\). In contrast, game \(\textrm{G}_1\) computes \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}\) by evaluating \(\textsf{HASH}\) on a uniformly random input \(x\) that is independent of \(\textit{kk}\) and \(\textit{mk}\). We invoke the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\) (Fig. 25) in order to claim that adversary \(\mathcal {F}_{\textrm{INT}}\) cannot distinguish between playing in \(\textrm{G}_0\) and \(\textrm{G}_1\). In Fig. 45a, we build an adversary \(\mathcal {D}_{\textrm{OTWIND}}\) against the \(\textrm{OTWIND}\)-security of \(\textsf{HASH}\). When adversary \(\mathcal {D}_{\textrm{OTWIND}}\) plays in game \(\textrm{G}^\textsf{otwind}_{\textsf{HASH}, \mathcal {D}_{\textrm{OTWIND}}}\) with challenge bit \(d\in \{0,1\}\), it simulates game \(\textrm{G}_0\) (when \(d=1\)) or game \(\textrm{G}_1\) (when \(d=0\)) for adversary \(\mathcal {F}_{\textrm{INT}}\). Adversary \(\mathcal {D}_{\textrm{OTWIND}}\) returns \(d'=1\) iff \(\mathcal {F}_{\textrm{INT}}\) sets \(\textsf{win}\), so we have

$$ \Pr [\textrm{G}_0] - \Pr [\textrm{G}_1] = \textsf{Adv}^{\textsf{otwind}}_{\textsf{HASH}}(\mathcal {D}_{\textrm{OTWIND}}). $$

\({{{\textbf{G}}}_{1}\rightarrow {{\textbf{G}}}_{2}}\). Going from \(\textrm{G}_1\) to \(\textrm{G}_2\), we switch the outputs of \(\textsf{KDF}.\textsf{Ev}\) to uniformly random values. Since the adversary can call \(k \leftarrow \textsf{KDF}.\textsf{Ev}(\textit{kk}_{\textit{u}}, \textsf{msg}{\_}\textsf{key})\) on the same inputs multiple times, we use a PRF table \(\textsf{T}\) to enforce consistency between calls; the output of \(\textsf{KDF}.\textsf{Ev}(\textit{kk}_{\textit{u}}, \textsf{msg}{\_}\textsf{key})\) in \(\textrm{G}_2\) corresponds to a uniformly random value that is sampled and stored in the table entry \(\textsf{T}[\textit{u}, \textsf{msg}{\_}\textsf{key}]\). In Fig. 45b, we build an adversary \(\mathcal {D}_{\textrm{RKPRF}}\) against the \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) (Fig. 26) with respect to \(\phi _{\textsf{KDF}}\). When adversary \(\mathcal {D}_{\textrm{RKPRF}}\) plays in game \(\textrm{G}^\textsf{rkprf}_{\textsf{KDF}, \phi _{\textsf{KDF}}, \mathcal {D}_{\textrm{RKPRF}}}\) with challenge bit \(d\in \{0,1\}\), it simulates game \(\textrm{G}_1\) (when \(d=1\)) or game \(\textrm{G}_2\) (when \(d=0\)) for adversary \(\mathcal {F}_{\textrm{INT}}\). Adversary \(\mathcal {D}_{\textrm{RKPRF}}\) returns \(d'=1\) iff \(\mathcal {F}_{\textrm{INT}}\) sets \(\textsf{win}\), so we have

$$ \Pr [\textrm{G}_1] - \Pr [\textrm{G}_2] = \textsf{Adv}^{\textsf{rkprf}}_{\textsf{KDF}, \phi _{\textsf{KDF}}}(\mathcal {D}_{\textrm{RKPRF}}). $$
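The PRF table used in this hop implements lazy sampling: a fresh uniformly random key is drawn the first time a pair \((\textit{u}, \textsf{msg}{\_}\textsf{key})\) is queried and reused thereafter. A sketch of the mechanism:

```python
import os

T = {}   # the PRF table T[u, msg_key] from games G_1/G_2

def kdf_oracle(u: str, msg_key: bytes) -> bytes:
    """In G_2, KDF.Ev(kk_u, msg_key) is replaced by lazy uniform sampling."""
    if (u, msg_key) not in T:
        T[(u, msg_key)] = os.urandom(32)   # a fresh uniformly random SE key
    return T[(u, msg_key)]

k1 = kdf_oracle("I", b"\x01" * 16)
assert kdf_oracle("I", b"\x01" * 16) == k1   # repeated queries are consistent
assert kdf_oracle("R", b"\x01" * 16) != k1   # other entries are independent
```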
Fig. 46

Games \(\textrm{G}_3\)–\(\textrm{G}_{8}\) for the proof of Theorem 2

\({{{\textbf{G}}}_{2}\rightarrow {{\textbf{G}}}_{3}}\). Game \(\textrm{G}_3\) (Fig. 46) differs from \(\textrm{G}_2\) (Fig. 44) in the following ways:

  1. The \(\textsf{KDF}\) keys \(\textit{kk}\), \(\textit{kk}_\mathcal {I}\), \(\textit{kk}_\mathcal {R}\) are no longer used in our reduction games starting from \(\textrm{G}_2\), so they are not included in game \(\textrm{G}_3\) and onwards.

  2. Game \(\textrm{G}_3\) adds a table \(\textsf{S}\) that is updated during each call to oracle \(\textsc {Send}\). We set \(\textsf{S}[\textit{u}, \textsf{msg}{\_}\textsf{key}] \leftarrow (p, c_{\textit{se}})\) to remember that user \(\textit{u}\) produced \(\textsf{msg}{\_}\textsf{key}\) when sending (to user \(\overline{\textit{u}}\)) an \(\textsf{SE}\) ciphertext \(c_{\textit{se}}\) that encrypts payload \(p\).

  3. Oracle \(\textsc {Recv}\) in game \(\textrm{G}_3\), prior to calling \(\mathsf {\textsf{ME}.Decode}\), now saves a backup copy of \(\textit{st}_{\textsf{ME}, \textit{u}}\) in variable \(\textit{st}_{\textsf{ME}, \textit{u}}^*\). It also adds four new conditional statements that serve no purpose in game \(\textrm{G}_3\) itself. Four of the future game transitions in our security reduction (\(\textrm{G}_{3}\rightarrow \textrm{G}_{4}\), \(\textrm{G}_{4}\rightarrow \textrm{G}_{5}\), \(\textrm{G}_{5}\rightarrow \textrm{G}_{6}\), \(\textrm{G}_{7}\rightarrow \textrm{G}_{8}\)) will each add an instruction, inside the corresponding conditional statement, that reverts the pair of variables \((\textit{st}_{\textsf{ME}, \textit{u}}, m)\) to the initial values \((\textit{st}_{\textsf{ME}, \textit{u}}^*, \bot )\) that they had at the beginning of the ongoing \(\textsc {Recv}\) oracle call. Each of the new conditional statements also contains its own \(\textsf{bad}\) flag; these flags are only used for the formal analysis that we provide below.

  4. Similarly, game \(\textrm{G}_3\) adds two conditional statements to the \(\textsc {Send}\) oracle, and both serve no purpose in game \(\textrm{G}_3\). In future games, they will be used to roll back the message encoding scheme’s state \(\textit{st}_{\textsf{ME}, \textit{u}}\) to the initial value that it had at the beginning of the ongoing \(\textsc {Send}\) oracle call, followed by exiting this oracle call with \(\bot \) as output.

Games \(\textrm{G}_3\) and \(\textrm{G}_2\) are functionally equivalent, so

$$ \Pr [\textrm{G}_3] = \Pr [\textrm{G}_2]. $$

\({{{\textbf{G}}}_{3}\rightarrow {{\textbf{G}}}_{4}}\). Games \(\textrm{G}_3\) and \(\textrm{G}_4\) (Fig. 46) are identical until \(\textsf{bad}_0\) is set. We have

$$ \Pr [\textrm{G}_3] - \Pr [\textrm{G}_4] \le \Pr [\textsf{bad}_0^{\textrm{G}_3}]. $$

The \(\textsf{bad}_0\) flag can be set in \(\textrm{G}_3\) only when the instruction that sets \((\textit{st}_{\textsf{ME}, \textit{u}}, m) \leftarrow \mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\) simultaneously changes the value of \(\textit{st}_{\textsf{ME}, \textit{u}}\) and returns \(m = \bot \). Recall that the statement of Theorem 2 restricts \(\textsf{ME}\) to an instantiation of \(\textsf{MTP}\text {-}\textsf{ME}\). But the latter never modifies its state \(\textit{st}_{\textsf{ME}, \textit{u}}\) when the decoding fails (i.e. \(m = \bot \)), so

$$ \Pr [\textsf{bad}_0^{\textrm{G}_3}] = 0. $$
Fig. 47

Adversary \(\mathcal {F}_{\textrm{UNPRED}}\) against the \(\textrm{UNPRED}\)-security of \(\textsf{SE}, \textsf{ME}\) for the transition between games \(\textrm{G}_4\) and \(\textrm{G}_5\)

\({{{\textbf{G}}}_{4}\rightarrow {{\textbf{G}}}_{5}}\). Games \(\textrm{G}_4\) and \(\textrm{G}_5\) (Fig. 46) are identical until \(\textsf{bad}_1\) is set. We have

$$ \Pr [\textrm{G}_4] - \Pr [\textrm{G}_5] \le \Pr [\textsf{bad}_1^{\textrm{G}_5}]. $$

When the \(\textsf{bad}_1\) flag is set in \(\textrm{G}_5\), we know that the \(\textsf{SE}\) key \(k = \textsf{T}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\) was sampled uniformly at random and never used inside the \(\textsc {Send}\) oracle before (because \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = \bot \)). Yet the adversary \(\mathcal {F}_{\textrm{INT}}\) found an \(\textsf{SE}\) ciphertext \(c_{\textit{se}}\) such that the payload \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\) was successfully decoded by \(\mathsf {\textsf{ME}.Decode}\) (i.e. \(m\ne \bot \)). We note that \(\mathcal {F}_{\textrm{INT}}\) is allowed to query its \(\textsc {Recv}\) oracle on arbitrarily many ciphertexts \(c_{\textit{se}}\) with respect to the same \(\textsf{SE}\) key k, by repeatedly using the same pair of values for \((\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key})\). But it might nonetheless be hard for \(\mathcal {F}_{\textrm{INT}}\) to obtain a decodable payload p if (1) the outputs of function \(\mathsf {\textsf{SE}.Dec}(k, \cdot )\) are sufficiently “unpredictable” for an unknown uniformly random k, and (2) the \(\mathsf {\textsf{ME}.Decode}\) algorithm is sufficiently “restrictive” (e.g. designed to run some sanity checks on its payloads, hence rejecting a fraction of them). We use the unpredictability notion of \(\textsf{SE}\) with respect to \(\textsf{ME}\), which captures this intuition. In Fig. 47, we build an adversary \(\mathcal {F}_{\textrm{UNPRED}}\) against the \(\textrm{UNPRED}\)-security of \(\textsf{SE},\textsf{ME}\) (Fig. 35) as follows. When adversary \(\mathcal {F}_{\textrm{UNPRED}}\) plays in game \(\textrm{G}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}, \mathcal {F}_{\textrm{UNPRED}}}\), it simulates game \(\textrm{G}_5\) for adversary \(\mathcal {F}_{\textrm{INT}}\). 
Adversary \(\mathcal {F}_{\textrm{UNPRED}}\) wins in its own game whenever \(\mathcal {F}_{\textrm{INT}}\) sets \(\textsf{bad}_1\), so we have

$$ \Pr [\textsf{bad}_1^{\textrm{G}_5}] \le \textsf{Adv}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}}(\mathcal {F}_{\textrm{UNPRED}}). $$

We now explain the ideas behind the construction of \(\mathcal {F}_{\textrm{UNPRED}}\). Adversary \(\mathcal {F}_{\textrm{UNPRED}}\) does not maintain its own transcripts \(\textsf{tr}_{\textit{u}}, \textsf{tr}_{\overline{\textit{u}}}\) and hence does not evaluate the support function \(\textsf{supp}\) at the end of the simulated \(\textsc {Recv}\) oracle. This is because \(\textsf{supp}\)’s outputs do not affect the input–output behaviour of the simulated oracles \(\textsc {Send}\) and \(\textsc {Recv}\), and because this reduction step does not rely on whether adversary \(\mathcal {F}_{\textrm{INT}}\) manages to win in the simulated game (but rather only whether it sets \(\textsf{bad}_1\)). Some of the adversaries we construct for the next reduction steps will likewise not maintain the transcripts.

Adversary \(\mathcal {F}_{\textrm{UNPRED}}\) splits the simulation of game \(\textrm{G}_5\)’s \(\textsc {Recv}\) oracle into two cases:

  1. If \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = \bot \), then \(\mathcal {F}_{\textrm{UNPRED}}\) does not modify \(\textit{st}_{\textsf{ME}, \textit{u}}\); this is consistent with the behaviour of oracle \(\textsc {Recv}\) in game \(\textrm{G}_5\). In addition, adversary \(\mathcal {F}_{\textrm{UNPRED}}\) also makes a call to its oracle \(\textsc {Ch}\). The \(\textsc {Ch}\) oracle simulates all instructions that would have been evaluated by \(\textsc {Recv}\) when \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = \bot \), except it omits the condition checking \((\textsf{msg}{\_}\textsf{key}' = \textsf{msg}{\_}\textsf{key}) \wedge (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}')\). The omitted condition is a prerequisite to setting flag \(\textsf{bad}_1\) in game \(\textrm{G}_5\); this change is fine because adversary \(\mathcal {F}_{\textrm{UNPRED}}\) will nonetheless set the \(\textsf{win}\) flag in its game \(\textrm{G}^{\textsf{unpred}}_{\textsf{SE}, \textsf{ME}, \mathcal {F}_{\textrm{UNPRED}}}\) whenever the simulated adversary \(\mathcal {F}_{\textrm{INT}}\) would have set the \(\textsf{bad}_1\) flag in \(\textrm{G}_5\).

  2. If \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] \ne \bot \), then \(\mathcal {F}_{\textrm{UNPRED}}\) honestly simulates all instructions that would have been evaluated by \(\textsc {Recv}\).

Finally, adversary \(\mathcal {F}_{\textrm{UNPRED}}\) uses its \(\textsc {Expose}\) oracle to learn the values from the PRF table that is maintained by the \(\textrm{UNPRED}\)-security game, and synchronises them with its own PRF table \(\textsf{T}\) inside the simulated oracle \(\textsc {Send}\) (intuitively, this appears unnecessary, but it helps us avoid further analysis to show that \(\mathcal {F}_{\textrm{UNPRED}}\) perfectly simulates game \(\textrm{G}_5\)).

\({{{\textbf{G}}}_{5}\rightarrow {{\textbf{G}}}_{6}}\). Games \(\textrm{G}_5\) and \(\textrm{G}_6\) (Fig. 46) are identical until \(\textsf{bad}_2\) is set. We have

$$ \Pr [\textrm{G}_5] - \Pr [\textrm{G}_6] \le \Pr [\textsf{bad}_2^{\textrm{G}_5}]. $$

Game \(\textrm{G}_5\) sets the \(\textsf{bad}_2\) flag in two different places: one inside oracle \(\textsc {Send}\), and one inside oracle \(\textsc {Recv}\). In either case, this happens when the table entry \(\textsf{S}[w, \textsf{msg}{\_}\textsf{key}] = (p', c_{\textit{se}}')\), for some \(w\in \{\mathcal {I},\mathcal {R}\}\), indicates that a prior call to oracle \(\textsc {Send}\) obtained \(\textsf{msg}{\_}\textsf{key}\leftarrow \textsf{MAC}.\textsf{Ev}(\textit{mk}_{w}, p')\), and now we found \(p\) such that \(p\ne p'\) and \(\textsf{msg}{\_}\textsf{key}= \textsf{MAC}.\textsf{Ev}(\textit{mk}_{w}, p)\). This results in a collision for \(\textsf{MAC}\) under related keys and hence breaks its \(\textrm{RKCR}\)-security (Fig. 27) with respect to \(\phi _{\textsf{MAC}}\). In Fig. 48, we build an adversary \(\mathcal {F}_{\textrm{RKCR}}\) against the \(\textrm{RKCR}\)-security of \(\textsf{MAC}\) with respect to \(\phi _{\textsf{MAC}}\) as follows. When adversary \(\mathcal {F}_{\textrm{RKCR}}\) plays in game \(\textrm{G}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}, \mathcal {F}_{\textrm{RKCR}}}\), it simulates game \(\textrm{G}_5\) for adversary \(\mathcal {F}_{\textrm{INT}}\). Adversary \(\mathcal {F}_{\textrm{RKCR}}\) wins in its own game whenever \(\mathcal {F}_{\textrm{INT}}\) sets \(\textsf{bad}_2\), so we have

$$ \Pr [\textsf{bad}_2^{\textrm{G}_5}] \le \textsf{Adv}^{\textsf{rkcr}}_{\textsf{MAC}, \phi _{\textsf{MAC}}}(\mathcal {F}_{\textrm{RKCR}}). $$
Fig. 48

Adversary \(\mathcal {F}_{\textrm{RKCR}}\) against the \(\textrm{RKCR}\)-security of \(\textsf{MAC}\) for the transition between games \(\textrm{G}_5\)–\(\textrm{G}_6\)

\({{{\textbf{G}}}_{6}\rightarrow {{\textbf{G}}}_{7}}\). Games \(\textrm{G}_6\) and \(\textrm{G}_7\) (Fig. 46) are identical until \(\textsf{bad}_3\) is set. We have

$$ \Pr [\textrm{G}_6] - \Pr [\textrm{G}_7] \le \Pr [\textsf{bad}_3^{\textrm{G}_6}]. $$

If \(\textsf{bad}_3\) is set in \(\textrm{G}_6\), it means that adversary \(\mathcal {F}_{\textrm{INT}}\) found a payload \(p\) and an \(\textsf{SE}\) key \(k \in \{0,1\}^{\mathsf {\textsf{SE}.kl}}\) such that \(\mathsf {\textsf{SE}.Dec}(k, \mathsf {\textsf{SE}.Enc}(k, p)) \ne p\). This violates the decryption correctness of \(\textsf{SE}\). Recall that the statement of Theorem 2 considers \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\). The \(\textsf{MTP}\text {-}\textsf{SE}\) scheme satisfies decryption correctness, so

$$ \Pr [\textsf{bad}_3^{\textrm{G}_6}] = 0. $$

\({{{\textbf{G}}}_{7}\rightarrow {{\textbf{G}}}_{8}}\). Games \(\textrm{G}_7\) and \(\textrm{G}_8\) (Fig. 46) are identical until \(\textsf{bad}_4\) is set. We have

$$ \Pr [\textrm{G}_7] - \Pr [\textrm{G}_8] \le \Pr [\textsf{bad}_4^{\textrm{G}_7}]. $$

Whenever \(\textsf{bad}_4\) is set in game \(\textrm{G}_7\), we know that (1) \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\) was computed during the ongoing \(\textsc {Recv}\) call, and (2) \(c_{\textit{se}}' \leftarrow \mathsf {\textsf{SE}.Enc}(k, p)\) was computed during an earlier call to \(\textsc {Send}\), which also verified that \(\mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}}') = p\). Importantly, we also know that \(c_{\textit{se}}\ne c_{\textit{se}}'\). The statement of Theorem 2 considers \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\). The latter is a deterministic symmetric encryption scheme that is based on the IGE block cipher mode of operation. For each key \(k \in \{0,1\}^{\mathsf {\textsf{SE}.kl}}\) and each length \(\ell \in {{\mathbb {N}}}\) such that \(\{0,1\}^{\ell } \subseteq \mathsf {\textsf{SE}.MS}\), this scheme specifies a permutation between all plaintexts from \(\{0,1\}^\ell \) and all ciphertexts from \(\{0,1\}^\ell \). In particular, \(\textsf{MTP}\text {-}\textsf{SE}\) has unique ciphertexts: it is impossible to find \(c_{\textit{se}}\ne c_{\textit{se}}'\) that, under any fixed choice of key k, decrypt to the same payload \(p\). It follows that \(\textsf{bad}_4\) can never be set when \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\), so we have

$$ \Pr [\textsf{bad}_4^{\textrm{G}_7}] = 0. $$

\({{{\textbf{G}}}_{8}\rightarrow {{\textbf{G}}}_{9}}\). While discussing this and subsequent transitions, we say that a ciphertext c belongs to (or appears in) a support transcript \(\textsf{tr}_{}\) if and only if \(\exists m', \textit{aux}' :(\textsf{sent}, m', c, \textit{aux}') \in \textsf{tr}_{}\).

Consider oracle \(\textsc {Recv}\) in game \(\textrm{G}_8\) (Fig. 46). Let \(\textit{st}_{\textsf{ME}, \textit{u}}^*\) contain the value of variable \(\textit{st}_{\textsf{ME}, \textit{u}}\) at the start of the ongoing call to \(\textsc {Recv}\) on inputs \((\textit{u}, c, \textit{aux})\). We start by showing that \(\textsc {Recv}\) evaluates \((\textit{st}_{\textsf{ME}, \textit{u}}, m) \leftarrow \mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\) and does not subsequently roll back the values of \((\textit{st}_{\textsf{ME}, \textit{u}}, m)\) to \((\textit{st}_{\textsf{ME}, \textit{u}}^*, \bot )\) iff c belongs to \(\textsf{tr}_{\overline{\textit{u}}}\):

  (1) If oracle \(\textsc {Recv}\) evaluates \((\textit{st}_{\textsf{ME}, \textit{u}}, m) \leftarrow \mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\) and does not restore the values of \((\textit{st}_{\textsf{ME}, \textit{u}}, m)\), then \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}'\) and \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = (p, c_{\textit{se}})\) (the latter implies \(\textsf{msg}{\_}\textsf{key}' = \textsf{msg}{\_}\textsf{key}\)). According to the construction of oracle \(\textsc {Send}\), this means that the ciphertext \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}', \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\) appears in transcript \(\textsf{tr}_{\overline{\textit{u}}}\).

  (2) Let \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}', \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\) be any \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext, and let \(\overline{\textit{u}}\in \{\mathcal {I},\mathcal {R}\}\). If c belongs to \(\textsf{tr}_{\overline{\textit{u}}}\), then by construction of oracle \(\textsc {Send}\) we know that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}'\) and \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = (p, c_{\textit{se}})\) for the payload \(p\) such that \(k = \textsf{T}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\), and \(c_{\textit{se}}= \mathsf {\textsf{SE}.Enc}(k, p)\), and \(p= \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\). The latter equality is guaranteed by the decryption correctness of \(\textsf{SE}= \textsf{MTP}\text {-}\textsf{SE}\) that we used for transition \(\textrm{G}_{6}\rightarrow \textrm{G}_{7}\). The \(\textrm{RKCR}\)-security of \(\textsf{MAC}\) guarantees that once \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\) is populated, a future call to oracle \(\textsc {Send}\) cannot overwrite \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\) with a different pair of values. All of the above implies that if c belongs to \(\textsf{tr}_{\overline{\textit{u}}}\) at the beginning of a call to oracle \(\textsc {Recv}\), then this oracle will successfully verify that \(\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}= \textsf{auth}{\_}\textsf{key}{\_}\textsf{id}'\) and \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}] = (p, c_{\textit{se}})\) for \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\) (whereas \(\textsf{msg}{\_}\textsf{key}' = \textsf{msg}{\_}\textsf{key}\) follows from \(\textsf{S}[\overline{\textit{u}}, \textsf{msg}{\_}\textsf{key}]\) containing the payload \(p\)). This means that the instruction \((\textit{st}_{\textsf{ME}, \textit{u}}, m) \leftarrow \mathsf {\textsf{ME}.Decode}(\textit{st}_{\textsf{ME}, \textit{u}}, p, \textit{aux})\) will be evaluated, and the variables \((\textit{st}_{\textsf{ME}, \textit{u}}, m)\) will not subsequently be rolled back to \((\textit{st}_{\textsf{ME}, \textit{u}}^*, \bot )\).

Fig. 49

Games \(\textrm{G}_9\)–\(\textrm{G}_{13}\) for the proof of Theorem 2. The highlighted code is functionally equivalent to the corresponding code in \(\textrm{G}_8\)

Game \(\textrm{G}_9\) (Fig. 49) differs from game \(\textrm{G}_8\) (Fig. 46) in the following ways:

  (1) Game \(\textrm{G}_9\) adds a payload table \(\textsf{P}\) that is updated during each call to oracle \(\textsc {Send}\). We set \(\textsf{P}[\textit{u}, c] \leftarrow p\) to indicate that the \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c, which was sent from user \(\textit{u}\) to user \(\overline{\textit{u}}\), encrypts the payload \(p\). Observe that any pair \((\textit{u}, c)\) with \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}, \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\) corresponds to a unique payload that can be recovered as \(p\leftarrow \mathsf {\textsf{SE}.Dec}(\textsf{T}[\textit{u}, \textsf{msg}{\_}\textsf{key}], c_{\textit{se}})\). This relies on decryption correctness of \(\textsf{SE}\), which is guaranteed to hold for ciphertexts inside table \(\textsf{P}\) due to the changes that we introduced in the transition \(\textrm{G}_{6}\rightarrow \textrm{G}_{7}\).

  (2) Game \(\textrm{G}_9\) rewrites the code of game \(\textrm{G}_8\)’s oracle \(\textsc {Recv}\) to run \(\mathsf {\textsf{ME}.Decode}\) iff the ciphertext c belongs to the transcript \(\textsf{tr}_{\overline{\textit{u}}}\); otherwise, the \(\textsc {Recv}\) oracle does not change \(\textit{st}_{\textsf{ME}, \textit{u}}\) and simply sets \(m \leftarrow \bot \). This follows from the analysis of \(\textrm{G}_8\) that we provided above. We note that checking whether c belongs to \(\textsf{tr}_{\overline{\textit{u}}}\) is equivalent to checking \(\textsf{P}[\overline{\textit{u}}, c] \ne \bot \). For simplicity, we do the latter; and if the condition is satisfied, then we set \(p\leftarrow \textsf{P}[\overline{\textit{u}}, c]\) and run \(\mathsf {\textsf{ME}.Decode}\) with this payload as input. As discussed above, the \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c that is issued by user \(\overline{\textit{u}}\) always encrypts a unique payload \(p\), and hence, we can rely on the fact that the table entry \(\textsf{P}[\overline{\textit{u}}, c]\) stores this unique payload value.

  (3) Game \(\textrm{G}_9\) also rewrites one condition inside oracle \(\textsc {Send}\), in a more compact but equivalent way (here we rely on the fact that values \(\textit{u}, \textsf{msg}{\_}\textsf{key}, p\) uniquely determine \(c_{\textit{se}}\)). It also adds one new conditional statement to oracle \(\textsc {Recv}\) (checking \(m^*\ne \bot \)), but it serves no purpose in \(\textrm{G}_9\).

Games \(\textrm{G}_9\) and \(\textrm{G}_8\) are functionally equivalent, so

$$ \Pr [\textrm{G}_9] = \Pr [\textrm{G}_8]. $$

\({{{\textbf{G}}}_{9}\rightarrow {{\textbf{G}}}_{10}}\). Game \(\textrm{G}_{10}\) (Fig. 49) enforces that \(m^*= \bot \) whenever oracle \(\textsc {Recv}\) is called on a ciphertext that cannot be found in the appropriate user’s transcript. Games \(\textrm{G}_{9}\) and \(\textrm{G}_{10}\) are identical until \(\textsf{bad}_5\) is set. We have

$$ \Pr [\textrm{G}_{9}] - \Pr [\textrm{G}_{10}] \le \Pr [\textsf{bad}_5^{\textrm{G}_{9}}]. $$

If \(\textsf{bad}_5\) is set in game \(\textrm{G}_9\), then the support function \(\textsf{supp}\) returned \(m^*\ne \bot \) in response to an \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c that does not belong to the opposite user’s transcript \(\textsf{tr}_{\overline{\textit{u}}}\). The statement of Theorem 2 considers \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\). The latter is defined to always return \(m^*= \bot \) when its input label does not appear in \(\textsf{tr}_{\overline{\textit{u}}}\), so

$$ \Pr [\textsf{bad}_5^{\textrm{G}_{9}}] = 0. $$

We refer to this property as the integrity of support function \(\textsf{supp}\). We formalise it in Appendix A.

\({{{\textbf{G}}}_{10}\rightarrow {{\textbf{G}}}_{11}}\). Game \(\textrm{G}_{11}\) (Fig. 49) stops adding entries of the form \((\textsf{recv}, \bot , c, \textit{aux})\) to the transcripts of both users. Once this is done, it becomes pointless for adversary \(\mathcal {F}_{\textrm{INT}}\) to call its \(\textsc {Recv}\) oracle on any ciphertext that does not appear in the appropriate user’s transcript. This is because such a call will never set the \(\textsf{win}\) flag (due to the change introduced in transition \(\textrm{G}_{9}\rightarrow \textrm{G}_{10}\)) and will never affect the transcript of either user (due to the change introduced in this transition). The statement of Theorem 2 considers \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\). The latter is defined to ignore all transcript entries of the form \((\textsf{recv}, \bot , c, \textit{aux})\), so removing the instruction \(\textsf{tr}_{\textit{u}} \leftarrow \textsf{tr}_{\textit{u}} ~\Vert ~(\textsf{recv}, m, c, \textit{aux})\) for \(m = \bot \) will not affect the outputs of any future calls to this support function. We have

$$ \Pr [\textrm{G}_{11}] = \Pr [\textrm{G}_{10}]. $$

Earlier in this section we referred to this property as the robustness of support function \(\textsf{supp}\).

\({{{\textbf{G}}}_{11}\rightarrow {{\textbf{G}}}_{12}}\). When discussing the differences between games \(\textrm{G}_8\) and \(\textrm{G}_9\), we showed that for each pair of sender \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\) and \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c, the encrypted payload \(p\) is unique. It is also true that for each pair of \(\textit{u}\in \{\mathcal {I}, \mathcal {R}\}\) and payload \(p\), there is a unique \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertext c that encrypts \(p\) in the direction from \(\textit{u}\) to \(\overline{\textit{u}}\). It follows that in games \(\textrm{G}_{11}\) and \(\textrm{G}_{12}\) (Fig. 49) for any fixed user \(\textit{u}\in \{\mathcal {I},\mathcal {R}\}\) there is a 1-to-1 correspondence between payloads and \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertexts that could be successfully sent from \(\textit{u}\) to \(\overline{\textit{u}}\) (note that this property does not hold if \(\textsf{SE}\) does not have decryption correctness, but the code added for the transition \(\textrm{G}_{6}\rightarrow \textrm{G}_{7}\) already identifies and discards the corresponding ciphertexts). The statement of Theorem 2 considers \(\textsf{supp}= \textsf{supp}\text {-}\textsf{ord}\). Observe that for any label \(z\) sent from \(\textit{u}\) to \(\overline{\textit{u}}\), the support function \(\textsf{supp}\text {-}\textsf{ord}\) checks only its equality with every \(z^*\) such that \((\textsf{sent}, m, z^*, \textit{aux}) \in \textsf{tr}_{\textit{u}}\) or \((\textsf{recv}, m, z^*, \textit{aux}) \in \textsf{tr}_{\overline{\textit{u}}}\) across all values of \(m, \textit{aux}\). In other words, this support function only looks at the equality pattern of the labels, and it does this independently in each of the two directions between the users. 
The 1-to-1 correspondence between c and \(p\), with respect to any fixed user \(\textit{u}\), means that we can replace the labels used in support transcripts, substituting \(p\) for c, and replace the label inputs to the support function \(\textsf{supp}\text {-}\textsf{ord}\) in the same way; this does not change the outputs of the support function. We have

$$ \Pr [\textrm{G}_{12}] = \Pr [\textrm{G}_{11}]. $$
Fig. 50

Adversary \(\mathcal {F}_{\textrm{EINT}}\) against the \(\textrm{EINT}\)-security of \(\textsf{ME}, \textsf{supp}\) for the transition between games \(\textrm{G}_{12}\)–\(\textrm{G}_{13}\) in the proof of Theorem 2

\({{{\textbf{G}}}_{12}\rightarrow {{\textbf{G}}}_{13}}\). Games \(\textrm{G}_{12}\) and \(\textrm{G}_{13}\) are identical until \(\textsf{bad}_6\) is set. We have

$$ \Pr [\textrm{G}_{12}] - \Pr [\textrm{G}_{13}] \le \Pr [\textsf{bad}_6^{\textrm{G}_{13}}]. $$

Games \(\textrm{G}_{12}\) and \(\textrm{G}_{13}\) (Fig. 49) can be thought of as simulating a bidirectional authenticated channel that allows the two users to exchange \(\textsf{ME}\) payloads. The adversary \(\mathcal {F}_{\textrm{INT}}\) is allowed to forward, replay, reorder and drop the payloads; but it is not allowed to forge them. This description roughly corresponds to the definition of \(\textrm{EINT}\)-security of \(\textsf{ME}\) with respect to \(\textsf{supp}\) (Fig. 16). In games \(\textrm{G}_{12}\)\(\textrm{G}_{13}\), the oracle \(\textsc {Send}\) still runs cryptographic algorithms in order to generate and return \(\textsf{MTP}\text {-}\textsf{CH} \) ciphertexts, but we will build an \(\textrm{EINT}\)-security adversary that simulates these instructions for \(\mathcal {F}_{\textrm{INT}}\). In Fig. 50, we build an adversary \(\mathcal {F}_{\textrm{EINT}}\) against the \(\textrm{EINT}\)-security of \(\textsf{ME}, \textsf{supp}\) as follows. When adversary \(\mathcal {F}_{\textrm{EINT}}\) plays in game \(\textrm{G}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}, \mathcal {F}_{\textrm{EINT}}}\), it simulates game \(\textrm{G}_{13}\) for adversary \(\mathcal {F}_{\textrm{INT}}\). Adversary \(\mathcal {F}_{\textrm{EINT}}\) wins in its own game whenever \(\mathcal {F}_{\textrm{INT}}\) sets \(\textsf{bad}_6\), so we have

$$ \Pr [\textsf{bad}_6^{\textrm{G}_{13}}] \le \textsf{Adv}^{\textsf{eint}}_{\textsf{ME}, \textsf{supp}}(\mathcal {F}_{\textrm{EINT}}). $$

Observe that \(\mathcal {F}_{\textrm{EINT}}\) takes \(\mathcal {I}\)’s and \(\mathcal {R}\)’s initial \(\textsf{ME}\) states as input, and repeatedly calls the \(\textsf{ME}\) algorithms to manually update these states (as opposed to relying on its \(\textsc {Send}\) and \(\textsc {Recv}\) oracles). This allows \(\mathcal {F}_{\textrm{EINT}}\) to correctly identify the two conditional statements inside the simulated oracle \(\textsc {SendSim}\) that require rolling back the most recent update to \(\textit{st}_{\textsf{ME}, \textit{u}}\) and exiting the oracle with \(\bot \) as output.

Adversary \(\mathcal {F}_{\textrm{INT}}\) can no longer win in game \(\textrm{G}_{13}\), because the only instruction that sets the \(\textsf{win}\) flag in games \(\textrm{G}_{0}\)–\(\textrm{G}_{12}\) was removed in the transition to game \(\textrm{G}_{13}\). It follows that

$$ \Pr [\textrm{G}_{13}] = 0. $$

The theorem statement follows. \(\square \)

5.6.7 Proof alternatives

In the earlier analysis of Case A, we relied on a certain property of the message encoding scheme \(\textsf{ME}\). Roughly speaking, we reasoned that the algorithm \(\mathsf {\textsf{ME}.Decode}\) should not be able to successfully decode random-looking strings, meaning it should require that decodable payloads are structured in a certain way. We now briefly outline a proof strategy that does not rely on such a property of \(\textsf{ME}\).

In Case A, adversary \(\mathcal {F}_{\textrm{INT}}\) calls its oracle \(\textsc {Recv}(\textit{u}, c, \textit{aux})\) on \(c = (\textsf{auth}{\_}\textsf{key}{\_}\textsf{id}', \textsf{msg}{\_}\textsf{key}, c_{\textit{se}})\) with a \(\textsf{msg}{\_}\textsf{key}\) value that was never previously returned by oracle \(\textsc {Send}\) as part of a ciphertext produced by user \(\overline{\textit{u}}\). Let us modify our initial goal for Case A as follows: we want to show that evaluating \(k \leftarrow \textsf{KDF}.\textsf{Ev}(\textit{kk}_{\overline{\textit{u}}}, \textsf{msg}{\_}\textsf{key})\), \(p\leftarrow \mathsf {\textsf{SE}.Dec}(k, c_{\textit{se}})\) and \(\textsf{msg}{\_}\textsf{key}' \leftarrow \textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p)\) is very unlikely to result in \(\textsf{msg}{\_}\textsf{key}' = \textsf{msg}{\_}\textsf{key}\). In fact, it is sufficient to focus on the last instruction here: we require that it is hard to forge any input–output pair \((p, \textsf{msg}{\_}\textsf{key})\) such that \(\textsf{msg}{\_}\textsf{key}= \textsf{MAC}.\textsf{Ev}(\textit{mk}_{\overline{\textit{u}}}, p)\). This property is guaranteed if \(\textsf{MAC}\) is related-key PRF-secure.

Theorem 2 is currently stated for a generic function family \(\textsf{MAC}\), but it could be narrowed down to use \(\textsf{MAC}= \textsf{MTP}\text {-}\textsf{MAC}\) where \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}(\textit{mk}_\textit{u}, p) = \textsf{SHA}-\textsf{256}(\textit{mk}_\textit{u}~\Vert ~p){[64:192]}\). Crucially, the algorithm \(\textsf{MTP}\text {-}\textsf{MAC}.\textsf{Ev}\) is defined to drop half of the output bits of \(\textsf{SHA}-\textsf{256}\); this prevents length-extension attacks. We could model \(\textsf{MTP}\text {-}\textsf{MAC}\) as the Augmented MAC (AMAC) and use the results from [10] to show that it is related-key PRF-secure. Technically, this would require proving three claims as follows:

  (1) The output of the first compression function within \(\textsf{SHA}-\textsf{256}(\textit{mk}_\textit{u}~\Vert ~p){[64:192]}\) looks uniformly random when used with related keys; we already formalise and analyse this property in Sect. 5.2, phrased as the \(\textrm{HRKPRF}\)-security of \(\textsf{SHACAL}-\textsf{2}\) with respect to \(\phi _{\textsf{MAC}}\).

  (2) The \(\textsf{SHA}-\textsf{256}\) compression function \(h _{256}\) is \(\textrm{OTPRF}\)-secure.

  (3) The \(\textsf{SHA}-\textsf{256}\) compression function is (roughly) PRF-secure even in the presence of some leakage on its key, i.e. an attacker receives k[64 : 192] when trying to break the PRF security of \(h _{256}(k, \cdot )\); we do not formalise or analyse this property in our work.

Here (1) and (2) could be chained together to show that \(\textsf{MTP}\text {-}\textsf{MAC}\) is a secure PRF even for variable-length inputs; then, (3) would suffice to show that \(\textsf{MTP}\text {-}\textsf{MAC}\) is resistant to length-extension attacks.
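The truncation step is easy to state concretely. The following Python sketch (our own illustration of the \(\textsf{MTP}\text {-}\textsf{MAC}\) formula above, not Telegram source code) computes \(\textsf{SHA}-\textsf{256}(\textit{mk}~\Vert ~p)\) and keeps bits 64 to 192, i.e. the middle 16 bytes of the 32-byte digest:

```python
import hashlib

def mtp_mac(mk: bytes, p: bytes) -> bytes:
    """Illustrative MTP-MAC: msg_key is SHA-256(mk || p)[64:192],
    i.e. bytes 8..24 of the 32-byte digest. Discarding the outer
    bits of the hash output is what rules out the classic
    SHA-256 length-extension attack on this construction."""
    return hashlib.sha256(mk + p).digest()[8:24]

msg_key = mtp_mac(b"\x00" * 32, b"example payload")
assert len(msg_key) == 16  # 128-bit msg_key
```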

Adopting the above proof strategy would have allowed us to omit the following two steps from the current security reduction. The \(\textrm{UNPRED}\)-security of \(\textsf{SE}, \textsf{ME}\) would be replaced directly with a new related-key PRF-security assumption for \(\textsf{MAC}= \textsf{MTP}\text {-}\textsf{MAC}\), following the results for AMAC from [10]. The \(\textrm{RKPRF}\)-security of \(\textsf{KDF}\) (with respect to \(\phi _{\textsf{KDF}}\)) would no longer be needed, because currently its only use is to transform the security game prior to appealing to the \(\textrm{UNPRED}\)-security of \(\textsf{SE}, \textsf{ME}\).

5.7 Instantiation and interpretation

We are now ready to combine the theorems from the previous two sections with the notions defined in Sects. 5.1 and 5.3 and the proofs in Appendix E. This is meant to allow interpretation of our main results: qualitatively (what security assumptions are made) and quantitatively (what security level is achieved). Note that in both of the following corollaries, the adversary is limited to making \(2^{96}\) queries. This is due to the wrapping of counters in \(\textsf{MTP}\text {-}\textsf{ME}\), since beyond this limit the advantage in breaking \(\textrm{UPREF}\)-security and \(\textrm{EINT}\)-security of \(\textsf{MTP}\text {-}\textsf{ME}\) becomes 1.

Corollary 1

Let \(\textsf {session}\_\textsf {id}\in \{0,1\}^{64}\), \(\textsf {pb} \in {{\mathbb {N}}}\) and \(\textsf{bl}= 128\). Let \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}[\textsf {session}\_\textsf {id}, \textsf {pb}, \textsf{bl}]\), \(\textsf{MTP}\text {-}\textsf{HASH}\), \(\textsf{MTP}\text {-}\textsf{MAC}\), \(\textsf{MTP}\text {-}\textsf{KDF}\), \(\phi _{\textsf{MAC}}\), \(\phi _{\textsf{KDF}}\), \(\textsf{MTP}\text {-}\textsf{SE}\) be the primitives defined in Sect. 4.4. Let \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{MTP}\text {-}\textsf{HASH}, \textsf{MTP}\text {-}\textsf{MAC}, \textsf{MTP}\text {-}\textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{MTP}\text {-}\textsf{SE}]\). Let \(\phi _{\textsf{SHACAL}-\textsf{2}}\) be the related-key-deriving function defined in Fig. 29. Let \(h _{256}\) be the \(\textsf{SHA}-\textsf{256}\) compression function, and let \(\textsf{H} \) be the corresponding function family with \(\textsf{H}.\textsf{Ev} = h _{256}\), \(\textsf{H}.\textsf{kl} = \textsf{H}.\textsf{ol} = 256\) and \(\textsf{H}.\textsf{IN} = \{0,1\}^{512}\). Let \(\ell \in {{\mathbb {N}}}\). Let \(\mathcal {D}_{\textrm{IND}}\) be any adversary against the \(\textrm{IND}\)-security of \(\textsf{CH}\), making \(q_{\textsc {Ch}}\le 2^{96}\) queries to its \(\textsc {Ch}\) oracle, each query made for a message of length at most \(\ell \le 2^{27}\) bits. Then, we can build adversaries \(\mathcal {D}_{\textrm{OTPRF}}^{\textsf{shacal}}\), \(\mathcal {D}_{\textrm{LRKPRF}}\), \(\mathcal {D}_{\textrm{HRKPRF}}\), \(\mathcal {D}_{\textrm{OTPRF}}^{\textsf{compr}}\), \(\mathcal {D}_{\mathrm {OTIND\$}}\) such that

$$\begin{aligned} \textsf{Adv}^{\textsf{ind}}_{\textsf{CH}}(\mathcal {D}_{\textrm{IND}})&\le 4 \cdot \Big (\textsf{Adv}^{\textsf{otprf}}_{\textsf{SHACAL}-\textsf{1}}(\mathcal {D}_{\textrm{OTPRF}}^{\textsf{shacal}}) \\&\quad + \textsf{Adv}^{\textsf{lrkprf}}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{KDF}}, \phi _{\textsf{SHACAL}-\textsf{2}}}(\mathcal {D}_{\textrm{LRKPRF}}) \\&\quad + \textsf{Adv}^{\textsf{hrkprf}}_{\textsf{SHACAL}-\textsf{2},\phi _{\textsf{MAC}}}(\mathcal {D}_{\textrm{HRKPRF}}) \\&\quad + \left\lfloor \frac{\ell + 256}{512} + \frac{\textsf {pb} + 1}{4}\right\rfloor \cdot \textsf{Adv}^{\textsf{otprf}}_{\textsf{H}}(\mathcal {D}_{\textrm{OTPRF}}^{\textsf{compr}})\Big ) \\&\quad + \; \frac{q_{\textsc {Ch}}\cdot (q_{\textsc {Ch}}- 1)}{2^{128}} \\&\quad + \; 2 \cdot \textsf{Adv}^{\mathsf {otind\$}}_{\textsf{CBC}[\textsf{AES}-\textsf{256}]}(\mathcal {D}_{\mathrm {OTIND\$}}). \end{aligned}$$

Corollary 1 follows from Theorem 1 together with Proposition 5, Proposition 6, Proposition 7 with Lemma 1 and Proposition 8. The two terms in Theorem 1 related to \(\textsf{ME}\) are zero for \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}\) when an adversary is restricted to making \(q_{\textsc {Ch}}\le 2^{96}\) queries. Qualitatively, Corollary 1 shows that the confidentiality of the MTProto-based channel depends on whether \(\textsf{SHACAL}-\textsf{1}\) and \(\textsf{SHACAL}-\textsf{2}\) can be considered pseudorandom functions in a variety of modes: with keys used only once, with related keys, with partially chosen keys when evaluated on fixed inputs, and when the key and input switch positions. The related-key assumptions in particular (\(\textrm{LRKPRF}\) and \(\textrm{HRKPRF}\), given in Sect. 5.2) are highly unusual; in Appendix F, we show that both assumptions hold in the ideal cipher model, but both of them require further study in the standard model. Quantitatively, the limiting term in the advantage, which implies security only if \(q_{\textsc {Ch}}< 2^{64}\), results from the birthday bound on the MAC output, though we note that we do not have a corresponding attack in this setting and thus the bound may not be tight.
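To make the quantitative statement concrete, the birthday term \(q_{\textsc {Ch}}\cdot (q_{\textsc {Ch}}- 1)/2^{128}\) from the bound above can be evaluated directly. The following back-of-the-envelope Python check (our own illustration, using exact integer arithmetic) shows that the term is negligible at \(2^{32}\) queries but vacuous at \(2^{64}\):

```python
from fractions import Fraction

def birthday_term(q: int) -> Fraction:
    # The q(q-1)/2^128 term of Corollary 1: a collision bound
    # over q queries for the 128-bit msg_key.
    return Fraction(q * (q - 1), 2 ** 128)

assert birthday_term(2 ** 32) < Fraction(1, 2 ** 63)  # about 2^-64: negligible
assert birthday_term(2 ** 64) > Fraction(99, 100)     # about 1: bound is vacuous
```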

Corollary 2

Let \(\textsf {session}\_\textsf {id}\in \{0,1\}^{64}\), \(\textsf {pb} \in {{\mathbb {N}}}\) and \(\textsf{bl}= 128\). Let \(\textsf{ME}= \textsf{MTP}\text {-}\textsf{ME}[\textsf {session}\_\textsf {id}, \textsf {pb}, \textsf{bl}]\), \(\textsf{MTP}\text {-}\textsf{HASH}\), \(\textsf{MTP}\text {-}\textsf{MAC}\), \(\textsf{MTP}\text {-}\textsf{KDF}\), \(\phi _{\textsf{MAC}}\), \(\phi _{\textsf{KDF}}\), \(\textsf{MTP}\text {-}\textsf{SE}\) be the primitives defined in Sect. 4.4. Let \(\textsf{CH}= \textsf{MTP}\text {-}\textsf{CH} [\textsf{ME}, \textsf{MTP}\text {-}\textsf{HASH}, \textsf{MTP}\text {-}\textsf{MAC}, \textsf{MTP}\text {-}\textsf{KDF}, \phi _{\textsf{MAC}}, \phi _{\textsf{KDF}}, \textsf{MTP}\text {-}\textsf{SE}]\). Let \(\phi _{\textsf{SHACAL}-\textsf{2}}\) be the related-key-deriving function defined in Fig. 29. Let \(\textsf{SHA}-\textsf{256}'\) be \(\textsf{SHA}-\textsf{256}\) with its output truncated to the middle 128 bits. Let \(\textsf{supp}=\textsf{supp}\text {-}\textsf{ord}\) be the support function as defined in Fig. 32. Let \(\mathcal {F}_{\textrm{INT}}\) be any adversary against the \(\textrm{INT}\)-security of \(\textsf{CH}\) with respect to \(\textsf{supp}\), making \(q_{\textsc {Send}}\le 2^{96}\) queries to its \(\textsc {Send}\) oracle. Then, we can build adversaries \(\mathcal {D}_{\textrm{OTPRF}}\), \(\mathcal {D}_{\textrm{LRKPRF}}\), \(\mathcal {F}_{\textrm{CR}}\) such that

$$\begin{aligned} \textsf{Adv}^{\textsf{int}}_{\textsf{CH}, \textsf{supp}}(\mathcal {F}_{\textrm{INT}})&\le 2 \cdot \Big (\textsf{Adv}^{\textsf{otprf}}_{\textsf{SHACAL}-\textsf{1}}(\mathcal {D}_{\textrm{OTPRF}}) \\&\quad + \textsf{Adv}^{\textsf{lrkprf}}_{\textsf{SHACAL}-\textsf{2}, \phi _{\textsf{KDF}}, \phi _{\textsf{SHACAL}-\textsf{2}}}(\mathcal {D}_{\textrm{LRKPRF}})\Big ) \\&\quad + \; \frac{q_{\textsc {Send}}}{2^{64}} + \textsf{Adv}^{\textsf{cr}}_{\textsf{SHA}-\textsf{256}'}(\mathcal {F}_{\textrm{CR}}). \end{aligned}$$

Corollary 2 follows from Theorem 2 together with Proposition 5, Proposition 6 and Proposition 11. The term \(\textsf{Adv}^{\textsf{eint}}_{\textsf{MTP}\text {-}\textsf{ME}, \textsf{supp}\text {-}\textsf{ord}}(\mathcal {F}_{\textrm{EINT}})\) from Theorem 2 resolves to 0 for adversaries making \(q_{\textsc {Send}}\le 2^{96}\) queries, according to Proposition 9. Qualitatively, Corollary 2 shows that the integrity of the MTProto-based channel likewise depends on \(\textsf{SHACAL}-\textsf{1}\) and \(\textsf{SHACAL}-\textsf{2}\) behaving as PRFs. Due to the way \(\textsf{MTP}\text {-}\textsf{MAC}\) is constructed, the result also depends on the collision resistance of truncated-output \(\textsf{SHA}-\textsf{256}\) (as discussed in Sect. 5.1). Quantitatively, the advantage again implies security only if \({q_{\textsc {Send}}} < {2^{64}}\). This bound follows from the fact that the first block of the payload contains a 64-bit constant \(\textsf {session}\_\textsf {id}\), which has to match upon decoding. If the MTProto message encoding scheme consistently checked more fields during decoding (especially in the first block), the bound could be improved.

6 Timing side-channel attack

Formal models and proofs such as the ones in the previous sections cannot by their nature capture all possible security guarantees of a real system, as we illustrate in this section. Going beyond the model, we present a timing side-channel attack against implementations of MTProto. The attack arises from MTProto’s reliance on an Encrypt & MAC construction, the malleability of IGE mode, and specific weaknesses in implementations. The attack proceeds in the spirit of [5]: move a target ciphertext block to a position where the underlying plaintext will be interpreted as a length field and use the resulting behaviour to learn some information. The attack is complicated by Telegram using IGE mode instead of the CBC mode analysed in [5]. We begin by describing a generic way to overcome this obstacle in Sect. 6.1. We describe the side channels found in the implementations of several Telegram clients in Sect. 6.2 and experimentally demonstrate the existence of a timing side channel in the desktop client in Sect. 6.3.

6.1 Manipulating IGE

Recall that in IGE mode, we have \(c_i = E_K(m_i \oplus c_{i-1}) \oplus m_{i-1}\) for \(i = 1, 2, \dots , t\) (see Sect. 2). Suppose we intercept an IGE ciphertext \(c\) consisting of t blocks (for any block cipher E): \(c_1~|~c_2~|~\dots ~|~c_t\) where \(|\) denotes a block boundary. Further, suppose we have a side channel that enables us to learn some bits of the second plaintext block during decryption. Fix a target block number \(i\) for which we are interested in learning a portion of \(m_i\) that is encrypted in \(c_i\). Additionally, assume we know the plaintext blocks \(m_1\) and \(m_{i-1}\).

We construct a ciphertext \(c_{1}~|~c^{\star }\) where \(c^{\star } :=c_i \oplus m_{i-1} \oplus m_1\). This is decrypted in IGE mode as follows:

$$\begin{aligned} m_1&= E_{K}^{-1}(c_1 \oplus IV _{m}) \oplus IV _c\\ m^{\star }&= E_{K}^{-1}(c^{\star } \oplus m_1) \oplus c_1 = E_{K}^{-1}(c_i \oplus m_{i-1}) \oplus c_1 \\&= m_i \oplus c_{i-1} \oplus c_1 \end{aligned}$$

Since we know \(c_{1}\) and \(c_{i-1}\), we can recover some bits of \(m_i\) if we can obtain the corresponding bits of the second plaintext block \(m^{\star }\) through the side-channel leak.
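The block manipulation above can be reproduced with any block cipher. The sketch below substitutes a small Feistel network for AES (the cipher is a placeholder for illustration, not Telegram’s primitive) and verifies that the second block of the mauled ciphertext decrypts to \(m_i \oplus c_{i-1} \oplus c_1\):

```python
import hashlib

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, half):
    return hashlib.sha256(key + half).digest()[:BLOCK // 2]

def enc_block(key, m):
    # 4-round Feistel network: a toy stand-in permutation, NOT AES
    l, r = m[:BLOCK // 2], m[BLOCK // 2:]
    for i in range(4):
        l, r = r, xor(l, _round(key + bytes([i]), r))
    return l + r

def dec_block(key, c):
    l, r = c[:BLOCK // 2], c[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = xor(r, _round(key + bytes([i]), l)), l
    return l + r

def ige_encrypt(key, iv_m, iv_c, blocks):
    # c_i = E_K(m_i XOR c_{i-1}) XOR m_{i-1}, with m_0 = IV_m, c_0 = IV_c
    out, prev_m, prev_c = [], iv_m, iv_c
    for m in blocks:
        c = xor(enc_block(key, xor(m, prev_c)), prev_m)
        out.append(c)
        prev_m, prev_c = m, c
    return out

def ige_decrypt(key, iv_m, iv_c, blocks):
    # m_i = E_K^{-1}(c_i XOR m_{i-1}) XOR c_{i-1}
    out, prev_m, prev_c = [], iv_m, iv_c
    for c in blocks:
        m = xor(dec_block(key, xor(c, prev_m)), prev_c)
        out.append(m)
        prev_m, prev_c = m, c
    return out

# attacker: knows m_1 and m_{i-1}, wants bits of the target block m_i
key, iv_m, iv_c = b"k" * BLOCK, b"\x01" * BLOCK, b"\x02" * BLOCK
msgs = [bytes([j]) * BLOCK for j in range(5)]
ctxt = ige_encrypt(key, iv_m, iv_c, msgs)
i = 3  # 1-based target block index, as in the text
c_star = xor(xor(ctxt[i - 1], msgs[i - 2]), msgs[0])       # c_i ^ m_{i-1} ^ m_1
m_star = ige_decrypt(key, iv_m, iv_c, [ctxt[0], c_star])[1]
# the second block of the mauled ciphertext is m_i ^ c_{i-1} ^ c_1
assert m_star == xor(xor(msgs[i - 1], ctxt[i - 2]), ctxt[0])
```

Since \(c_1\) and \(c_{i-1}\) are public, any bits of \(m^{\star}\) leaked during decryption directly translate into bits of \(m_i\).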

To motivate our known plaintext assumption, consider a message where \(m_{i-1} =\) “Today’s password” and \(m_{i} = \) “is SECRET”. Here \(m_{i-1}\) is known, while learning bytes of \(m_{i}\) is valuable. On the other hand, the requirement of knowing \(m_{1}\) may not be easy to fulfil in MTProto. The first plaintext block of an MTProto payload always contains \(\textsf {server}\_\textsf {salt} ~\Vert ~\textsf {session}\_\textsf {id} \), both of which are random values. It is unclear whether they were intended to be secret, but in effect they are, limiting the applicability of this attack. Section 7 gives an attack to recover these values. Note that these values are the same for all ciphertexts within a single session, so if they were recovered, then we could carry out the attack on each of the ciphertexts in turn. This allows the basic attack above to be iterated when the target \(m_{i}\) is fixed across all the ciphertexts, e.g. in order to amplify the total information learned about \(m_i\) when a single ciphertext allows only partial or noisy information about it to be inferred (cf. [5]).

6.2 Leaky length field

The preceding attack assumes we have a side channel that enables us to learn a part of the second plaintext block during decryption. We now show how such side channels arise in implementations.

The msg_length field occupies the last four bytes of the second block of every MTProto cloud message plaintext (see Sect. 4.1). After decryption, the field is checked for validity in Telegram clients. Crucially, in several implementations this check is performed before the MAC check, i.e. before msg_key is recomputed from the decrypted plaintext. If either of those checks fails, the client closes the connection without outputting a specific error message. However, if an implementation is not constant time, an attacker who submits modified ciphertexts of the form described above may be able to distinguish between an error arising from validity checking of msg_length and a MAC error, and thus learn something about the bits of plaintext in the position of the msg_length field.
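The problematic ordering can be summarised in a short sketch. Everything below is illustrative, not client code: the field offset (last four bytes of the second block), the little-endian byte order, and the msg_key derivation (SHA-256 over a 32-byte key fragment and the plaintext, truncated to its middle 16 bytes) are assumptions modelled on MTProto’s general shape. The point is only that a failed length check skips the hashing entirely, so the two error paths take measurably different time.

```python
import hashlib

MAX_LEN = 2 ** 24  # mimics the desktop client's kMaxMessageLength

def unsafe_handle(decrypted: bytes, msg_key: bytes, key_frag: bytes) -> str:
    # ANTI-PATTERN: the length field is validated BEFORE the MAC, and a
    # failed length check returns without hashing anything
    msg_length = int.from_bytes(decrypted[28:32], "little")
    if msg_length > MAX_LEN:
        return "length-error"                      # fast path: no hash
    digest = hashlib.sha256(key_frag + decrypted).digest()[8:24]
    if digest != msg_key:
        return "mac-error"                         # slow path: full hash
    return "ok"

# demo: a 64-byte "plaintext" with the length field at offset 28
pt = bytes(28) + (100).to_bytes(4, "little") + bytes(32)
frag = b"f" * 32
good_key = hashlib.sha256(frag + pt).digest()[8:24]
```

In a safe implementation both error paths would perform the MAC recomputation (and return an indistinguishable error), as Telegram’s own guidelines require (see Sect. 6.2.4).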

Since different Telegram clients implement different checks on the msg_length field, we now proceed to a case-by-case analysis, showing relevant code excerpts in each case.

6.2.1 Android

The field msg_length is referred to as messageLength here. The check is performed in decryptServerResponse of Datacenter.cpp [68], which compares messageLength with another length field (see code below). If the messageLength check fails, the MAC check is still performed. The timing difference thus consists only of two conditional jumps, which would be small in practice. The length field is taken from the first four bytes of the transport protocol format and is not checked against the actual packet size, so an attacker can substitute arbitrary values. Using multiple queries with different length values could thus enable extraction of up to 32 bits of plaintext from the messageLength field.

figure n

6.2.2 Desktop

The method handleReceived of session_private.cpp [71] performs the length check, comparing the messageLength field with a fixed value of kMaxMessageLength \(=2^{24}\). When this check fails, the connection is closed and no MAC check is performed, providing a potentially large timing difference. Because of the fixed value \(2^{24}\), this check leaks whether the eight most significant bits of the 32-bit length field (and hence the corresponding bits of the target block \(m_i\)) are zero, an event that occurs with probability \(2^{-8}\), allowing those bits to be recovered after about \(2^8\) attempts on average.Footnote 31

figure o

6.2.3 iOS

The field msg_length is referred to as messageDataLength here. The check is performed in _decryptIncomingTransportData of MTProto.m [72], which compares messageDataLength with the length of the decrypted data first in a padding length check and then directly, see code below. If either check fails, it hashes the complete decrypted payload. A timing side channel arises because sometimes this countermeasure hashes fewer bytes than a genuine MAC check (the latter also hashes 32 bytes of auth_key, here effectiveAuthKey.authKey; hence one more 512-bit block will be hashed unless the length of the decrypted payload in bits modulo 512 is 184 or less,Footnote 32 this condition being due to padding). If an attacker can change the value of decryptedData.length directly or by attaching additional ciphertext blocks, this could leak up to 32 bits of plaintext as in the Android client.
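The condition in Footnote 32 can be checked by counting compression-function calls. The arithmetic below is our own illustration (not client code) and assumes AES-block-aligned payload lengths, as produced by decryption:

```python
def sha256_compression_calls(nbytes: int) -> int:
    # SHA-256 padding appends one 0x80 byte and an 8-byte length field,
    # then rounds up to a multiple of the 64-byte block size
    return (nbytes + 9 + 63) // 64

# the countermeasure hashes only the payload; a genuine msg_key-style
# check also hashes a 32-byte auth_key fragment in front of it
for n in range(0, 256, 16):          # AES-block-aligned payload lengths
    extra = sha256_compression_calls(32 + n) > sha256_compression_calls(n)
    # no extra block exactly when the payload length in bits mod 512 is
    # 184 or less (i.e. n mod 64 is at most 23 bytes)
    assert extra == (n % 64 * 8 > 184)
```

Whenever the extra block is hashed, the countermeasure runs measurably faster than a genuine MAC check, which is what creates the side channel.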

figure p

6.2.4 Discussion

Note that all three of the above implementations were in violation of Telegram’s own security guidelines [64] which state: “If an error is encountered before this check could be performed, the client must perform the msg_key check anyway before returning any result. Note that the response to any error encountered before the msg_key check must be the same as the response to a failed msg_key check.” In contrast, TDLib [66], the cross-platform library for building Telegram clients, avoids timing leaks by running the MAC check first.

Remark 1

Recall that in Sect. 4.4, we define a simplified message encoding scheme which uses a constant in place of session_id and server_salt . This change would make the above attack more practical. However, the attack is enabled by a misplaced msg_key check and the mitigation offered by those values being secret in the implementations is accidental. Put differently, the attacks described in this section do not justify their secrecy; our proofs of security do not rely on them being secret.

Fig. 51

Processing time of SessionPrivate::handleReceived in microseconds

6.3 Practical experiments

We ran experiments to verify whether the side channel present in the desktop client code is exploitable. We measured the time difference between processing a message with a wrong msg_length and processing a message with a correct msg_length but a wrong MAC. This was done using the Linux desktop client, modified to process messages generated on the client side without engaging the network. The code can be found in Appendix G.1. We collected data for \(10^8\) trials for each case under ideal conditions, i.e. with hyper-threading, Turbo Boost etc. disabled. After removing outliers, the difference in means was about 3 microseconds, see Fig. 51. This should be sufficiently large for a remote attacker to detect, even with network and other noise sources (cf. [6], where sub-microsecond timing differences were successfully resolved over a LAN).

7 Attacking the key exchange

Recall that our attack in Sect. 6 relies on knowledge of \(m_{1}\) which in MTProto contains a 64-bit salt and a 64-bit session ID. In Sect. 7.1, we present a strategy for recovering the 64-bit salt. We then use it in a simple guess and confirm approach to recover the session ID in Sect. 7.2.

We stress, however, that the attack in Sect. 7.1 only applies in a short period after a key exchange between a client and a server.Footnote 33 Furthermore, the attack critically relies on observing small timing differences, which is unrealistic in practice, especially over a wide area network. That is, our attack relies on a timing side channel when Telegram’s servers decrypt RSA ciphertexts and verify their integrity. While—in response to our disclosure—the Telegram developers confirmed the presence of non-constant-time code in that part of their implementation and hence confirmed our attack, they did not share source code or other details with us. Since Telegram, in contrast to its clients, does not publish source code for its servers, the only option to verify the precise server behaviour is to test it. This would entail sending millions if not billions of requests to Telegram’s servers, from a host that is geographically and topologically close to one of Telegram’s data centres, and observing the response times. Such an experiment would have been at the edge of our capabilities but is clearly feasible for a dedicated, well-resourced attacker.

In Sect. 7.3, we then discuss how the attack in Sect. 7.1 can be used to break server authentication, enabling an attacker-in-the-middle (MitM) attack on the Diffie-Hellman key exchange.

7.1 Recovering the salt

At a high level, our strategy exploits the fact that during the initial key exchange, Telegram integrity-protects RSA ciphertexts by including in the encrypted payload a hash of the underlying message contents, excluding the random padding. Verifying this hash necessitates parsing the data, which in turn establishes the potential for a timing side channel.Footnote 34 In what follows, we assume the presence of such a side channel and show how it enables the recovery of the encrypted message by solving noisy linear equations via lattice reduction. We refer the reader to [2, 45] for an introduction to the application of lattice reduction in side-channel attacks and for the state of the art, respectively.

In Fig. 52, we show Telegram’s instantiation of the Diffie-Hellman key exchange [73] at the level of detail required for our attack, omitting TL schema encoding. In Fig. 52, we let \(n :=\textsf {nonce} \), \(s :=\textsf {server}\_\textsf {nonce} \), \(n' :=\textsf {new}\_\textsf {nonce} \) be nonces; \(\mathcal {S}\) be the set of public server fingerprints, \(F \in \mathcal {S}\) be the fingerprint of the key selected by the client, \(t_{s} :=\textsf {server}\_\textsf {time} \) be a timestamp for the server; let \(\mathcal {F}(\cdot , \cdot )\) be some function used to derive keysFootnote 35; let \(p_{r}, p_{s}, p_{c}\) be random padding of appropriate length; and \(ak :=\textsf {auth}\_\textsf {key} \) be the final key. The value \(N = p \cdot q\) is a product of two 32-bit primes pq selected by the server and sent to the client as a rate-limiting challenge; the client can only proceed with the key exchange after factoring N. The initial salt used by Telegram is then computed as \(\textsf {server}\_\textsf {salt} :=n'[0:64] \oplus s[0:64]\). Since \(s\) is sent in the clear during the key exchange protocol, recovering the salt is equivalent to recovering \(n'[0:64]\). We let \(N',e\) denote the public RSA key (modulus and exponent) used to perform textbook RSA encryption by the client in the key exchange, and we let \(d\) denote the private RSA exponent used by the server to perform RSA decryption.Footnote 36 We assume \(N'\) has exactly 2048 bits which holds for the values used by Telegram.

Fig. 52
Fig. 52

Illustration of the MTProto 2.0 key exchange, where \(\textsf{IGE}= \mathsf {IGE[AES}-\mathsf {256]}\) and \(\textsf{RSA}\) is textbook RSA encryption. For clarity, we do not show the plaintext encoding and merely list the individual components of each plaintext input

Further, we have

$$\begin{aligned} h_{n'} :=\textsf{SHA}-\textsf{1}[{ n' \Vert \texttt{0x0}i \Vert \textsf{SHA}-\textsf{1}[ ak ][0:64]}][32:160] \end{aligned}$$

in Fig. 52 where \(i = 1\), 2 or 3 depending on whether the key exchange terminated successfullyFootnote 37 and \(h_{r}, h_{s}, h_{c}\) are \(\textsf{SHA}-\textsf{1}\) hashes over the corresponding payloads except for the padding \(p_{r}, p_{s}, p_{c}\). In particular, we have

$$ h_{r} :=\textsf{SHA}-\textsf{1}[{N, p, q, n, s, n'}]. $$

The critical observation in this section is that while \(n\), \(s\) and \(n'\) have fixed lengths of 128, 128 and 256 bits, respectively, the same is not true for \(N\), \(p\) and \(q\). This implies that the content to be fed to \(\textsf{SHA}-\textsf{1}\) after RSA decryption and during verification must first be parsed by the server. This opens up the possibility of a timing side channel. In particular, at a byte level \(\textsf{SHA}-\textsf{1}\) is called on

$$\begin{aligned} hd\ \Vert \ \mathcal {L}(N) \Vert N \Vert \mathcal {P}(N)\ \Vert \ \mathcal {L}(p) \Vert p \Vert \mathcal {P}(p)\ \Vert \ \mathcal {L}(q) \Vert q \Vert \mathcal {P}(q) \ \Vert \ n \Vert s \Vert n' \end{aligned}$$

where \(\mathcal {L}(x)\) encodes the length of \(x\) in one byte,Footnote 38 \(x\) is stored in big-endian byte order, and \(\mathcal {P}(x)\) is up to three zero bytes so that the length of \(\mathcal {L}(x)\Vert x\Vert \mathcal {P}(x)\) is divisible by 4; \(hd=\texttt{0xec5ac983}\).
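For concreteness, the \(\mathcal {L}(x)\Vert x\Vert \mathcal {P}(x)\) layout can be sketched as follows. This is our own illustrative encoder, following the byte-level description above rather than Telegram’s TL implementation; the toy values in the demo are arbitrary:

```python
def encode(x: int) -> bytes:
    # L(x): one length byte; x in big-endian byte order; P(x): up to
    # three zero bytes so that len(L(x) || x || P(x)) is divisible by 4
    body = x.to_bytes(max(1, (x.bit_length() + 7) // 8), "big")
    out = bytes([len(body)]) + body
    return out + b"\x00" * (-len(out) % 4)

HD = bytes.fromhex("ec5ac983")

def hashed_payload(N, p, q, n, s, n_prime):
    # the byte string fed to SHA-1: hd || enc(N) || enc(p) || enc(q) || n || s || n'
    return HD + encode(N) + encode(p) + encode(q) + n + s + n_prime

# demo with arbitrary toy values: 8-byte N, 4-byte p < q, fixed-length nonces
payload = hashed_payload(0x1234567890ABCDEF, 0xC0FFEE01, 0xDEADBEEF,
                         b"n" * 16, b"s" * 16, b"N" * 32)
```

Note that the variable-length \(\mathcal {L}(\cdot )\Vert \cdot \Vert \mathcal {P}(\cdot )\) fields are exactly what forces the server to parse the payload before hashing it.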

We verified the following behaviour of the Telegram server, where “checking” means the key exchange aborts if the payload deviates from the expectation.

  • The header \(hd = \texttt{0xec5ac983}\) is checked;

  • the server checks that \(1 \le \mathcal {L}(N) \le 16\) and \(\mathcal {L}(p) = \mathcal {L}(q) = 4\) (different valid encodings of valid values, e.g. obtained by prefixing zeroes, are not accepted);

  • the value of \(N\) is not checked, \(p,q\) are checked against the value of \(N\) stored on the server and the server checks that \(p<q\);

  • the contents of \(\mathcal {P}(\cdot )\) are not checked;

  • both \(n,s\) are checked.

While we do not know in what order the Telegram server performs these checks, we recall that the payload must be parsed before being integrity checked and that the number of bytes being fed to \(\textsf{SHA}-\textsf{1}\) depends on this parsing. This is because the random padding must be removed from the payload before calling \(\textsf{SHA}-\textsf{1}\).

Recall that the Telegram developers acknowledged the attack presented here but did not provide further details on their implementation. Therefore, below we will assume that the Telegram server code follows a similar pattern to Telegram’s flagship \(\textsf{TDLib}\) library, which is used e.g. to implement the Telegram Bot API [58]. While \(\textsf{TDLib}\) does not implement RSA decryption, it does implement message parsing during the handshake. In particular, the library returns early when the header does not match its expected value. In our case the header is \(\texttt{0xec5ac983}\), but we stress that this behaviour does not seem to be problematic in \(\textsf{TDLib}\) and we do not know whether the Telegram servers follow the same pattern for RSA decryption. We will discuss other leakage patterns below, but for now we will assume the Telegram servers return early whenever there is a header mismatch, skipping the \(\textsf{SHA}-\textsf{1}\) call in this case. This produces a timing side channel.

Thus, we consider a textbook RSA ciphertext \(c = m^{e} \bmod N'\) with

$$\begin{aligned} m = h_{r} \Vert hd\Vert \mathcal {L}(N) \Vert N \Vert \mathcal {P}(N)\Vert \mathcal {L}(p) \Vert p \Vert \mathcal {P}(p) \Vert \mathcal {L}(q) \Vert q \Vert \mathcal {P}(q) \Vert n \Vert s \Vert n' \Vert p_{r} \end{aligned}$$

of length \(255\) bytes. First, observe that an attacker knows all contents of the payload (including their encodings) except for \(h_{r}\), \(n'\) and \(p_{r}\) and we can write:

$$\begin{aligned} x&= 2^{\ell (p_{r})} \cdot n' + p_{r} < 2^{256 + \ell (p_{r})}\\ m&= (2^{1880} \cdot h_{r} + 2^{256 + \ell (p_{r})} \cdot \gamma + x) \end{aligned}$$

where \(\gamma \) is a known constant derived from \(n,s,p,q,N\) and where \(\ell (p_{r})\) is the known length of \(p_{r}\). This relies on knowing that \(\left| n'\right| =256\) and \(\left| m\right| - \left| h_{r}\right| = 1880\).

Under our assumption on header checking, we can detect whether the bits in positions \(1848 = 8\cdot 255-160-32\) to \(1879 = 8\cdot 255 -160-1\) (big endian, \(\textsf{SHA}-\textsf{1}\) returns 160 bits) of \(m' :={(c')}^{d}\) match \(\texttt{0xec5ac983}\) for any \(c'\) we submit to the Telegram servers. Thus, inspired by [19], we submit \(s_{i}^{e} \cdot c\), for several chosen \(s_{i}\) to the server and receive back an answer whether the bits \({1848}\) to \({1879}\) of \(s_{i} \cdot m\) match the expected header. If the \(s_{i}\) are chosen sufficiently randomly, this event will have probability \(\approx 2^{-32}\). Writing \(\zeta = \texttt{0xec5ac983}\), we consider

$$\begin{aligned} e_{i}&= \left( \left( {s_{i} \cdot m \bmod N'} \right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880}\\&= \left( \left( s_{i} \cdot \left( {2^{1880} \cdot h_{r} + 2^{256 + \ell (p_{r})} \cdot \gamma + x}\right) \bmod N'\right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880}\\&= \left( \left( \left( s_{i} \cdot 2^{1880} \cdot h_{r} + s_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma + s_{i} \cdot x\right) \bmod N' \right) - \zeta \cdot 2^{1848}\right) \\&\bmod 2^{1880}. \end{aligned}$$

That is, we pick random \(s_{i}\) (we will discuss how to pick those below) and submit \(s_{i}^{e} \cdot c\) to the Telegram servers. Using the timing side channel, we then detect when the bits in the header position match \(\zeta \). When this happens, we store \(s_{i}\). Overall, we find \(\mu \) such \(s_{i}\) (we discuss below how to pick \(\mu \)) and suppose the event happens for some set of \(s_i\), with \(i=0,\ldots ,\mu -1\).
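The collection phase can be simulated end to end with toy parameters. Everything below is scaled down and assumed (a small modulus standing in for the 2048-bit \(N'\), an 8-bit header instead of the 32-bit \(\texttt{0xec5ac983}\), and an idealised early-return oracle), but the multiplicative relation \(\left(s_i^{e} \cdot c\right)^{d} \equiv s_i \cdot m \pmod {N'}\) driving the attack is the same:

```python
import random

# toy stand-ins: textbook RSA with small primes, 8-bit header
P, Q, E = 1000003, 1000033, 65537
N1 = P * Q                                   # stands in for N'
D = pow(E, -1, (P - 1) * (Q - 1))
HDR, HDR_POS = 0xEC, 24                      # header value and bit offset

def header_oracle(c: int) -> bool:
    # models the assumed server behaviour: decrypt, then return early
    # (fast) unless the header bits match, leaking one bit per query
    m = pow(c, D, N1)
    return (m >> HDR_POS) & 0xFF == HDR

secret_m = random.randrange(1, N1)           # the RSA plaintext m
c = pow(secret_m, E, N1)

# attacker: blind c with random s and keep every s that triggers a match
hits = [s for s in (random.randrange(2, N1) for _ in range(5000))
        if header_oracle(pow(s, E, N1) * c % N1)]

# each hit pins the header bits of s * m mod N'; with the real 32-bit
# header, six such hits feed the lattice step that recovers h_r
for s in hits:
    assert (s * secret_m % N1) >> HDR_POS & 0xFF == HDR
```

With the toy 8-bit header each query succeeds with probability \(2^{-8}\), so the loop above collects roughly 20 usable \(s_i\); with the real 32-bit header the same collection costs \(\approx 2^{32}\) queries per hit, matching the query counts in the text.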

Recovering \(h_r\). Note that \(e_{i} < 2^{1880-32}\) by construction and \(x < 2^{256 + \ell (p_{r})} \ll 2^{1848}\). Thus, picking sufficiently small \(s_{i}\) an attacker can make \(e'_{i} :=(e_{i} - s_{i} \cdot x) \bmod 2^{1880} < 2^{1848}\), i.e.

$$\begin{aligned} e'_{i}&= \left( \left( \left( s_{i} \cdot 2^{1880} \cdot h_{r} + s_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma \right) \bmod N' \right) - \zeta \cdot 2^{1848}\right) \\&\bmod 2^{1880} < 2^{1848}. \end{aligned}$$

We rewrite \(e'_{i}\) as

$$\begin{aligned} e'_{i}&= \left( s_{i} \cdot 2^{1880} \cdot h_{r} + s_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma - \zeta \cdot 2^{1848} - \sigma _{i}\cdot 2^{1880}\right) \bmod N' \end{aligned}$$

for \(\sigma _{i} < 2^{160}\) and use lattice reduction to recover \(h_{r}\). Writing

$$ t_{i} = \left( s_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma - \zeta \cdot 2^{1848}\right) \bmod N', $$

we consider the lattice spanned by the rows of \(L_{1}\) with

$$\begin{aligned} L_{1} :=\begin{pmatrix} 2^{1688} & 0 & 0 & 0 & 2^{1880} \cdot s_{0} & \cdots & 2^{1880} \cdot s_{\mu -1} & 0\\ 0 & 2^{1688}& 0 & 0 & 2^{1880} & \cdots & 0 & 0\\ 0 & 0 & \ddots & 0 & 0 & \ddots & 0 & 0\\ 0 & 0 & 0 & 2^{1688}& 0 & \cdots & 2^{1880} & 0\\ 0 & 0 & 0 & 0 & N' & \cdots & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \ddots & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \cdots & N' & 0\\ 0 & 0 & 0 & 0 & t_{0} & \cdots & t_{\mu -1} & 2^{1848}\\ \end{pmatrix}. \end{aligned}$$

Multiplying \(L_{1}\) from the left by

$$ (h_{r},\ -\sigma _{0},\ \ldots ,\ -\sigma _{\mu -1},\ *, \ldots , *, 1) $$

where \(*\) stands for modular reduction by \(N'\), shows that this lattice contains a vector

$$\begin{aligned} (2^{1688}\cdot h_{r},\ -2^{1688}\, \sigma _{0},\ \ldots ,\ -2^{1688}\, \sigma _{\mu -1},\ e'_{0}, \ldots ,\ e'_{\mu -1},\ 2^{1848}) \end{aligned}$$
(1)

where all entries are bounded by \(2^{1848} = 2^{1688 + 160}\). Thus that vector has Euclidean norm \(\le \sqrt{2\,\mu +2} \cdot 2^{1848}\).Footnote 39 On the other hand, the Gaussian heuristic predicts the shortest vector in the lattice to have norm

$$\begin{aligned} \approx \sqrt{\frac{2\,\mu +2}{2\pi \, e}} \cdot {\left( {2^{1688 \cdot (\mu +1)} \cdot {(N')}^{\mu } \cdot 2^{1848}}\right) }^{1/(2\,\mu +2)}. \end{aligned}$$
(2)

Finding a shortest vector in the lattice spanned by the rows of \(L_{1}\) is expected to recover our target vector and thus \(h_{r}\) when the norm of expression (1) is smaller than expression (2), which is satisfied for \(\mu =6\).

We experimentally verified that LLL on a \((2\cdot 6 + 2)\)-dimensional lattice constructed as \(L_{1}\) indeed succeeds (cf. Appendix G.2). Thus, under our assumptions, recovering \(h_{r}\) requires about \(6 \cdot 2^{32}\) queries to Telegram’s servers and a trivial amount of computation.
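The norm comparison itself can be sanity-checked numerically. The sketch below works with base-2 logarithms (taking \(\log _2 N' \approx 2048\)) to avoid 1800-bit integers:

```python
import math

def target_log2(mu: int) -> float:
    # log2 of sqrt(2*mu + 2) * 2^1848, the bound on the target vector (1)
    return 0.5 * math.log2(2 * mu + 2) + 1848

def gaussian_log2(mu: int, logN: float = 2048) -> float:
    # log2 of expression (2): sqrt(d / (2*pi*e)) * det(L_1)^(1/d), where
    # d = 2*mu + 2 and det(L_1) = 2^(1688*(mu+1)) * N'^mu * 2^1848
    d = 2 * mu + 2
    det_log = 1688 * (mu + 1) + logN * mu + 1848
    return 0.5 * math.log2(d / (2 * math.pi * math.e)) + det_log / d

# for mu = 6 the target vector is expected to be the shortest vector
assert target_log2(6) < gaussian_log2(6)
```

For \(\mu = 6\) the target norm is around \(2^{1850}\) while the Gaussian heuristic predicts shortest vectors of norm around \(2^{1854}\), so lattice reduction is expected to surface the target.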

Recovering \(n'\). Once we have recovered \(h_{r}\), we can target \(n'\). Writing \(\gamma ' = 2^{1880-256-\ell (p_{r})} \cdot h_{r} + \gamma \), we obtain

$$\begin{aligned} d_{i} =&\left( \left( {s'_{i} \cdot m \bmod N'} \right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880}\\ =&\left( \left( s'_{i} \cdot \left( {2^{256 + \ell (p_{r})} \cdot \gamma ' + x}\right) \bmod N'\right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880}\\ =&\left( \left( \left( s'_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma ' + s'_{i} \cdot x \right) \bmod N'\right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880}\\ =&\left( \left( \left( s'_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma ' + s'_{i} \cdot (2^{\ell (p_{r})} \cdot n' + p_{r})\right) \bmod N'\right) - \zeta \cdot 2^{1848}\right) \bmod 2^{1880} \end{aligned}$$

where the \(s_{i}'\) are again chosen randomly and we collect \(s_{i}'\) for \(i = 0,\ldots , \mu '-1\) where the bits in the header position match \(\zeta \). We discuss how to choose \(s_{i}'\) and \(\mu '\) below. Thus, we assume that \(d_{i} < 2^{1848}\) for \(s_{i}'\). Information theoretically, each such inequality leaks 32 bits. Considering that \(x = 2^{\ell (p_{r})} n' + p_{r}\) has \(256 + \ell (p_{r})\) bits, we thus require at least \((256 + \ell (p_{r}))/32\) such inequalities to recover \(x\).Footnote 40 Yet, \(\ell (p_{r}) \gg 256\) and the content of \(p_{r}\) is of no interest to us, i.e. we seek to recover \(n'\) without “wasting entropy” on \(p_{r}\).Footnote 41 In other words, we wish to pick \(s'_{i}\) sufficiently large so that all bits of \(s'_{i} \cdot 2^{\ell (p_{r})} \cdot n'\) affect the 32 bits starting at \(2^{1848}\) but sufficiently small to still allow us to consider “most of” \(s'_{i} \cdot p_{r}\) as part of the lower-order bit noise. Thus, we pick random \(s'_{i} \approx 2^{1848-\ell (p_{r})}\) and consider \(d'_{i} :=d_{i} - s'_{i} \cdot p_{r}\) with

$$\begin{aligned} d'_{i}&= \left( \left( \left( s'_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma ' + s'_{i} \cdot 2^{\ell (p_{r})} \cdot n'\right) \bmod N'\right) - \zeta \cdot 2^{1848}\right) \\&\bmod 2^{1880}\\&= \left( s'_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma ' + s'_{i} \cdot 2^{\ell (p_{r})} \cdot n' - \zeta \cdot 2^{1848} - \sigma '_{i} \cdot 2^{1880}\right) \bmod N'. \end{aligned}$$

Writing

$$ t'_{i} = \left( s'_{i} \cdot 2^{256 + \ell (p_{r})} \cdot \gamma ' - \zeta \cdot 2^{1848}\right) \bmod N', $$

we consider the lattice spanned by the rows of \(L_{2}\) with

$$\begin{aligned} L_{2} :=\begin{pmatrix} 2^{1592} & 0 & 0 & 0 & 2^{\ell (p_{r})} \cdot s'_{0} & \cdots & 2^{\ell (p_{r})} \cdot s'_{\mu '-1} & 0\\ 0 & 2^{1688}& 0 & 0 & 2^{1880} & \cdots & 0 & 0\\ 0 & 0 & \ddots & 0 & 0 & \ddots & 0 & 0\\ 0 & 0 & 0 & 2^{1688}& 0 & \cdots & 2^{1880} & 0\\ 0 & 0 & 0 & 0 & N' & \cdots & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \ddots & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \cdots & N' & 0\\ 0 & 0 & 0 & 0 & t'_{0} & \cdots & t'_{\mu '-1} & 2^{1848}\\ \end{pmatrix}. \end{aligned}$$

As before, multiplying \(L_{2}\) from the left by

$$ (n',\ -\sigma '_{0},\ \ldots ,\ -\sigma '_{\mu '-1},\ *, \ldots , *, 1) $$

shows that this lattice contains a vector

$$\begin{aligned} (2^{1592}\cdot n',\ -2^{1688}\, \sigma '_{0},\ \ldots ,\ -2^{1688}\, \sigma '_{\mu '-1},\ d'_{0}, \ldots ,\ d'_{\mu '-1},\ 2^{1848}) \end{aligned}$$

where all entries are \(\approx 2^{1848}\); the vector thus has Euclidean norm \(\approx \sqrt{2\,\mu '+2} \cdot 2^{1848}\). We write “\(\approx \)” instead of “\(\le \)” because \(s'_{i} \cdot p_{r}\) may overflow \(2^{1848}\). Picking \(\mu ' = 256/32 + 1 = 9\) gives an instance where the target vector is expected to be shorter than the Gaussian heuristic predicts. However, due to our choice of \(s'_{i}\), finding a shortest vector might not recover \(n'\) exactly but only the top \(256-\varepsilon \) bits for some small \(\varepsilon \). We verified this behaviour with our proof-of-concept implementation, which consistently recovers all but \(\varepsilon \approx 4\) bits. To recover the remaining bits, we simply perform exhaustive search by computing \(\textsf{SHA}-\textsf{1}(N,p,q,n,s,n' + \varDelta n')\) for all candidates for \(\varDelta n'\) and comparing against \(h_{r}\). Overall, under our assumptions, using \(\approx (6+9) \cdot 2^{32}\) noise-free queries and a trivial amount of computation we can recover \(n'\) from Telegram’s key exchange. This in turn allows the initial salt to be computed. Of course, timing side channels are noisy, suggesting that a significantly larger number of queries would be needed to recover sufficiently clean signals for the lattice reduction stage.

Extension to other leakage patterns. Our approach can be adapted to check other leakage patterns, e.g. targeting the values in the \(\mathcal {L}(\cdot )\) fields. For example, recall that the Telegram servers require \(1 \le \mathcal {L}(N) \le 16\). We do not know what the servers do when this condition is violated, but discuss possible behaviours:

  • Assume the code terminates early, skipping the \(\textsf{SHA}-\textsf{1}\) call. This would result in a timing side channel leaking that the three most significant bits of \(\mathcal {L}(N)\) are zero when the \(\textsf{SHA}-\textsf{1}\) call is triggered.

  • Assume the code does not terminate early but the Telegram servers feed between 88 and 104 bytes to \(\textsf{SHA}-\textsf{1}\). This would not produce a timing leak. That is, \(\textsf{SHA}-\textsf{1}\) hashes data in blocks, with its running time depending on the number of blocks processed. It has a block size of 64 bytes, and its padding algorithm (see algorithm \(\textsf{SHA}-\textsf{pad}\) in Sect. 2.2) insists on adding at least 8 bytes of length and 1 byte of padding. Thus payloads of up to 55 bytes are hashed as one block, up to 119 bytes as two, up to 183 as three, and up to 247 as four; cf. [6, 44] for works exploiting this. Telegram’s format checking restricts accepted lengths to between 88 and 104 bytes, i.e. all valid payloads lead to calls to the \(\textsf{SHA}-\textsf{1}\) compression function on two blocks.

  • Assume the code performs a dummy \(\textsf{SHA}-\textsf{1}\) call on all data received, say, minus the received digest. This would lead to calls to the \(\textsf{SHA}-\textsf{1}\) compression function on three blocks and a timing side channel leaking the three most significant bits of \(\mathcal {L}(N)\), by distinguishing between \(\mathcal {L}(N) > 16\) and \(\mathcal {L}(N) \le 16\).

Now, suppose Telegram’s servers do leak whether the three most significant bits of \(\mathcal {L}(N)\) are zero without first checking the header. On the one hand, this would reduce the query complexity because the target event is now expected to happen with probability \(2^{-3}\). On the other hand, this increases the cost of lattice reduction, as we now need to find shortest vectors in lattices of larger dimension. Information theoretically, we need at least \(\lceil 160/3 \rceil = 54\) samples to recover \(h_{r}\) and thus need to consider finding shortest vectors in a lattice of dimension 110, which is feasible [2]. For \(n'\), we can use the same tactic as above for “slicing up” \(x\) into \(n'\) and \(p_{r}\) to slice up \(n'\) into sufficiently small chunks. Alternatively, noting that we only need to recover 64 bits of \(n'\), we can simply consider a lattice of dimension \(\approx 45\), where finding shortest vectors is easy.

7.2 Recovering the session ID

Given the salt, we can recover the session ID using a simple guess and verify approach exploiting the same timing side channel as in Sect. 6. Here, we simply run our attack from Sect. 6 but this time we use a known plaintext block \(m_i\) in order to validate our guesses about the value of \(m_1\) (which is now partially unknown). That is, for all \(2^{64}\) choices of the session ID, and given the recovered salt value, we can construct a candidate for \(m_1\). Then for known \(m_{i-1}, m_{i}\), we construct \(c_{1}~|~c^{\star }\) as before, with \(c^{\star } = m_{i-1} \oplus c_i \oplus m_1\). If our guess for the session ID was correct, then decrypting \(c_{1}~|~c^{\star }\) results in a plaintext having a second block of the form:

$$ m^{\star } = E_{K}^{-1}(c^{\star } \oplus m_1) \oplus c_1 = E_{K}^{-1}(m_{i-1} \oplus c_i) \oplus c_1 = m_i \oplus c_{i-1} \oplus c_1. $$

We can then check if the observed behaviour on processing the ciphertext is consistent with the known value \(m_i \oplus c_{i-1} \oplus c_1\). If our choice of the session ID (and therefore \(m_{1}\)) is correct, this will always be the case. If our guess is incorrect then \(m^{\star }\) can be assumed to be uniformly random.

In more detail, assume our timing side channel leaks 32 bits of plaintext from the length field check. Let \(m_{i}^{{(j)}}\) and \(c_{i}^{(j)}\) be the \(i\)-th block in the \(j\)-th plaintext and ciphertext, respectively. Collect three plaintext-ciphertext pairs such that

$$ m_i^{{(j)}} \oplus c_{i-1}^{(j)} \oplus c_1^{(j)},\ (0 \le j < 3) $$

passes the length check.Footnote 42 For each guess of the session ID submit three ciphertexts containing \(c^{\star ,(j)} = m_{i-1}^{(j)} \oplus c_i^{(j)} \oplus m_1^{(j)}\) as the second block. If our guess for \(m_{1}\) was correct then all three will pass the length check which is leaked to us by the timing side channel. If our guess for \(m_1\) was incorrect then \(E_{K}^{-1}(c^{\star ,(j)} \oplus m_1) \) will output a random block, i.e. such that \(E_{K}^{-1}(c^{\star ,(j)} \oplus m_1) \oplus c_1\) passes the length check with probability \(2^{-32}\). Thus, all three length checks will pass with probability \(2^{-96}\). In other words, the probability of a false positive is upper-bounded by \(2^{64} \cdot 2^{-96} = 2^{-32}\) (i.e. in the worst case we will check and discard \(2^{64} - 1\) possible values of session ID before finding the correct one).
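The false-positive bound can be confirmed with exact arithmetic. The sketch below (our own back-of-the-envelope computation, using a union bound over all guesses) also makes explicit that fewer than three checks would not suffice:

```python
from fractions import Fraction

LEAK_BITS = 32          # bits leaked per length check
GUESSES = 2 ** 64       # candidate session IDs

def false_positive(checks: int) -> Fraction:
    # a wrong guess passes each 32-bit length check with probability
    # 2^-32; union bound over all 2^64 candidate session IDs
    return GUESSES * Fraction(1, 2 ** (LEAK_BITS * checks))

assert false_positive(3) == Fraction(1, 2 ** 32)   # three checks suffice
assert false_positive(2) == 1                      # two give a vacuous bound
```

With three checks the probability that any wrong session ID survives is at most \(2^{-32}\), matching the bound in the text.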

7.3 Breaking server authentication

Recall from Fig. 52 that the \(\textsf{key}, \textsf{iv} \) pair used to encrypt \(g^{a}\) and \(g^{b}\) is derived from \(s\) (sent in the clear) and \(n'\). Since the attack in Sect. 7.1 recovers \(n'\), it can be immediately extended into an attacker-in-the-middle (MitM) attack on the Diffie–Hellman key exchange. That is, knowing \(n'\) the attacker can compose the appropriate IGE ciphertext containing some \(g^{a'}\) of its choice where it knows \(a'\) (and similarly replace \(g^{b}\) coming from the client with \(g^{b'}\) for some \(b'\) it knows). Both client and server will thus complete their respective key exchanges with the adversary rather than each other, allowing the adversary to break confidentiality and integrity of their communication. However, even in the presence of the side channel that enabled the attack in Sect. 7.1, the MitM attack is more complicated due to the need to complete it before the session between client and server times out. This may be feasible under some of the alternative leakage patterns discussed earlier but is unlikely to be realistic when \(> 2^{32}\) requests are required to recover \(n'\).

8 Discussion

The central result of this work is a proof that the use of symmetric encryption in Telegram’s MTProto 2.0 can provide the basic security expected from a bidirectional channel if small modifications are made. The Telegram developers have indicated that they implemented most of these changes. Thus, our work can give some assurance to those reliant on Telegram providing confidential and integrity-protected cloud chats—at a comparable level to chat protocols that run over TLS’s record protocol. However, our work comes with a host of caveats.

Attacks. Our work also presents attacks against the symmetric encryption in Telegram. These highlight the gap between the variant of MTProto 2.0 that we specify and Telegram’s implementations. While the reordering attack in Sect. 4.2 and the attack on re-encryption in Sect. 4.2 were possible against implementations that we studied, they can easily be avoided without making changes to the on-the-wire format of MTProto, i.e. by only changing processing in clients and servers. After disclosing our findings, Telegram informed us that they have changed this processing accordingly.

Our attacks in Sect. 6 are attacks on the implementation. As such, they can be considered outside the model: our model only shows that there can be secure instantiations of MTProto but does not cover the actual implementations; in particular, we do not model timing differences. That said, protocol design has a significant impact on the ease with which secure implementations can be achieved. Here, the decision in MTProto to adopt Encrypt & MAC results in the potential for a leak that we can exploit in specific implementations. This “brittleness” of MTProto is of particular relevance due to the surfeit of implementations of the protocol, and the fact that security advice may not be heeded by all authors, as we showed with our \(\textsf{msg\_length}\) attack in Sect. 6. Here Telegram’s apparent ambition to provide TDLib as a one-stop solution for clients across platforms will allow security researchers to focus their efforts. We thus recommend that Telegram replaces the low-level cryptographic processing in all official clients with a carefully vetted library.
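The structural difference that makes Encrypt & MAC brittle can be illustrated with a toy construction. The sketch below is hypothetical and deliberately simplified (a SHA-256-based toy stream cipher with HMAC tags, not MTProto's IGE/\(\textsf{msg\_key}\) construction): it shows that Encrypt-then-MAC can reject a forgery before any decryption, whereas a MAC computed over the plaintext forces the receiver to decrypt first, exposing plaintext-dependent processing such as length checks to an attacker.

```python
# Hedged, simplified illustration of Encrypt-then-MAC vs Encrypt & MAC.
# Toy XOR "cipher" and HMAC-SHA256; hypothetical, not MTProto's scheme.
import hmac
import hashlib

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Keystream from SHA-256 in counter mode; illustration only.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out += bytes(x ^ y for x, y in zip(data[i:i + 32], block))
    return bytes(out)

def etm_decrypt(enc_key, mac_key, ct, tag):
    # Encrypt-then-MAC: authenticate the ciphertext FIRST.
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # forgery rejected without touching any plaintext
    return xor_stream(enc_key, ct)

def eam_decrypt(enc_key, mac_key, ct, tag):
    # Encrypt & MAC over the plaintext: must decrypt BEFORE verifying,
    # so any plaintext-dependent behaviour here (parsing, length
    # checks, early aborts) can leak through timing.
    pt = xor_stream(enc_key, ct)
    expected = hmac.new(mac_key, pt, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None
    return pt

ek, mk, msg = b"k" * 32, b"m" * 32, b"hello telegram"
ct = xor_stream(ek, msg)
assert etm_decrypt(ek, mk, ct, hmac.new(mk, ct, hashlib.sha256).digest()) == msg
assert eam_decrypt(ek, mk, ct, hmac.new(mk, msg, hashlib.sha256).digest()) == msg
```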

Note that the security of the Telegram ecosystem does not stop with official clients. As the recent work of [9] shows, many third-party client implementations are also vulnerable to attacks.

Tightness. On the other hand, our proofs are not necessarily tight. That is, our theorem statements contain terms bounding the advantage by \(\approx q/2^{64}\) where \(q\) is the number of queries sent by the adversary. Yet, we have no attacks matching these bounds (our attacks with complexity \(2^{64}\) are outside the model). Thus, it is possible that a refined analysis would yield tighter bounds.

Future work. Our attack in Sect. 7 is against the implementation of Telegram’s key exchange and is thus outside of our model for two reasons: as before, we do not consider timing side channels in our model and, critically, we only specify the symmetric part of MTProto. This highlights a second significant caveat for our results: large parts of Telegram’s design remain unstudied, including multi-user security, the key exchange, the higher-level message processing, secret chats, forward secrecy, control messages, bot APIs, CDNs, cloud storage and the Passport feature, to name but a few. These are pressing topics for future work.

Assumptions. In our proofs we are forced to rely on unstudied assumptions about the underlying primitives used in MTProto. In particular, we have to make related-key assumptions about the compression function of \(\textsf{SHA-256}\) which could be easily avoided by tweaking the use of these primitives in MTProto. In the meantime, these assumptions represent interesting targets for symmetric cryptography research. Similarly, the complexity of our proofs and assumptions largely derives from MTProto deploying hash functions in place of (domain-separated) PRFs such as HMAC. We recommend that Telegram either adopts well-studied primitives for future versions of MTProto to ease analysis and thus to increase confidence in the design, or adopts TLS.
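To make the recommendation concrete, domain-separated key derivation from a well-studied PRF could look as follows. This is a hedged sketch only: the labels, context values and layout are hypothetical and are not MTProto's actual derivation (which applies plain SHA-256 to key material); the counter-mode structure follows NIST SP 800-108.

```python
# Hedged sketch of domain-separated key derivation using HMAC-SHA256
# in counter mode (cf. NIST SP 800-108). Labels and context are
# hypothetical, not MTProto's actual derivation.
import hmac
import hashlib

def prf(key: bytes, label: bytes, context: bytes, length: int) -> bytes:
    # The label provides domain separation: different uses of the same
    # key yield independent-looking outputs under the PRF assumption
    # on HMAC, avoiding related-key assumptions on the hash itself.
    out = b""
    counter = 1
    while len(out) < length:
        data = counter.to_bytes(4, "big") + label + b"\x00" + context
        out += hmac.new(key, data, hashlib.sha256).digest()
        counter += 1
    return out[:length]

auth_key = b"\x01" * 32  # hypothetical long-term key material
msg_id = b"\x00" * 16    # hypothetical per-message context
enc_key = prf(auth_key, b"mtproto enc", msg_id, 32)
mac_key = prf(auth_key, b"mtproto mac", msg_id, 32)
assert enc_key != mac_key  # distinct labels give independent keys
```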

Telegram. While we prove security of the symmetric part of MTProto at a protocol level, we recall that by default users communicating via Telegram must trust the Telegram servers, i.e. end-to-end encryption is optional and not available for group chats. We thus, on the one hand, (a) recommend that Telegram open-sources the cryptographic processing on their servers and (b) recommend avoiding references to Telegram as an “encrypted messenger”, which—post-Snowden—has come to mean end-to-end encryption. On the other hand, discussions about end-to-end encryption aside, echoing [1, 25] we note that many higher-risk users do rely on MTProto and Telegram and shun Signal. This emphasises the need to study these technologies and how they serve those who rely on them.