
3. Security Considerations

3.1 The Vulnerable Message Set

This protocol is designed to provide robust continuous key agreement in the presence of bandwidth limits. Since many messages must be passed in order to reach key agreement, the important measure of the security provided by this protocol is: in the event of a compromise, how many messages are passed before the session heals? We call this set of messages the vulnerable message set.

The size of the vulnerable message set is not an intrinsic property of the protocol. For the ML-KEM Braid, given a chunk size and choice of KEM one can compute the minimum possible size of the vulnerable message set, but the maximum size of this set is unbounded: if Bob never replies to Alice but Alice continues sending messages, their session will never heal and all of Alice's messages will be vulnerable.

In fact, as shown in [3], the size of the vulnerable message set for a SCKA protocol depends on the message sending behavior of the protocol participants. Two parties that are online and in rapid conversation, such as two parties chatting using their primary devices in a Signal chat, will typically have a smaller vulnerable message set than two parties conversing on desktop devices that are often offline.

The ML-KEM Braid protocol was selected to provide small vulnerable message sets in a wide range of realistic secure messaging scenarios, but for applications with highly specific message sending behaviors protocol designers should consider whether a different SCKA protocol may provide better security.

3.2 Alternate KEMs

This protocol uses ML-KEM as specified, but we note that the IND-CCA security provided by ML-KEM's Fujisaki-Okamoto transform is not required to prove the security of this or related SCKA protocols. In [8] an IND-CPA "Ratcheting KEM" was designed with this fact in mind, and it allows a large part of a ciphertext to be reused as a public key for a future round, reducing bandwidth costs. Even without the ratcheting KEM optimization, this protocol could be made more efficient by having the parties use their shared session state to determine the encapsulation key seed, and then using the internal IND-CPA Public Key Encryption (PKE) functionality of ML-KEM. This would not only reduce message sizes by a small amount, but would also allow us to skip sending the "header" and reduce the number of round trips required to emit a key. This could have a particularly large benefit in situations where communication is imbalanced, for example, when one party's device is offline for long periods of time.

Alternate KEMs, or more generally IND-CPA PKE schemes designed for efficient ratcheting, may not have the same binding properties as ML-KEM. As seen in [9], [10], these binding properties can have an impact on the security of higher level protocols. Developers using a variant of this protocol with an alternate KEM should consider the security implications of these binding properties in their higher level protocol.

3.3 Optional internal authentication

The authenticator objects and the MACs that are added to header and ciphertext messages provide standalone authenticity guarantees for ML-KEM Braid messages and outputs at a cost of transmitting 64 bytes per epoch.
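As one possible reading of the 64-byte per-epoch cost, the sketch below assumes two 32-byte HMAC-SHA256 tags per epoch, one over the header and one over the ciphertext. The key names and message bytes are hypothetical illustrations, not values taken from the protocol specification:

```python
import hmac
import hashlib

def tag(auth_key: bytes, message: bytes) -> bytes:
    """One 32-byte HMAC-SHA256 tag; two such tags per epoch cost 64 bytes."""
    return hmac.new(auth_key, message, hashlib.sha256).digest()

def verify(auth_key: bytes, message: bytes, received_tag: bytes) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(tag(auth_key, message), received_tag)

# Hypothetical per-epoch authenticator keys derived from session state.
header_key, ct_key = b"\x01" * 32, b"\x02" * 32

header_tag = tag(header_key, b"example header bytes")
ct_tag = tag(ct_key, b"example ciphertext bytes")
assert len(header_tag) + len(ct_tag) == 64  # the per-epoch cost noted above
assert verify(header_key, b"example header bytes", header_tag)
```

If the higher level protocol already authenticates messages, these tags are the part that can be removed, as described below.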

If this protocol is integrated into a higher level protocol, such as the Double Ratchet [2], that provides authentication, then protocol messages can derive authenticity from that and the internal authentication of the ML-KEM Braid protocol could be removed.

3.4 Bandwidth limits, message sizes, and speed of PCS

While the size of the vulnerable message set depends on the actual message sending behavior of the protocol participants, it is easy to see that if we send larger chunks, then key agreement - and eventual healing - will happen faster.

With a chunk size of 32, using ML-KEM 768 we require 3 messages to send a header, 30 messages to send ct1, 36 messages to send ek_vector, and 5 messages to send ct2 with its MAC. If we doubled the chunk size to 64, these numbers would be cut in half (rounding up) and, under ideal conditions, healing would occur almost twice as fast. We note, though, that when parties go offline for periods of time, as is common with Signal's linked devices, the benefits of larger chunks become less pronounced, as is seen in [3]. A doubling of the bandwidth limit will significantly improve the speed of PCS healing in many realistic settings but it will not lead to a doubling of healing speed. When considering the tradeoff between bandwidth usage and PCS speed, both client requirements and client message sending behavior should be considered.
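The counts above can be reproduced with ceiling division. The component byte sizes below are assumptions chosen to be consistent with the quoted counts (e.g. treating ct1 as the 960-byte compressed u component of an ML-KEM-768 ciphertext); they are illustrative, not normative:

```python
import math

# Assumed component sizes in bytes (illustrative, chosen to match the
# message counts quoted in the text; not normative protocol values).
SIZES = {
    "header": 96,          # 3 chunks of 32 bytes (assumed)
    "ct1": 960,            # compressed u component of an ML-KEM-768 ciphertext
    "ek_vector": 1152,     # encapsulation key minus its 32-byte seed
    "ct2+mac": 128 + 32,   # compressed v component plus a 32-byte MAC
}

def chunks(nbytes: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed to carry nbytes, rounding up."""
    return math.ceil(nbytes / chunk_size)

for chunk_size in (32, 64):
    counts = {name: chunks(n, chunk_size) for name, n in SIZES.items()}
    print(chunk_size, counts)
# chunk size 32 -> header: 3, ct1: 30, ek_vector: 36, ct2+mac: 5
# chunk size 64 -> header: 2, ct1: 15, ek_vector: 18, ct2+mac: 3
```

Note that doubling the chunk size halves each count only up to rounding, which is why healing speeds up "almost", not exactly, twofold under ideal conditions.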

In the presence of strict bandwidth limits, note that if a compact binary format is used to encode protocol messages and this format saves several bytes over a general purpose format, then those bytes can be used to send larger chunks and speed PCS.

3.5 Encoder domain size

While in principle we think of encoders as producing an unbounded stream of codewords, in practice the encoder we recommend produces a finite number of distinct codewords before repeating. This means that an attacker with network control does not need to perform a complete denial of service attack on two parties to prevent the protocol from advancing: eventually codewords must repeat, and the attacker can let through any messages that carry already-seen codewords.

Internally, this is due to the fact that the encoding is implemented using polynomial interpolation over a finite field, and the number of distinct codewords is equal to the size of the underlying field.

We recommend using GF(2^16) as the underlying field. When using ML-KEM 768 [1] as the underlying KEM and 32-byte codewords, the largest message sent by the protocol is 36 codewords long. Thus to prevent ratcheting, an attacker must prevent 2^16 - 35 out of every 2^16 messages from being delivered. While this is not a complete denial of service, it requires blocking over 99.9% of messages and will likely cause protocol users to consider the underlying service to be unavailable.

Using GF(2^8) would allow faster encoding and decoding, but would allow an attacker to prevent the protocol from advancing by blocking 221 out of every 256 messages, or about 86% of messages. This will likely lead to a poor user experience, but may still render a higher level protocol usable. If encoding and decoding speed are critical for an application, we recommend considering the use of fountain codes as described in the next section.
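The blocking fractions quoted above can be checked directly, assuming the largest protocol message spans 36 codewords and the encoder emits one distinct codeword per field element:

```python
# Largest message size, in codewords, quoted for ML-KEM 768 with
# 32-byte codewords.
LARGEST_MESSAGE_CODEWORDS = 36

def blocking_fraction(field_size: int) -> float:
    """Fraction of messages an attacker must block to stall the protocol.

    The attacker may let through at most 35 distinct codewords (one
    fewer than a full message, so decoding never completes) and must
    block everything else.
    """
    blocked = field_size - (LARGEST_MESSAGE_CODEWORDS - 1)
    return blocked / field_size

print(f"GF(2^16): {blocking_fraction(2**16):.4%}")  # just over 99.9%
print(f"GF(2^8):  {blocking_fraction(2**8):.1%}")   # about 86%
```

This makes the tradeoff concrete: shrinking the field from 2^16 to 2^8 elements moves the attack from "effectively a denial of service" to "degraded but possibly usable".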

3.6 Alternate encoders

We recommend using a systematic erasure code based encoding scheme to split large messages into fixed size chunks. Systematic encodings make the common case where no messages are dropped very efficient, since decoding is simply concatenation. Erasure codes give us a guarantee that, if a message's plaintext fits into N codewords, then when the recipient receives any N codewords they will be able to decode the message.
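A minimal sketch of the systematic property only, with hypothetical helper names; generation of the parity codewords that would follow the systematic prefix is omitted:

```python
def encode_systematic(message: bytes, chunk_size: int = 32) -> list[bytes]:
    """Systematic prefix of the codeword stream: the message itself,
    split into fixed-size chunks (the last chunk zero-padded). A real
    encoder would follow these with parity codewords from the erasure code."""
    padded = message + b"\x00" * (-len(message) % chunk_size)
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]

def decode_common_case(received: list[bytes], message_len: int) -> bytes:
    """When no chunks were dropped, decoding is simply concatenation."""
    return b"".join(received)[:message_len]

msg = b"A" * 70
codewords = encode_systematic(msg)
assert len(codewords) == 3                     # ceil(70 / 32) chunks
assert decode_common_case(codewords, len(msg)) == msg
```

The efficiency claim in the text is visible here: in the no-loss case the decoder does no finite-field arithmetic at all.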

If the computational costs of erasure code decoding are too high, a fountain code, such as RaptorQ [11], can be used. With a fountain code the recipient loses the guarantee that if a message's plaintext fits into N codewords, then when the recipient receives any N codewords they will be able to decode the message. Nevertheless, the recipient will still be able to decode the message from N codewords in most situations. For RaptorQ, for example, decoders must succeed in decoding from N codewords at least 99 out of 100 times, and succeed decoding from N+k codewords with probability at least 1 - 10^(-k-1).

Since four messages must be decoded in each epoch, and each may require extra codewords with probability up to 1/100, we can expect an implementation using RaptorQ to have to send more messages in an epoch than an erasure code based implementation about 4% of the time (1 - 0.99^4 ≈ 0.039).

This may have a negative impact on the security of any higher level protocol. For example, in a Double Ratchet protocol [2] using this would directly lead to an increase in the size of the Vulnerable Message Set - the set of messages exposed to an attacker in a device compromise - about 4% of the time. (See [3] for more details on the Vulnerable Message Set).
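The roughly 4% figure follows directly from RaptorQ's stated decoding guarantee, assuming decode failures in an epoch are independent:

```python
# RaptorQ decoders must succeed from exactly N codewords at least
# 99 out of 100 times, so a single decode may need extra codewords
# with probability up to 1/100.
p_fail_one = 1 / 100

# Four messages are decoded per epoch; the chance that at least one
# of them needs extra codewords (assuming independence) is:
p_extra_per_epoch = 1 - (1 - p_fail_one) ** 4
print(f"{p_extra_per_epoch:.3f}")  # 0.039, i.e. about 4% of epochs
```

This is a worst-case bound from the guarantee; a given RaptorQ implementation may fail less often in practice.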

3.7 Formal verification and security proofs

The ML-KEM Braid protocol has been modeled using ProVerif [12] and has been proven, in the Dolev-Yao model, to provide the correctness, Forward Secrecy, and Post-Compromise Security required of an SCKA protocol as defined in [3]. Furthermore, these models also prove mutual authentication for the ML-KEM Braid protocol.

The ML-KEM Braid protocol is closely related to the protocol Opp-UniKEM-CKA introduced and proven to be a secure SCKA in [3].

The implementation and ProVerif models are available at [13].

3.8 Representation of epochs

If an implementation uses a fixed width integer to represent the epoch, then eventually the epoch counter will repeat and the protocol loses the property that every epoch produces a unique key. For example, an application that uses 8-bit integers for epochs to conserve space in protocol messages will begin repeating epochs after 256 keys have been emitted - something that is likely to happen in many conversations. Using a 64-bit integer to represent the epoch will prevent this wraparound from ever happening in a human conversation, but for other applications of the ML-KEM Braid this wraparound should be considered.
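A minimal illustration of the wraparound, assuming the counter is reduced modulo its fixed width (the helper name is hypothetical):

```python
def next_epoch(epoch: int, bits: int) -> int:
    """Advance a fixed-width epoch counter, wrapping at 2**bits."""
    return (epoch + 1) % (1 << bits)

# An 8-bit epoch counter repeats after 256 keys have been emitted:
assert next_epoch(255, bits=8) == 0       # wraps back to epoch 0
# A 64-bit counter will not wrap in any human conversation:
assert next_epoch(255, bits=64) == 256
```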