TransWikia.com

Encrypt a big file in blocks with AES-GCM: how many nonces do we need?

Cryptography Asked on January 5, 2021

I need to encrypt big files, potentially 10 GB or more, using AES-GCM. For memory (RAM) reasons, I need to process them in blocks (let’s say 16 MB), rather than calling encrypt(plaintext) in one pass.

After reading the answers to this security.SE question, Should I chunk or not for AES-GCM file encryption, I have the feeling that I have read one thing and its opposite.

Which one is the correct approach?

  • Method A: Since "a counter mode converts a block cipher into a stream cipher" (quote from the linked post above), we can do this:

    from Crypto import Random
    from Crypto.Cipher import AES

    nonce = Random.new().read(16)
    out.write(nonce)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    while True:
        block = f.read(16*1024*1024)
        if not block:  # EOF
            break
        out.write(cipher.encrypt(block))  # we encrypt multiple blocks with the same
                                          # "cipher" object, especially the same nonce
    out.write(cipher.digest())  # we compute the auth. tag only once at the end
    

    Here we encrypt multiple 16MB blocks with the same "cipher" object, same nonce.
    I have read some criticism of this approach in the article AEADs: getting better at symmetric cryptography, section "AEADs with large plaintexts".

    But on the other hand, I noticed that:

    print(cipher.encrypt(b'hello'))  # 4cadd813be in hexadecimal
    print(cipher.encrypt(b'hello'))  # d3585e3471, different, fortunately!
    

    so it seems ok (like a stream cipher).

    Is it true that GCM (counter mode) converts a block cipher into a stream cipher?

  • Method B: we choose a new nonce for each 16 MB block, which also produces a separate tag for each block:

    from Crypto import Random
    from Crypto.Cipher import AES

    while True:
        block = f.read(16*1024*1024)
        if not block:  # EOF
            break
        nonce = Random.new().read(16)
        cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
        out.write(nonce)
        out.write(cipher.encrypt(block))  # new "cipher" object, new nonce for each 16 MB block
        out.write(cipher.digest())  # a separate auth. tag for each 16 MB block
    

    Drawback of this method: we have to store a nonce and a tag on disk for each block.

    This looks like the method detailed in Proper way of encrypting large files with AES-256-GCM. However, a malicious user could swap the order of the blocks (including their nonces and tags), and every block would still decrypt and verify even though the file has been tampered with. So this solution does not seem OK, as suggested by this answer.

TL;DR: Isn’t it a problem that we use only one nonce in Method A above?

Note: I also read about this method, which chains the blocks (and their tags).

Is there a general consensus or standard for a good way to process big files in blocks with AES-GCM?

(I’ll ask about Python implementations, such as pycryptodome, on SO later; first, I wanted to understand the background.)

2 Answers

AES-GCM can encrypt up to $2^{39}-256$ bits with a single key+nonce pair. That's $2^{36}-32$ bytes, just under 64 GiB, so a 10 GiB file is fine.
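The limit comes from GCM's 32-bit block counter. A short sketch of the arithmetic, assuming the usual 96-bit nonce, where counter value 1 is used for the tag and the plaintext blocks use counter values 2 through $2^{32}-1$:

```python
BLOCK_BYTES = 16                     # AES block size in bytes
max_blocks = 2**32 - 2               # counter values 2 .. 2**32 - 1
max_bytes = max_blocks * BLOCK_BYTES

assert max_bytes * 8 == 2**39 - 256  # the bound above, expressed in bits
print(max_bytes)                     # 68719476704 bytes, i.e. 64 GiB minus 32 bytes
print(10 * 2**30 <= max_bytes)       # True: a 10 GiB file fits in one message
```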

If you go beyond 64 GiB, you lose security, because the 32-bit block counter wraps around and keystream gets reused. In that case, either use XChaCha20-Poly1305 (max of 256 GiB plaintext per message) or divide the file into chunks smaller than 64 GiB.

16MB is far smaller than needed, and will hurt performance.

Most implementations of AES-GCM (or XChaCha20-Poly1305) provide some sort of streaming interface with init, update, and finalize functions: init starts the computation, update takes in some data and can be called repeatedly, and finalize finishes it. Libsodium's crypto_secretstream_* (documentation here) is a good example, but any library offering a streaming implementation should have something similar.

Correct answer by SAI Peregrinus on January 5, 2021

Additional detail to @SAIPeregrinus's answer:

In the end, "method A" above (in blocks) gives exactly the same result as encrypting the whole plaintext in one pass:

import Crypto.Random, Crypto.Cipher.AES  # using package "pycryptodome"

key = bytes.fromhex('7d29ccf69c671775e17d4b9dd6485fd8')
nonce = bytes.fromhex('04972c7927042af0ee10c7e6ac56ddd3')

# usual method (whole plaintext in one pass)
cipher = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher.encrypt(b'hellohelloblablabla').hex())      # e8eed0bf4e10dd882d2a7d4daf377fa05419a5

# method A, by blocks
cipher2 = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher2.encrypt(b'hello').hex())                     # e8eed0bf4e
print(cipher2.encrypt(b'hello').hex())                     # 10dd882d2a
print(cipher2.encrypt(b'blablabla').hex())                 # 7d4daf377fa05419a5
# gives exactly the same result

so encrypting in chunks, as in method A above (to avoid an out-of-memory error when reading 10 GB in one pass), has no impact on the resulting encrypted file.

Answered by Basj on January 5, 2021

