Cryptography Asked on January 5, 2021
I need to encrypt big files using AES-GCM, potentially 10 GB or more. For memory (RAM) reasons, I need to process them by blocks (let’s say 16 MB), rather than doing encrypt(plaintext) in one pass.
After reading the answers to this security.SE question, Should I chunk or not for AES-GCM file encryption, I have the feeling that I have read everything and its opposite.
Which one is the correct approach?
Method A: Since "a counter mode converts a block cipher into a stream cipher" (quote from the linked post above), we can do this:
from Crypto import Random        # pycryptodome
from Crypto.Cipher import AES

# key, f (input file) and out (output file) are assumed to be defined
nonce = Random.new().read(16)
out.write(nonce)
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    out.write(cipher.encrypt(block))  # we encrypt multiple blocks with the same
                                      # "cipher" object, especially the same nonce
out.write(cipher.digest())  # we compute the auth. tag only once at the end
Here we encrypt multiple 16MB blocks with the same "cipher" object, same nonce.
I read some criticisms about this approach in the article AEADs: getting better at symmetric cryptography, paragraph "AEADs with large plaintexts".
But on the other hand, I noticed that:
print(cipher.encrypt(b'hello')) # 4cadd813be in hexadecimal
print(cipher.encrypt(b'hello')) # d3585e3471, different, fortunately!
so it seems ok (like a stream cipher).
Is it true that GCM (counter mode) converts a block cipher into a stream cipher?
Method B: we have to choose a new nonce and tag for each 16 MB:
while True:
    block = f.read(16*1024*1024)
    if not block:  # EOF
        break
    nonce = Random.new().read(16)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    out.write(nonce)
    out.write(cipher.encrypt(block))  # new "cipher" object, new nonce for each 16 MB block
    out.write(cipher.digest())        # one auth. tag per 16 MB block
Drawback with this method: we have to save the nonce and tag metadata to disk for each block.
This looks like the method detailed in Proper way of encrypting large files with AES-256-GCM. Obviously a malicious user could swap the order of blocks (including their nonce and tag) and the file would still look valid, even though it is not. So this solution seems not ok, as suggested by this answer.
TL;DR: Isn’t it a problem that we use only one nonce in Method A above?
Note: I also read this method which chains and blocks (and tags).
Is there a general consensus/normalization for a good way to work with big files by blocks with AES-GCM?
(For implementations with Python, such as pycryptodome, I’ll ask later on SO, but first, I wanted to read about the background).
AES-GCM can encrypt up to $2^{39}-256$ bits with a single key+nonce pair. That's just under 64GiB. A 10GiB file is fine.
If you go beyond 64GiB, you lose security. In that case, either use XChaCha20-Poly1305 (max of 256GiB plaintext per message) or divide the file into chunks < 64GiB.
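The limit quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the constant names and the helper fits_in_one_gcm_message are made up for the example.

```python
# Per-message plaintext limit of AES-GCM as quoted above: 2^39 - 256 bits.
MAX_BITS = 2**39 - 256
MAX_BYTES = MAX_BITS // 8        # 2^36 - 32 bytes
MAX_BLOCKS = MAX_BYTES // 16     # 16-byte AES blocks: 2^32 - 2
GIB = 2**30

def fits_in_one_gcm_message(size_bytes):
    """Return True if a plaintext of this size stays within the limit."""
    return 0 <= size_bytes <= MAX_BYTES

print(MAX_BYTES == 64 * GIB - 32)            # True: 32 bytes short of 64 GiB
print(fits_in_one_gcm_message(10 * GIB))     # True: a 10 GiB file is fine
print(fits_in_one_gcm_message(64 * GIB))     # False: needs splitting
```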
16MB is far smaller than needed, and will hurt performance.
Most implementations of AES-GCM (or XChaCha20-Poly1305) will provide some sort of streaming interface, with init, update, and finalize functions, where init starts the computation, update takes in some data and can be called repeatedly, and finalize finishes it. Libsodium's crypto_secretstream_* (documentation here) is a good example, but any library offering a streaming implementation should have something similar.
Correct answer by SAI Peregrinus on January 5, 2021
Additional detail to @SAIPeregrinus's answer:
In the end, "method A" above (by blocks) gives exactly the same result as if we encrypted the whole plaintext in one pass:
import Crypto.Random, Crypto.Cipher.AES # using package "pycryptodome"
key = bytes.fromhex('7d29ccf69c671775e17d4b9dd6485fd8')
nonce = bytes.fromhex('04972c7927042af0ee10c7e6ac56ddd3')
# usual method (whole plaintext in one pass)
cipher = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher.encrypt(b'hellohelloblablabla').hex()) # e8eed0bf4e10dd882d2a7d4daf377fa05419a5
# method A, by blocks
cipher2 = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)
print(cipher2.encrypt(b'hello').hex()) # e8eed0bf4e
print(cipher2.encrypt(b'hello').hex()) # 10dd882d2a
print(cipher2.encrypt(b'blablabla').hex()) # 7d4daf377fa05419a5
# gives exactly the same result
So writing in chunks, as in method A above (to avoid an "Out of memory" error from reading 10 GB in one pass), has no impact on the resulting encrypted file.
Answered by Basj on January 5, 2021