Considerations when using AES-GCM for encrypting files

My aim with this post is to share my research on using AES-GCM correctly. The mode has several gotchas that you should be aware of. I present recommendations for using AES-GCM securely to meet your authenticated encryption requirements when encrypting files on disk.

Definition

AES-GCM is an Authenticated Encryption (AE) mode of operation built on top of the standardized AES block cipher. It provides confidentiality, integrity, and authenticity assurances on the data, and decryption is combined with integrity verification in a single step. The need for AE emerged from the observation that securely combining a confidentiality mode with an authentication mode is difficult and error prone. A number of practical attacks on production protocols and applications confirmed this, caused either by an incorrect implementation of authentication or by its complete absence.

Refer to the original specification of GCM for further details on AES-GCM.

Why would you consider AES-GCM for file encryption

Most importantly, AES-GCM is standardized by NIST (SP 800-38D). Chip manufacturers such as Intel provide hardware acceleration for it (the AES-NI and PCLMULQDQ instructions), making it one of the fastest encryption modes available. Many standards and products also support AES-GCM, such as TLS 1.2, IPsec and OpenVPN. The GCM construction is fully parallelizable, which can significantly increase encryption and decryption performance. The mode is also considered “on-line”, because the size of the processed data doesn’t need to be known in advance. Together, these properties allow encrypted data to be streamed and processed in parallel, which some other modes of operation, such as CBC encryption, cannot do.

For a more in-depth overview of the benefits of AES-GCM, please refer to Phillip Rogaway’s paper “Evaluation of Some Blockcipher Modes of Operation”, McGrew and Viega’s GCM specification, and the NIST standard itself.

Recommended security parameters for file encryption on disk

These are my recommended GCM parameters for encrypting files of different sizes on disk. My goal is to get as much use as possible out of a key before the key or the initialization vector (IV) has to be regenerated.

    • Key Size: 256 bits
    • IV Size: 96 bits
      • Use Deterministic IV generation (see below)
    • Tag Length: 128 bits
    • Maximum Encrypted Plaintext Size: ≤ 2^39 – 256 bits
    • Maximum Processed Additional Authenticated Data: ≤ 2^64 – 1 bits

Make sure each encryption stays within the maximum encrypted plaintext size. Otherwise GCM’s 32 bit block counter wraps around, keystream blocks are reused, and confidentiality can be completely compromised. The same applies to the maximum processed additional authenticated data: exceeding that limit can let an attacker recover the authentication key H, which compromises authenticity.
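The plaintext limit is tied to GCM’s 32 bit block counter. As a worked check of the arithmetic, assuming 128 bit AES blocks and 8 bit bytes, the limit corresponds to just under 2^32 blocks, or just under 64 GiB:

```latex
\[
\frac{2^{39} - 256\ \text{bits}}{128\ \text{bits/block}} = 2^{32} - 2\ \text{blocks},
\qquad
\frac{2^{39} - 256\ \text{bits}}{8\ \text{bits/byte}} = 2^{36} - 32\ \text{bytes} \approx 64\ \text{GiB}.
\]
```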

Creating the correct Initialization Vector (IV)

The IV is authenticated, so it is not necessary to include it in the additional authenticated data (AAD) field. This follows from the GCM specification.

Based on the requirements for the IV, if you generate a brand new, “fresh” key for each file you encrypt, then the IV can actually be 0, or any other completely deterministic value. In other words, the IV can be a 96 bit counter initialized to 0; with one file per key, it never needs to advance. You do, however, have to guarantee that no key is ever reused. A “fresh” key should be used to encrypt one file only, and no file should be re-encrypted with the same key; generate a new fresh key for each re-encryption instead. Reusing a key/IV pair completely compromises the security of AES-GCM.
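A minimal sketch of the “fresh key per file, zero IV” rule, using only the Python standard library. SingleUseFileKey is a hypothetical helper, not part of any library; it assumes keys come from the OS CSPRNG via the secrets module, and it leaves the actual AES-GCM call to whatever crypto library you use.

```python
import secrets

# All-zero 96 bit IV. Safe ONLY because each key below encrypts exactly one file.
ZERO_IV = bytes(12)


class SingleUseFileKey:
    """Hypothetical helper: a fresh 256 bit AES-GCM key that may be used once.

    take() hands out the (key, IV) pair for a single encryption and refuses
    to do so again, so this object never yields the same key/IV pair twice.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # 256 bit key from the OS CSPRNG
        self._used = False

    def take(self) -> tuple[bytes, bytes]:
        if self._used:
            raise RuntimeError("key already used: generate a fresh key to re-encrypt")
        self._used = True
        return self._key, ZERO_IV


# Usage: one SingleUseFileKey per file, and a new one for every re-encryption.
file_key = SingleUseFileKey()
key, iv = file_key.take()
# Pass key and iv to your AES-GCM library's encrypt call here.
```

This only guards a single object in memory. It cannot stop the same key bytes from being stored and loaded again later, so the “never reuse a key” rule still has to be enforced by how you manage keys.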

There are real benefits to using a 0 based IV. First, you don’t have to store the IV anywhere, which saves space (12 bytes per file, so the saving grows with the number of files encrypted). Second, you don’t have to worry about crafting a correct IV, since it is static, as long as your keys are “fresh” and never reused. Third, there are no additional entropy requirements for IV generation. This is important, because it leaves all the available entropy for generating keys.

I have run this construction past the crypto community on Stack Exchange, and they confirmed that it is valid and cryptographically secure.

If a static IV of 0 is not suitable for your situation, NIST specifies two ways of creating valid IVs: deterministic and randomly generated.

Deterministic IV

The IV is composed of two fields: a 32 bit fixed field and a 64 bit invocation field.

Use the first 32 bits of the IV as a context identifier, for example a device identifier. If the key is reused, each context must use a different fixed field. If a new key is generated for each encrypted file, the fixed field can be static, because key/IV reuse then becomes unlikely and would depend on generating a duplicate encryption key.

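Use the remaining 64 bits as a counter initialized to 0. The counter should be incremented for each invocation of the encryption operation, i.e. for each file encrypted under the same key. Do not confuse it with GCM’s internal 32 bit block counter: the mode itself increments that counter for each encrypted plaintext block, and the underlying cryptographic library handles it for you.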
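A minimal sketch of the deterministic construction from NIST SP 800-38D: a 32 bit fixed field followed by a 64 bit big-endian invocation counter that advances once per encryption, not once per block. DeterministicIV is a hypothetical helper; it keeps the counter in memory only, whereas a real system would also have to persist it so it never restarts from 0 under the same key.

```python
import struct


class DeterministicIV:
    """Hypothetical helper: IV (96 bits) = fixed field (32 bits) || invocation field (64 bits).

    The invocation field advances once per call to the encryption function
    (i.e. once per file encrypted under the same key), not once per block.
    """

    def __init__(self, fixed_field: bytes) -> None:
        if len(fixed_field) != 4:
            raise ValueError("fixed field must be exactly 32 bits (4 bytes)")
        self._fixed = fixed_field
        self._counter = 0  # invocation counter, initialized to 0

    def next_iv(self) -> bytes:
        if self._counter >= 2**64:
            raise OverflowError("invocation field exhausted: generate a new key")
        iv = self._fixed + struct.pack(">Q", self._counter)  # big-endian 64 bit counter
        self._counter += 1
        return iv


# Usage: the fixed field is a 32 bit context identifier, e.g. a device ID.
ivs = DeterministicIV(fixed_field=(7).to_bytes(4, "big"))
first = ivs.next_iv()   # fixed field || 0x0000000000000000
second = ivs.next_iv()  # fixed field || 0x0000000000000001
```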

Random IV

Generate a random 96 bit IV from a CSPRNG (cryptographically secure pseudorandom number generator). The same key can be reused for encryption as long as a new IV is generated for each encryption operation and the IV is guaranteed to be unique. I do not recommend this approach, because its security is harder to prove. When in doubt, use the deterministic construction.
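A minimal sketch of random IV generation with a per-key invocation budget, standard library only. RandomIV is a hypothetical helper. The default cap of 2^32 encryptions per key is the limit NIST SP 800-38D places on random (RBG-based) IVs; uniqueness here is only probabilistic, which is why the cap matters, and you can pass a smaller value for a wider margin.

```python
import secrets


class RandomIV:
    """Hypothetical helper: random 96 bit IVs for one key, with an invocation cap."""

    def __init__(self, max_invocations: int = 2**32) -> None:
        self._max = max_invocations
        self._count = 0

    def next_iv(self) -> bytes:
        if self._count >= self._max:
            raise RuntimeError("IV budget for this key exhausted: generate a new key")
        self._count += 1
        return secrets.token_bytes(12)  # 96 bits from the OS CSPRNG
```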

Tag considerations

There is no need to use a tag shorter than 128 bits when encrypting files on disk. A shorter tag decreases the amount of additional authenticated data (AAD) and plaintext that can safely be processed. With 128 bit tags, the total length of the processed AAD should be limited to 2^64 – 1 bits, which is large enough that it should not pose any practical limitation. Remember that a single invocation of the encryption operation can process at most 2^32 – 2 blocks of plaintext (the 2^39 – 256 bit limit above), and in practice that limit is reached long before the AAD limit.
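To show how little the full tag costs, here is a sketch of a hypothetical on-disk layout, [IV ||] ciphertext || tag, assuming the tag is appended to the ciphertext (as several libraries do) and the IV is stored only when it is not the static 0 IV. stored_size and parse_blob are illustrative names, not a library API.

```python
TAG_LEN = 16  # 128 bit authentication tag
IV_LEN = 12   # 96 bit IV


def stored_size(plaintext_len: int, store_iv: bool) -> int:
    """Bytes on disk for one encrypted file: [IV ||] ciphertext || 128 bit tag.

    GCM ciphertext has the same length as the plaintext, so the full
    128 bit tag adds only 16 bytes per file.
    """
    return (IV_LEN if store_iv else 0) + plaintext_len + TAG_LEN


def parse_blob(blob: bytes, has_iv: bool) -> tuple[bytes, bytes, bytes]:
    """Split a stored blob into (iv, ciphertext, tag); iv is b"" when not stored."""
    iv_len = IV_LEN if has_iv else 0
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("blob too short to contain a full 128 bit tag")
    iv = blob[:iv_len]
    ciphertext = blob[iv_len:len(blob) - TAG_LEN]
    tag = blob[len(blob) - TAG_LEN:]
    return iv, ciphertext, tag
```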

Key considerations

I originally discussed the key considerations for AES-GCM here. I haven’t found a good evaluation of the maximum key usage for AES-GCM, so I had to do my own analysis to determine the security parameters for the keys. If you think I have made a mistake in my analysis, please let me know and I’ll correct it.

In summary, I have found that following the NIST limit of 2^39 – 256 bits of plaintext for a single 256 bit key, 128 bit tag and 96 bit IV should provide a sufficient security margin, and no further data chunking should be required. This allows the encryption of files of up to ~64 GB (2^36 bytes) before you need to regenerate the key (if you’re using a 0 based IV) or the IV (if you want to reuse the key).
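A small sketch of the arithmetic above: given a file size in bytes, how many separate GCM encryptions (each with a fresh key, or a new IV under a reused key) it would take. encryptions_needed is a hypothetical helper; splitting files larger than the limit is my illustration, while the analysis above is that files up to the limit need no chunking at all.

```python
MAX_PLAINTEXT_BITS = 2**39 - 256                # NIST limit per GCM invocation
MAX_PLAINTEXT_BYTES = MAX_PLAINTEXT_BITS // 8   # 2**36 - 32 bytes, about 64 GiB


def encryptions_needed(file_size: int) -> int:
    """Number of GCM encryptions needed to cover a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    # Ceiling division; an empty file still takes one encryption (tag only).
    return max(1, -(-file_size // MAX_PLAINTEXT_BYTES))
```

For example, a 10 GiB file fits in a single encryption, while a file one byte over the limit would need two.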

Security Warning

No key/IV pair should ever be reused when using AES-GCM. Either every key must be “fresh”, or, if the same key is reused, every IV must be unique. The IV is required to be a NONCE (number used once); it does not have to be random. This is so important for the GCM construction that a single repeated NONCE can completely compromise the authenticity of the data. For this reason, NIST has published specific recommendations for using AES-GCM correctly (SP 800-38D). You should familiarize yourself with them before using AES-GCM in your projects.
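One way to make the “never reuse a key/IV pair” rule mechanical is to check every pair before encrypting. KeyIVRegistry below is a hypothetical, in-memory sketch: it records SHA-256 fingerprints of keys rather than the raw key bytes, and a real deployment would need durable storage to enforce the rule across restarts.

```python
import hashlib


class KeyIVRegistry:
    """Hypothetical guard that refuses to approve the same key/IV pair twice."""

    def __init__(self) -> None:
        # (SHA-256 fingerprint of key, IV) pairs that have already been used.
        self._seen: set[tuple[bytes, bytes]] = set()

    def register(self, key: bytes, iv: bytes) -> None:
        pair = (hashlib.sha256(key).digest(), iv)
        if pair in self._seen:
            raise RuntimeError("key/IV pair reuse detected: refusing to encrypt")
        self._seen.add(pair)
```

Call register(key, iv) immediately before each encryption; an exception means the encryption must not proceed.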

If you need to reuse the same key, the maximum number of NONCEs you can generate while achieving 2^64 security is 2^28.5. This figure is based on a comment I found on Stack Exchange.

For some practical disadvantages of AES-GCM please see this Stack Exchange answer.

Resources

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/gcm-spec.pdf

http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

http://crypto.stackexchange.com/a/44115/44337

http://crypto.stackexchange.com/a/44166/44337

https://eprint.iacr.org/2016/475.pdf

http://crypto.stackexchange.com/a/10808

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf