The Future of File Utilities: encryption, hashing and compression in the browser

I was wondering whether the time has come when browser technology is secure and powerful enough to replace some of the niche command-line tools we are used to on Linux. Normally I’d have to download some variant of them on each operating system or device I’m working on, which can get annoying. Some of the most common niche tools handle encryption, hashing and compression of files.

Turns out the browser is more than capable of delivering this functionality thanks to some awesome open source projects. Imagine in the future having a plethora of file tools implemented in the browser: file editor, diff, hex dump, hex edit, all kinds of compression, encryption, hashing, search and replace of text and so on. We could potentially pipe the output of these tools to third-party APIs to save the files on cloud storage or process them further. This could open a new era for file tools. And the best part is it’s all cross-platform: the tools can run uniformly on any operating system and any device that supports a major browser.

File Utilities

After my experience building SecureMyFiles, a client-side encryption product for desktop operating systems, I got curious whether we could achieve similar functionality directly in the browser. My main requirement was that all file operations must be performed on the client side, with no data being transferred to a server. The idea is to use the browser as a cross-platform execution engine to run the file utilities on any operating system, including mobile devices. This way we can avoid downloading specialized tools as we move from device to device.

I’ve open sourced the tools on GitHub:

File Encryption in the Browser

I was able to create an efficient file encryption and decryption utility in the browser that runs completely on the client side. Both encryption and decryption run in constant memory, achieving 9MB/s. They don’t match the performance of their command-line alternatives, but given that they can run on most devices with a browser, I think it is a good start.

The JavaScript library I decided to use is forge. Running in constant memory regardless of file size requires a cipher that can be updated in chunks, and the forge implementation worked quite well. Unfortunately, the native Web Crypto API does not support chunk-based encryption, which was disappointing.

The encryption algorithm I use is AES-GCM with a 12 byte IV and a 128 bit tag. The key is derived from a password and a randomly generated 128 bit salt, passed through PBKDF2 with 100K iterations. The IV is initialized to all 0s (the deterministic construction). It is never reused, and each encryption generates a new key. The structure of the encrypted file is:

| 128 bit salt | encrypted file contents | 128 bit tag |

File Encryptor on GitHub:

File Decryptor on GitHub:

File Hashing in the Browser

The file hasher produces a SHA-256 hash. It also uses the forge crypto library. It operates on 1MB chunks and can read files of arbitrary size using constant memory. It is around 6 times slower than the command-line utility sha256sum on my MacBook Pro, but nevertheless it works great in the browser.

File Hasher on GitHub:

File Compression

I decided to use the pako compression library. It achieves good speed and uses the zlib compression algorithm. I tested it by compressing a 34GB file (a VirtualBox .vdi file); it took some time but finished successfully on my MacBook Pro with 16GB of RAM. The file format is not pure gzip, which is unfortunate since other tools cannot be used to decompress the file.

File Compressor on GitHub:

File Decompressor on GitHub:

What’s next

Why not compress and encrypt at the same time? Why not compress, encrypt and push to Dropbox using their API? This way I have my file ready to be shared with whoever I want. What other use cases can you think of?


Considerations when using AES-GCM for encrypting files

My aim with this post is to share my research on using AES-GCM correctly. It has multiple gotchas that one should be aware of. I present recommendations for using AES-GCM securely to meet your authenticated encryption requirements when encrypting files on disk.


AES-GCM is an Authenticated Encryption (AE) mode of operation that is built on top of the standardized AES block cipher. It provides confidentiality, integrity, and authenticity assurances on the data, where the decryption operation is combined in a single step with integrity verification. The need for AE emerged from the observation that securely combining a confidentiality mode with an authentication mode could be error prone and difficult. This was confirmed by a number of practical attacks introduced into production protocols and applications by incorrect implementation, or lack, of authentication.

Refer to the original specification of GCM for further details on AES-GCM.

Why would you consider AES-GCM for file encryption

Most importantly, AES-GCM is standardized by NIST. As such, chip manufacturers like Intel have provided hardware acceleration for the mode, making it one of the fastest encryption modes available. Many standards and products have also included support for AES-GCM, such as TLS v1.2, IPSec and OpenVPN. The GCM construct is fully parallelizable, which can significantly increase encryption and decryption performance. The mode is considered “on-line” because the size of the processed data doesn’t need to be known in advance; this allows streaming of encrypted data, something many other modes of operation can’t do.

For a more in-depth overview of the benefits of AES-GCM please refer to Phillip Rogaway’s paper: Evaluation of Some Blockcipher Modes of Operation, McGrew and Viega’s GCM specification and the NIST standard itself.

Recommended security parameters for file encryption on disk

This is the recommended usage for the specified GCM parameters when encrypting files of different sizes on disk. My goal is to maximize key usage before the key or the initialization vector (IV) needs to be regenerated.

    • Key Size: 256 bits
    • IV Size: 96 bits
      • Use deterministic IV generation (see below)
    • Tag Length: 128 bits
    • Maximum Encrypted Plaintext Size: ≤ 2^39 – 256 bits
    • Maximum Processed Additional Authenticated Data: ≤ 2^64 – 1 bits

Make sure to process less than the maximum encrypted plaintext size; otherwise you risk a complete compromise of confidentiality, and the attacker will be able to find the encryption key. The same is true for the maximum processed additional authenticated data, where the attacker will be able to find the authentication key H, which can lead to compromised authenticity.
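As a sanity check on the plaintext limit above, the bound follows directly from GCM’s internal 32-bit block counter:

```javascript
// Where the 2^39 - 256 bit plaintext limit comes from: GCM's internal
// 32-bit block counter allows at most 2^32 - 2 blocks per invocation,
// each block being 128 bits.
const maxBlocks = 2 ** 32 - 2;
const maxPlaintextBits = maxBlocks * 128;            // = 2^39 - 256 bits
const maxPlaintextGiB = maxPlaintextBits / 8 / 2 ** 30;
console.log(maxPlaintextBits === 2 ** 39 - 256);     // true
console.log(maxPlaintextGiB);                        // just under 64 GiB
```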

Creating the correct Initialization Vector (IV)

The IV is authenticated, and it is not necessary to include it in the additional authenticated data (AAD) field; this follows from the GCM specification.

Based on the requirements for the IV, if you’re generating a brand new, “fresh” key for each file you encrypt, then the IV can actually be 0, or completely deterministic. In other words, the IV can be a 96 bit counter initialized to 0. But you have to guarantee that no key is ever reused: a “fresh” key should be used to encrypt one file only, and no file should be re-encrypted with the same key. Instead, a new fresh key should be generated for each re-encryption. Reusing a key/IV pair results in a complete compromise of AES-GCM security.

There are real benefits to using a 0-based IV. First, you don’t have to save the IV anywhere, which saves space (by a small factor, proportional to the number of files encrypted). Second, you don’t have to worry about crafting a correct IV, since it is static, as long as your keys are “fresh” and never reused. Third, there are no additional entropy requirements for IV generation. This is important because it leaves all the available entropy for key generation.

I ran this construct by the crypto community on Stack Exchange, and they confirmed that it is valid and cryptographically secure.

If a static counter initialized to 0 is not ideal for your situation, NIST specifies two ways of creating valid IVs: deterministic and random.

Deterministic IV

The IV is composed of two fields: a fixed field of 32 bits and an invocation field of 64 bits.

Use the first 32 bits of the IV as a context identifier, for example a device identifier. Each context should be different if the key is reused. If a new key is generated for each encrypted file, the context can be static, because key/IV reuse would then require generating a duplicate encryption key, which is unlikely.

Use the remaining 64 bits as an invocation counter initialized to 0, incremented for each invocation of the encryption operation (i.e. each encrypted message). GCM’s internal 32-bit block counter, which advances per encrypted block, is handled by the underlying cryptographic library.
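Packing the two fields into a 96 bit IV can be sketched with a hypothetical helper like this (the name makeIv and the byte order are illustrative):

```javascript
// Hypothetical helper packing the deterministic IV layout described above:
// a 32-bit fixed (context) field followed by a 64-bit invocation counter,
// for a 96 bit (12 byte) IV in total.
function makeIv(contextId, invocation) {
  const iv = Buffer.alloc(12);
  iv.writeUInt32BE(contextId, 0);              // 32-bit fixed field
  iv.writeBigUInt64BE(BigInt(invocation), 4);  // 64-bit invocation counter
  return iv;
}
```

For example, makeIv(1, 2) produces the bytes 00000001 0000000000000002.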

Random IV

Generate a random 96 bit IV from a CSPRNG (cryptographically secure pseudorandom number generator). The same key can be reused for encryption as long as a new IV is generated for each encryption operation and is guaranteed to be unique. This mode is not recommended, because its security is harder to prove. When in doubt, use the deterministic construction.

Tag considerations

There is no need to use a tag smaller than 128 bits when encrypting files on disk. Using a smaller tag would decrease the amount of additional authenticated data (AAD) and plaintext data that can be processed. The maximum length of the processed AAD for 128 bit tags should be limited to 2^64 – 1 bits. This limit is large enough that it should not pose any practical limitations. Remember, the hard limit of 2^32 – 2 invocations of the encryption operation needs to be satisfied first.

Key considerations

I originally discussed the key considerations for AES-GCM here. I haven’t found a good evaluation of the maximum key usage for the AES-GCM mode of operation, so I had to do my own analysis to determine the security parameters for the keys. If you think there is a mistake in my analysis, please let me know and I’ll correct it.

In summary, I found that following the NIST recommendation of processing at most 2^39 – 256 bits of plaintext with a single 256 bit key, 128 bit tag and 96 bit IV provides a sufficient security margin. No further data chunking should be required. This allows encrypting files of up to ~64GB before the key needs to be regenerated (if you’re using a 0-based IV) or the IV does (if you want to reuse the key).

Security Warning

No key/IV pair should ever be reused when using AES-GCM. All keys should be “fresh”, or the IVs should be unique if the same key is used. The IV is required to be a NONCE (number used once), not necessarily random. This is so important for the GCM construct that a single repeated NONCE can lead to a complete compromise of the authenticity of the data. For this reason NIST has published special recommendations for using AES-GCM correctly. You should familiarize yourself with them before using AES-GCM in your projects.

If you need to reuse the same key, the maximum number of NONCEs you can generate while maintaining 2^64 security is 2^28.5. This is based on a comment I found on Stack Exchange.

For some practical disadvantages of AES-GCM please see this Stack Exchange answer.