The Future of File Utilities: Encryption, Hashing, and Compression in the Browser

I’ve been wondering whether browser technology has become secure and powerful enough to replace some of the niche command-line tools we are used to on Linux. Normally I’d have to download some variant of them on each operating system or device I work on, which gets annoying. Among the most common of these niche tasks are encrypting, hashing, and compressing files.

Turns out the browser is more than capable of delivering this functionality thanks to some awesome open source projects. Imagine in the future having a plethora of file tools implemented in the browser: file editor, diff, hex dump, hex edit, all kinds of compression, encryption, hashing, search and replace of text, and so on. We could pipe the output of these tools to third-party APIs to save the files on cloud storage or process them further. This could open a new era for file tools. And the best part is it’s all cross-platform: the tools run uniformly on any operating system and any device that supports a major browser.

File Utilities

After my experience building SecureMyFiles, a client-side encryption product for desktop operating systems, I got curious whether we can achieve similar functionality directly in the browser. My main requirement was that all file operations must be performed on the client side, with no data transferred to a server. The idea is to use the browser as a cross-platform execution engine that runs the file utilities on any operating system, including mobile devices. This way we avoid downloading specialized tools as we move from device to device.

I’ve open sourced the tools on GitHub: https://github.com/stanimirivanovde/browser-power

File Encryption in the Browser

I was able to create an efficient file encryption and decryption utility that runs completely on the client side in the browser. Both encryption and decryption run in constant memory and achieve about 9MB/s. They don’t match the performance of their command-line alternatives, but given that they run on almost any device with a browser, I think it’s a good start.

The JavaScript library I decided to use is forge. Running in constant memory regardless of file size requires a cipher that can be updated in chunks, and the forge implementation worked quite well. Unfortunately, the native Web Crypto API does not support chunk-based encryption, which was disappointing.

The encryption algorithm I use is AES-GCM with a 12-byte IV and a 128-bit tag. The key is derived from a password and a randomly generated 128-bit salt passed through PBKDF2 with 100K iterations. The IV is initialized to all zeros (the deterministic construction). It is never reused, and each encryption generates a new key. The structure of the encrypted file is:

| 128 bit salt | encrypted file contents | 128 bit tag |
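
To make the layout concrete, here is a minimal sketch of the same construction in Java using the standard javax.crypto API (the browser tool itself uses forge). The 256-bit key size and the SHA-256 PRF for PBKDF2 are my assumptions; the post only pins down the salt, iteration count, IV, and tag sizes.

import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

public class FileFormatSketch {
    public static byte[] encrypt( char[] password, byte[] plaintext ) throws Exception {
        byte[] salt = new byte[16];                             // 128 bit random salt
        new SecureRandom().nextBytes( salt );

        // Derive a fresh key per file: PBKDF2 with 100K iterations (256 bit key assumed).
        PBEKeySpec spec = new PBEKeySpec( password, salt, 100_000, 256 );
        byte[] keyBytes = SecretKeyFactory.getInstance( "PBKDF2WithHmacSHA256" )
                .generateSecret( spec ).getEncoded();

        // 96 bit all-zero IV: safe only because every file gets a brand new key.
        byte[] iv = new byte[12];
        Cipher cipher = Cipher.getInstance( "AES/GCM/NoPadding" );
        cipher.init( Cipher.ENCRYPT_MODE, new SecretKeySpec( keyBytes, "AES" ),
                new GCMParameterSpec( 128, iv ) );              // 128 bit tag

        byte[] ciphertextAndTag = cipher.doFinal( plaintext );  // JCE appends the tag

        // Assemble: | 128 bit salt | encrypted file contents | 128 bit tag |
        byte[] out = new byte[salt.length + ciphertextAndTag.length];
        System.arraycopy( salt, 0, out, 0, salt.length );
        System.arraycopy( ciphertextAndTag, 0, out, salt.length, ciphertextAndTag.length );
        return out;
    }
}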

File Encryptor on GitHub: https://stanimirivanovde.github.io/browser-power/encrypt-file.html

File Decryptor on GitHub: https://stanimirivanovde.github.io/browser-power/decrypt-file.html

File Hashing in the Browser

The file hasher produces a SHA-256 hash. It also uses the forge crypto library. It operates on 1MB chunks and can read files of arbitrary size using constant memory. It is around six times slower than the command-line utility sha256sum on my MacBook Pro, but it nevertheless works great in the browser.
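
For comparison, the same chunked pattern is easy to express with Java’s standard MessageDigest. This is a sketch of the idea rather than the forge-based code the tool actually runs (HexFormat requires Java 17+).

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class HashSketch {
    public static String sha256( Path file ) throws Exception {
        MessageDigest digest = MessageDigest.getInstance( "SHA-256" );
        byte[] chunk = new byte[1024 * 1024];  // 1MB chunks, constant memory
        try( InputStream in = Files.newInputStream( file ) ) {
            int read;
            while( (read = in.read( chunk )) != -1 ) {
                digest.update( chunk, 0, read );
            }
        }
        return HexFormat.of().formatHex( digest.digest() );
    }
}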

File Hasher on GitHub: https://stanimirivanovde.github.io/browser-power/hash-file.html

File Compression

I decided to use the pako compression library. It achieves good speed and uses the zlib compression algorithm. I tested it by compressing a 34GB file (a VirtualBox .vdi file); it took some time but finished successfully on my MacBook Pro with 16GB of RAM. The output format is not pure gzip, which is unfortunate since other tools cannot be used to decompress the file.
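
To my knowledge, pako’s default output is a zlib stream, which is also what Java’s DeflaterOutputStream produces. Here is a sketch of the equivalent chunked compression, again in Java rather than the JavaScript the tool uses.

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.DeflaterOutputStream;

public class CompressSketch {
    public static void compress( Path source, Path target ) throws Exception {
        byte[] chunk = new byte[1024 * 1024];  // 1MB chunks, constant memory
        try( InputStream in = Files.newInputStream( source );
             OutputStream out = new DeflaterOutputStream( Files.newOutputStream( target ) ) ) {
            int read;
            while( (read = in.read( chunk )) != -1 ) {
                out.write( chunk, 0, read );   // deflate to a zlib (not gzip) stream
            }
        }
    }
}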

File Compressor on GitHub: https://stanimirivanovde.github.io/browser-power/compress-file.html

File Decompressor on GitHub: https://stanimirivanovde.github.io/browser-power/decompress-file.html

What’s next

Why not compress and encrypt at the same time? Why not compress, encrypt, and push to Dropbox using their API? That way my file is ready to be shared with whomever I want. What other use cases can you think of?

 

Securing Sensitive Files

Sensitive files can be secured using different methods. It is important to understand the pros and cons of each in order to pick the best tool for your needs.

Encrypted Archives

Most archival software can protect an archive with a passphrase. Different tools use different encryption algorithms, so it is important to look for authenticated encryption (AEAD), which provides integrity verification ensuring the archive hasn’t been tampered with. Some of the tools that support archive encryption are:

  • 7-zip
  • WinZip
  • WinRAR

Summary:

  • Easy to use
  • Widely available
  • Difficult to manage passphrases for many different archives
  • Most archival software doesn’t support authenticated encryption
  • Can be slow to add new files to an existing large archive

Basic Encryption Tools

Basic encryption tools don’t provide archival capabilities; their main purpose is to encrypt a file. Each file is encrypted with its own passphrase or your public key. Some also support file compression. Some of them are cross-platform and run on most operating systems.

Summary:

  • Easy to install and use
  • Difficult to manage different passphrases for each file
  • PGP is complex and requires careful understanding of its configuration
  • Some of the tools in this category don’t support authenticated encryption

Application specific encryption

Some applications provide their own file protection as part of their interface. The protection mechanisms vary between applications, and in the past various applications have had issues protecting files; for example, the encryption of Word documents before Office 2003 was insecure. Examples of application-specific encryption:

  • Password protected Word, Excel, PowerPoint files
  • Password protected PDFs

Summary:

  • These applications already support encryption so it is convenient to use it out of the box
  • It is hard to manage passwords for different files
  • Some applications provide very weak security by using old and vulnerable algorithms
  • There can be a version mismatch, where a file encrypted with a different version of the software can’t be opened

Encrypted File Volumes (not optimized for cloud)

Encrypted volumes are much easier to work with than per-file encryption. The volume is just a flat file on your disk, so you can move it anywhere. When mounted, the volume appears as a regular drive on your operating system, and you can add, delete, and edit files with ease. The user needs a single passphrase to mount the volume, which covers all the files it contains. Volumes can be mounted and unmounted at any time.

  • VeraCrypt
  • TrueCrypt (deprecated)

Summary:

  • Using a single password to open the volume can simplify working with many encrypted files
  • Easy to use
  • The fixed size of each volume, which can’t grow dynamically, can be frustrating
  • Not optimized for cloud storage – if a single file changes then the whole contents of the volume needs to be synced to the cloud

Full Drive Encryption (FDE)

Full Drive Encryption is great for preventing a stolen drive or laptop from being accessed without the passphrase. How much it protects depends on which drive is encrypted. If it is the boot drive, then as soon as the machine boots into the operating system the data is no longer protected, because a logged-in user has complete access to all the files. The only way to protect an encrypted boot drive is to shut down your computer. If a partition is encrypted, then once it is mounted it appears as a drive on your OS. Most modern operating systems already support FDE and it is fairly simple to turn on. It does add peace of mind that your data is protected if you lose your laptop. Some tools allow for hidden partitions that can add additional security to your files.

  • BitLocker (Windows)
  • VeraCrypt (cross-platform)
  • FileVault (Mac OS X)
  • dm-crypt (Linux)
  • PGP Full Disk Encryption

Summary:

  • Easy to use
  • Doesn’t protect files that are copied out of the encrypted disk
  • Doesn’t protect files on mounted volumes for the logged-in user
  • Not applicable to cloud storage as it only works on the physical disk or partition

Encrypted Mounted Volumes (optimized for cloud)

Encryption tools that are optimized for the cloud provide many additional benefits over the previously discussed tools. First, they are optimized for synchronization to cloud storage: only the changed parts of the encrypted files are uploaded. Second, they use modern security constructions such as authenticated encryption and public-key cryptography. They make managing and sharing encrypted files easy, and they work with different cloud storage providers so you’re not locked into one of them. They also protect the files stored locally.

  • BoxCryptor
  • Cryptomator
  • GoCryptFs
  • KeybaseFS

Summary:

  • State of the art encryption
  • Cloud integration
  • Efficient network synchronization
  • Easy to use
  • Can support large files
  • Allow easy password management and file sharing using public key encryption
  • Protect local and cloud files

Secure Cloud Providers

Secure cloud providers can offer various encryption mechanisms. The best ones use client-side encryption that protects your files before they are uploaded to the cloud. Note, however, that the files are not kept encrypted on the client computer; they are encrypted only when uploaded to the cloud provider. Once on the cloud server, the files stay encrypted at rest.

  • Sync.com
  • pCloud
  • Tresorit
  • ShareFile
  • NextCloud

Summary:

  • Provide backup capabilities as well as encryption
  • Can be used to share files securely
  • Can be used to synchronize files across multiple devices
  • Require a separate account
  • Don’t work with existing cloud providers such as Dropbox or Google Drive
  • Force you to purchase secure file storage from them
  • Don’t encrypt the files on the client computer

Secure File Sharing

Secure File Sharing can take the form of a secure cloud storage provider or as a specialized tool aimed directly at file sharing. Different tools provide different capabilities.

  • Firefox Send (discontinued)
  • Citrix ShareFile
  • Signal
  • Wire

Summary:

  • Can be used to easily and securely send files to other users
  • Some of them have size limitations
  • Additional complexity in managing users
  • Don’t provide encryption at rest

Secure e-mail

Securing email is hard. It is an old protocol that doesn’t natively support encryption. The best way to secure your emails is to use a mail plugin that runs in Outlook or Thunderbird, or to use a secure email provider. Nowadays secure email providers offer a much better user experience, with many options for securing your messages.

  • ProtonMail
  • PGP plugin for Outlook/Thunderbird
  • Hushmail
  • Mailfence
  • Tutanota
  • S/MIME

Summary:

  • A popular means of sharing files
  • Can encrypt the body and attachments of emails
  • The password for decrypting the file has to be communicated using another method (out of band)
  • The recipient needs to have the same tools in order to decrypt the message

Considerations when using AES-GCM for encrypting files

My aim with this post is to share my research on using AES-GCM correctly. The mode has multiple gotchas one should be aware of. I present recommendations for using AES-GCM securely to meet your authenticated encryption requirements when encrypting files on disk.

Definition

AES-GCM is an Authenticated Encryption (AE) mode of operation that is built on top of the standardized AES block cipher. It provides confidentiality, integrity, and authenticity assurances on the data, where the decryption operation is combined in a single step with integrity verification. The need for AE emerged from the observation that securely combining a confidentiality mode with an authentication mode could be error prone and difficult. This was confirmed by a number of practical attacks introduced into production protocols and applications by incorrect implementation, or lack, of authentication.

Refer to the original specification of GCM for further details on AES-GCM.

Why would you consider AES-GCM for file encryption

Most importantly, AES-GCM is standardized by NIST. As such, chip manufacturers like Intel have provided hardware acceleration for the mode, making it one of the fastest encryption modes available. Many standards and products support AES-GCM, such as TLS v1.2, IPsec, and OpenVPN. The GCM construct is fully parallelizable, which can significantly increase encryption and decryption performance. The mode is considered “on-line” because the size of the processed data doesn’t need to be known in advance. This allows streaming of encrypted data, which many other encryption modes of operation can’t do.

For a more in-depth overview of the benefits of AES-GCM please refer to Phillip Rogaway’s paper: Evaluation of Some Blockcipher Modes of Operation, McGrew and Viega’s GCM specification and the NIST standard itself.

Recommended security parameters for file encryption on disk

This is the recommended usage of the GCM parameters when encrypting files of various sizes on disk. My goal is to get the most out of each key before the key or the initialization vector (IV) needs to be regenerated.

  • Key Size: 256 bits
  • IV Size: 96 bits
    • Use deterministic IV generation (see below)
  • Tag Length: 128 bits
  • Maximum Encrypted Plaintext Size: ≤ 2^39 – 256 bits
  • Maximum Processed Additional Authenticated Data: ≤ 2^64 – 1 bits

Make sure to process less than the maximum encrypted plaintext size. Otherwise the internal block counter can wrap and reuse keystream, which risks a complete compromise of confidentiality. The same goes for the maximum processed additional authenticated data: exceeding it can allow an attacker to recover the authentication key H, which can lead to compromised authenticity.

Creating the correct Initialization Vector (IV)

The IV is authenticated by the GCM computation itself, so it is not necessary to include it in the additional authenticated data (AAD) field. This is established by the GCM specification.

Based on the requirements for the IV, if you generate a brand new, “fresh” key for each file you encrypt, then the IV can actually be 0, i.e., completely deterministic. In other words, the IV can be a 96-bit counter initialized to 0, but you have to guarantee that no key is ever reused. A “fresh” key should be used for encrypting one file only, and no file should be re-encrypted with the same key; instead, generate a new fresh key for each re-encryption. Reusing a key/IV pair results in a complete compromise of AES-GCM’s security.

There are real benefits to using a 0-based IV. First, you don’t have to store the IV anywhere, which saves space (a small amount, proportional to the number of files encrypted). Second, you don’t have to worry about crafting a correct IV, since it is static, as long as your keys are fresh and never reused. Third, there are no additional entropy requirements for IV generation. This is important, because it leaves all the available entropy for key generation.

I ran this by the crypto community on Stack Exchange and they confirmed that this construct is valid and cryptographically secure.

If an IV of a static counter initialized to 0 is not ideal for your situation, then NIST specifies two ways of creating valid IVs: deterministic and randomly generated.

Deterministic IV

The IV is composed of two fields: a fixed field of 32 bits and an invocation field of 64 bits.

Use the first 32 bits of the IV as a context identifier, for example a device identifier. Each context should be different if the key is reused. If a new key is generated for each encrypted file, the context can be static, because key/IV reuse would then require generating a duplicate encryption key, which is unlikely.

Use the remaining 64 bits as an invocation counter initialized to 0 and incremented for each invocation of the encryption operation under the same key. (GCM’s internal per-block counter is separate and is handled by the underlying cryptographic library.)
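
A minimal sketch of assembling such an IV in Java; the context identifier and counter management are the caller’s responsibility, and the names here are illustrative.

import java.nio.ByteBuffer;

public class DeterministicIv {
    // 96 bit IV = 32 bit fixed (context) field + 64 bit invocation counter.
    public static byte[] buildIv( int contextId, long invocationCounter ) {
        return ByteBuffer.allocate( 12 )
                .putInt( contextId )           // fixed field, e.g. a device identifier
                .putLong( invocationCounter )  // incremented once per encryption operation
                .array();
    }
}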

Random IV

Generate a random 96-bit IV from a CSPRNG (cryptographically secure pseudorandom number generator). The same key can be reused for encryption as long as a new IV is generated for each encryption operation and the IV is guaranteed to be unique. This construction is not recommended, because its security is harder to prove. When in doubt, use the deterministic construction.

Tag considerations

There is no need to use a tag smaller than 128 bits when encrypting files on disk. Using a smaller tag decreases the amount of additional authenticated data (AAD) and plaintext data that can be processed. The maximum length of AAD processed with 128-bit tags should be limited to 2^64 – 1 bits. This limit is large enough that it should not pose any practical constraints. Remember that the hard limit of 2^32 – 2 invocations of the encryption operation still needs to be satisfied first.

Key considerations

I originally discussed the key considerations for AES-GCM here. I haven’t found a good evaluation of the maximum key use for the AES-GCM mode of operation, so I had to do my own analysis to determine the security parameters for the keys. If you think there is a mistake in my analysis, please let me know and I’ll correct it.

In summary, I found that following the NIST recommendation of processing at most 2^39 – 256 bits of plaintext with a single 256-bit key, a 128-bit tag, and a 96-bit IV provides a sufficient security margin. No further data chunking should be required. This allows encrypting files of up to ~64GB before the key needs to be regenerated (if you’re using a 0-based IV) or the IV changed (if you want to reuse the key).

Security Warning

No key/IV pair should ever be reused when using AES-GCM. All keys should be “fresh”, or the IVs must be unique if the same key is reused. The IV is required to be a nonce (number used once), not necessarily random. This is so important for the GCM construct that a single repeated nonce can lead to a complete compromise of the authenticity of the data. For this reason NIST has published special recommendations for using AES-GCM correctly. You should familiarize yourself with them before using AES-GCM in your projects.

If you need to reuse the same key, the maximum number of nonces you can generate while maintaining 2^64 security is 2^28.5. This is based on a comment I found on Stack Exchange.

For some practical disadvantages of AES-GCM please see this Stack Exchange answer.

Resources

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/gcm-spec.pdf

http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pdf

http://crypto.stackexchange.com/a/44115/44337

http://crypto.stackexchange.com/a/44166/44337

https://eprint.iacr.org/2016/475.pdf

http://crypto.stackexchange.com/a/10808

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf

Exceptions Guidelines

Exception handling is tricky. It requires careful consideration of the project at hand and the ways errors can be dealt with. Here I present general best-practice guidelines for exceptions that I have come up with after extensive research on the subject. Most of these guidelines apply to any project in any language that supports exceptions; some are Java specific. The goal is a robust set of guidelines that helps you handle exceptions and error conditions.

Advantages of exceptions

I won’t go into details about the advantages of exceptions, but they mainly boil down to three benefits:

  • Separate error handling code from regular code
  • Propagate errors up the call stack
  • Grouping and differentiating error types

Three types of exceptions in Java

Find more details here.

  1. Checked exceptions – checked at compile time; methods must either declare the exception in their throws clause or catch and handle it.
  2. Unchecked exceptions – not checked at compile time, so no compile-time errors are issued if they are not handled. This matches how C++ and PHP exceptions behave.
  3. Errors – unrecoverable conditions that the JVM can throw, such as running out of memory. These should not be caught.

Exceptions architecture and design

Your exception handling should be designed carefully from the start. A bad exception handling design can make your project buggy, hard to debug, and inflexible to future changes, which will slow down your development lifecycle. Here are some tips on good exception design that I have found useful.

Keep the number of custom exceptions to a minimum

Create as few as necessary to get the job done. Having many unnecessary custom exceptions bloats your code and makes it harder to maintain. Your API clients might also get overwhelmed and mishandle the exceptions, which defeats their purpose. Generic exceptions should cover most error cases.

For unrecoverable errors, don’t throw specific exceptions when a generic exception, such as a RuntimeException or a generic unchecked custom exception, will do the job. Your clients can’t do anything about such an exception, so why force them to worry about a specific type? Specific custom exceptions should be reserved for when different behavior is necessary to handle an error condition.

Limit the use of checked exceptions

Limit the number of checked exceptions to the minimum and carefully consider each and every use case. There is a lot of controversy surrounding Java’s checked exceptions, and many people think they are of dubious value: they force the clients of your API to write a ton of boilerplate code to handle them. If the exception is not recoverable, throw a generic unchecked exception. If the exception is recoverable, consider either a specific checked exception (if really needed) or a specific unchecked exception. This way the client can decide on its own whether the exception needs specific handling. For example, an ElementNotFoundException when the database doesn’t contain an ID you are looking for.

Think twice before defining interfaces that have checked exceptions in their method signatures. If you later add implementations that don’t throw, your clients will still be forced to handle the interface-declared checked exceptions, writing boilerplate code that is completely useless because it handles exceptions that are never thrown. A better approach is to use unchecked exceptions, which leave the decision of handling the exceptions up to the client.

In general, think really hard about whether you even need checked exceptions in your code. Only explicitly recoverable situations warrant them, and even then you can still get by without them. The overhead of handling checked exceptions is significant because of the boilerplate code your clients will need to write. They also make refactoring more difficult by forcing you to update every method signature that passes those checked exceptions through.

Don’t expose internal, implementation specific details to your clients

Avoid exposing internal, implementation-specific exceptions to your clients, especially those coming from a third-party library. This is a general object-oriented rule of thumb, and it’s just as valid for your exception hierarchy design. You have no control over a third-party library, which can change its exception signatures and break all of your API contracts with your clients. Instead, wrap third-party exceptions (such as an SQLException) in your own custom exceptions. This gives you much greater flexibility to change the third-party library in the future without breaking your clients’ API contract.

Create your own exceptions hierarchy for complex projects

Generally speaking, create your own exception hierarchy for more complex modules, especially if you are dealing with implementation-specific exceptions from third-party libraries. Each of your packages/modules could have its own top-level generic exceptions; in Java, define at least one that inherits from RuntimeException. Wrap all implementation-specific exceptions in your custom exceptions so that your clients depend only on your custom exceptions and/or generic Java exceptions. This will give you greater flexibility to refactor the implementation-specific code later without breaking your API contracts.

If more fine-grained error handling is necessary, you can further subclass your custom exceptions to handle specific cases and allow for error recovery. For example, if you are connecting to an SQL database you can throw a ConnectionTimeoutException in such a way that the client can retry the connection N times before giving up. You could later change your database engine to NoSQL and still allow for reconnects, with the client code staying the same.
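
A rough sketch of this pattern; DataStore and Connection are hypothetical types standing in for whatever engine sits underneath.

class DataStoreException extends RuntimeException {
    DataStoreException( String message, Throwable cause ) { super( message, cause ); }
}

class ConnectionTimeoutException extends DataStoreException {
    ConnectionTimeoutException( String message, Throwable cause ) { super( message, cause ); }
}

class RetryingClient {
    static final int MAX_RETRIES = 3;

    // The client retries on the specific type, regardless of the engine behind it.
    Connection connectWithRetry( DataStore store ) {
        for( int attempt = 1; ; ++attempt ) {
            try {
                return store.connect();
            } catch( ConnectionTimeoutException e ) {
                if( attempt == MAX_RETRIES ) {
                    throw e;  // recovery failed, let it propagate
                }
            }
        }
    }
}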

Document all exceptions

Carefully document all exceptions your package/module/app throws in the Javadoc of each public method. Failing to do so will frustrate your API users and cause them to distrust your API docs. You don’t really want your clients digging through your source just to find out that you are throwing a specific exception, right?

Throw exceptions as early as possible

Check all inputs to your public API methods and throw an exception as soon as you find inconsistencies between the expected parameters and what has been supplied. The earlier you throw, the smaller the chance of data corruption, because bad data won’t make it into the deeper parts of your code. It also gives your clients valuable feedback in a timely manner, instead of an obscure exception thrown deep in your code with a bad message such as ‘Internal Error’, or a NullPointerException.
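
A minimal sketch of failing fast at the public API boundary; the class, method, and parameter names are made up for illustration.

import java.util.Objects;

public class AccountService {
    public void transfer( String accountId, long amountCents ) {
        Objects.requireNonNull( accountId, "accountId must not be null" );
        if( accountId.isEmpty() ) {
            throw new IllegalArgumentException( "accountId must not be empty" );
        }
        if( amountCents <= 0 ) {
            throw new IllegalArgumentException( "amountCents must be positive, got: " + amountCents );
        }
        // ... proceed knowing the inputs are sane
    }
}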

Log exceptions properly

Follow the guidelines of your logging framework in order to properly log exceptions with their message and stack trace. You don’t want to lose either.

Add more context to thrown exceptions

Every time you can add more context to a thrown exception, do it! It will be invaluable at the debugging stage. Different contexts can add their own information to a thrown exception by extending the thrown exception’s message or wrapping the exception in a more granular custom exception. Follow the paths exceptions take through your code and make sure that the important information is contained in the exception class or the exception message, so that your clients can properly document or recover from the exception.

Follow the principle of handle-or-propagate:

  • Don’t just catch and re-throw exceptions for no reason. It carries a performance penalty and is of no use to anybody.
  • Don’t catch, log, and re-throw exceptions. This will likely log the same exception multiple times, filling your logs with duplicate entries for the same error. There is nothing worse than a bloated log you have to dig through to find what went wrong.
  • Don’t ever swallow an exception without a proper comment in the code explaining why you are doing it, and why you are not even logging it!
  • Only catch exceptions if you need to extend their error information or handle them. Otherwise let them propagate.
  • Log exceptions once and only once!

Handle all exceptions at the top level of your code

At the top level of your code, handle all propagated exceptions correctly. This means ordering your catch clauses from most specific to most general; you can use multi-catch statements to reduce the boilerplate you need to write. Make sure every exception caught here is logged at the appropriate log level, and make sure your users are notified of each error: they should know if something bad happened and whether they can do anything about it.

Some exceptions should cause your program to fail so don’t swallow them! Log them and quit.

Don’t catch top-level exceptions

Don’t catch top-level exception classes such as Throwable, Exception, or RuntimeException directly in your API unless you really know you need to and you are at the very top of your code base (your main method or top-level server/daemon code). These classes cover many unrecoverable exceptions that can’t be handled safely. This is especially true for Throwable, which also catches Error conditions the JVM might throw, such as out-of-memory errors. Catching these exceptions and continuing to run your application may result in data corruption and undefined behavior.

Make sure your catch statements are ordered correctly, from most specific to most general. You can use multi-catch statements to group exceptions that need the same treatment, like logging an error:

catch( Exception1 | Exception2 | Exception3 e) {
    logger.log( "Bad Exception: " + e.getMessage(), e );
}

Use a common parent if multiple exceptions can be thrown and handled the same way. For example, catching IOException automatically catches FileNotFoundException, because it inherits from IOException.

Avoid catching the Throwable class! Bad things will happen when you start catching unrecoverable JVM exceptions.

General rules when working with exceptions

This section gives general tips on how to deal with exceptions. It should extend and complement your exception design and architecture.

When to consider checked exceptions

Choose checked exceptions only if the client can, or should, do something to recover from the error. For example, if the client specifies a file that doesn’t exist, they should be notified so they can correct it.

Make sure the checked exception is a specific exception rather than the Exception class itself; otherwise your clients will be forced to catch more exceptions than they intend to handle. FileNotFoundException, for example, is a specific exception.

The checked exception can be a more general parent exception class such as IOException which is the parent of many IO related exceptions.

For checked exceptions, follow the Catch or Specify principle: every call to a method that throws a checked exception must either be wrapped in a try…catch block or have the exception reported in the calling method’s throws clause.

Wrap lower level checked exceptions in unchecked exceptions

For example, if you receive an SQLException in your database-handling class, you can wrap it in a RuntimeException and throw that. This way you won’t expose unnecessary internal implementation details to higher-level clients. An even better rule of thumb is to create your own package-specific exception that inherits from RuntimeException and use it instead. Other RuntimeExceptions can still propagate, but your clients can take specific actions based on this particular exception type.
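
A sketch of that wrapping, with a hypothetical User type and JDBC helper:

import java.sql.SQLException;

class RepositoryException extends RuntimeException {
    RepositoryException( String message, Throwable cause ) {
        super( message, cause );  // keep the original exception as the cause
    }
}

class UserRepository {
    public User findById( long id ) {
        try {
            return queryUser( id );  // hypothetical JDBC helper
        } catch( SQLException e ) {
            throw new RepositoryException( "Failed to load user with id=" + id, e );
        }
    }

    private User queryUser( long id ) throws SQLException {
        // ... JDBC access elided
        return null;
    }
}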

Preserve encapsulation when converting exceptions from one to another

Always pass the original exception as the cause of the new exception so that the stack trace is not lost. This helps greatly during debugging and root-cause analysis.

Choose good error messages

When throwing exceptions, always specify a readable message with a proper description of the error and any additional data that might help debug the problem further, such as missing IDs or bad parameters. Walk the propagation path of the thrown exception and figure out whether any other place can add information to it. Adding information to the exception at the right context level can greatly enhance the usefulness of your logging.

Document Exceptions

This helps the next guy. It also helps you in a couple of months when you have forgotten about the code you wrote!

Clean up after yourself

Before throwing an exception, make sure you clean up any resources that are created in your try block and in your methods. Do this either with try-with-resources or in the finally section of a try…catch…finally block.
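
A minimal try-with-resources sketch: the stream is closed automatically, even if the loop throws.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanupExample {
    public static long countBytes( Path file ) throws IOException {
        try( InputStream in = Files.newInputStream( file ) ) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int read;
            while( (read = in.read( buffer )) != -1 ) {
                total += read;
            }
            return total;
        }  // in.close() runs here automatically, even on exceptions
    }
}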

Make sure your objects are in a good state when an exception is thrown. If things need to be deallocated or cleaned up, do it before the throw. If the exception is recoverable, make sure the object from which it is thrown is re-initialized correctly to handle a retry.

Don’t ignore exceptions thrown in a thread. If an InterruptedException is thrown in a thread, make sure the thread is properly shut down and cleaned up in order to avoid data corruption.

Don’t throw exceptions from a finally block

Throwing from a finally block causes any exception thrown in the try block to be lost; only the finally-block exception will be propagated. It is better to handle the finally-block exception inside the finally block and not let it propagate. That way the finally-block exception can be logged and dealt with immediately, and any exception thrown from the try block will be correctly propagated up the call stack.
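
A sketch, where out, writeData, and logger are stand-ins from the surrounding context; the close() failure is contained inside the finally block so it can’t mask an exception from the try block:

try {
    writeData( out );
} finally {
    try {
        out.close();
    } catch( IOException e ) {
        logger.log( "Failed to close stream: " + e.getMessage(), e );  // log, don't re-throw
    }
}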

Don’t use exceptions for flow-control

It slows down your code and makes it less maintainable. It is considered very bad practice.

Don’t suppress or ignore exceptions

It creates hard to find bugs in your code!

Only catch specific exceptions you can handle

Let all other exceptions propagate to a place where they can be handled.

Log exceptions just once

It sucks to debug a program and find the same message appearing multiple times from different places. Logging an exception once and only once keeps the log files cleaner and easier to use for debugging.

Don’t create custom exceptions if you can avoid it

For simple projects, try not to create custom exceptions if they don’t provide any additional data to the client code. A descriptive name is nice to have, but it’s not that helpful on its own. The standard Java API has hundreds of exceptions already predefined; look through those first and see if they fit your needs.

On the other hand, if state information needs to be bundled with your exception, then use a custom one. For example, if a file can’t be opened you can carry the file name, path, permissions, and type (symlink, regular file, etc.) as separate fields in a CustomException class. Exceptions are regular classes, and they can have their own fields and methods.

If you need to create a custom exception, follow the principle: name the problem, not the thrower. The exception should reflect the reason for the problem rather than the operation causing it, i.e., FileNotFoundException or DuplicateKeyException instead of CreateException from a create() method. You can already see that it is coming from the create() method in the stack trace.

When to create custom exceptions

For more involved projects you can create a class hierarchy of exceptions that belongs to your package/module/project and use it instead of the Java-provided exceptions. Be careful not to create too many custom exceptions; create the bare minimum needed to get your errors handled. In most cases a general unchecked custom exception will do just fine, so think carefully about whether you really need anything else.

For example, start with one subclass of the checked Exception class and one of the unchecked RuntimeException class. If a more specific exception is needed by some of your modules, create it, though this should be rare. A good rule of thumb is to add a specific custom exception when a specific recovery operation is needed or the client needs to do something different for that particular error. But try to limit the number of custom exceptions to the bare minimum; normally having just one of each will do the job. This way you can allow for substantial refactoring of your code with minimal to no changes to your API client contracts.

Exceptions in Unit Tests

Specific exceptions are great for reducing the number of false positives during unit testing. If a particular method can throw an IllegalArgumentException for many different reasons, how do you know which one caused the exception to be thrown? With finer-grained control you can eliminate the false positives by catching specific exceptions linked to specific issues in that method in your unit tests.

This is tedious but allows for good unit tests. For example, a method expects three parameters: name, path, and attribute. We can throw three runtime exceptions that inherit from IllegalArgumentException: IllegalNameException, IllegalPathException, and IllegalAttributeException. These are runtime exceptions you can specifically check for during unit testing, so you can be certain that your throw statements are being executed as expected.
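
A JUnit 5 sketch using those hypothetical exceptions, where Validator and its validate(name, path, attribute) method are made-up stand-ins for the code under test:

import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.Test;

class ValidatorTest {
    private final Validator validator = new Validator();  // hypothetical class under test

    @Test
    void rejectsBadName() {
        assertThrows( IllegalNameException.class,
                () -> validator.validate( "", "/tmp/file", "rw" ) );
    }

    @Test
    void rejectsBadPath() {
        assertThrows( IllegalPathException.class,
                () -> validator.validate( "name", "", "rw" ) );
    }
}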

Properly notify the end user

At the top level of your application, report a generic error to the end users, notifying them about what happened. Make sure that all other error cases have been correctly covered and resources freed.

Final thoughts

Thanks for reading this far! I hope the tips here will be useful to other software engineering projects. Please post your thoughts and ideas in the comments below. It will help evolve the material and keep it current.

Resources

This information has been gathered from different sources. The following resources are the ones explicitly mentioned: