Modelling failure in Ceylon
In all programming languages, we need to deal with operations than can “fail”:
- a pure function might fail to produce a result, or
- an impure function might fail to produce its desired side-effect (create a new file, or whatever).
In neither case can we just blindly continue with the rest of the computation. In the first case, the result of the function might be the input to some other function. In the second case, subsequent operations might assume that the side-effect occurred (that the file now exists, or whatever).
Thus, it’s clear that there must be some way for an operation to signal failure to the calling code. There are two broad mechanisms provided by programming languages for signalling failure:
- Failure may be indicated via return values: an error code,
null
, a union type, or a sum (enumerated) type, for example,Option
/Maybe
orEither
. - Failures may be signalled and handled within some sort of exception or “panic” facility.
Most modern programming languages support both of these mechanisms, though of course the details vary. In particular, languages offer varying degrees of typesafety.
- In languages with proper support for sum types or union types, return values may be used to model failure in a very robust and typesafe way.
- In languages with some sort of effect typing, for example, Java-style checked exceptions, the exceptions are themselves typesafe.
By typesafe what I mean is that an operation that can fail declares the possibility of failure in its signature, and the immediately calling code is forced by the compiler to explicitly handle the failure.
Types of failure
So what facilities should a language offer for modelling failure? Return codes or exceptions? Typesafe or not? To arrive at a partial answer to this question, let’s start with the following classification of “failures”:
- Some failures represent problems that the immediately calling code is very unlikely to be able to recover from. Examples include transaction rollbacks, network failures, low memory conditions, or stack overflows.
- Some failures are typically the result of bugs in the program logic. Examples include assertion failures, division by zero, and use of null pointers. This is a class of failure that, as far as possible, we would like to detect at compile time, but no type system will ever be powerful enough to detect all of these failures. After a few minutes of thought, you should be able to convince yourself this class of problems is actually a subclass of the first class: how can any computation possibly recover meaningfully from a bug in its own logic?
- Finally, there are “failures” that often represent recoverable conditions. For example, one might recover from a nonexistent file by creating a file. Note that failures in this class need not always be recoverable.
Given this classification, I arrive relatively quickly at the following conclusions.
Handling recoverable failures
For “recoverable” conditions, the failure should be typesafe. The compiler should be able to verify that the calling code has made an explicit decision on what to do about the failure: recovering from it, or transforming it into an unrecoverable failure. We want to prevent recoverable failures from going unnoticed by accident.
It’s clear that unchecked exceptions—or other untypesafe solutions such as returning null
in a language like Java where null
is untypesafe—don’t prevent this, and allow failure conditions to go unnoticed, leading to bugs.
The most convenient, elegant, and efficient way to represent a recoverable failure is a union-typed return value. For example, if I have a function for parsing JSON, and it can fail for illegal input, I could use a function with the following signature:
JsonObject|ParseError parseJson() => ... ;
Or, if it seems that ParseError
carries no useful information, I could just use Null
instead:
JsonObject? parseJson() => ... ;
Alternatively, in some advanced cases, one could use a sum (enumerated) return type.
interface ParseResult of ParseSuccessful | ParseError {} class ParseSuccessful(shared JsonObject result) satisfies ParseResult {} class ParseError(shared String message) satisfies ParseResult {} ParseResult parseJson() => ... ;
This is not usually necessary in Ceylon, however.
Handling unrecoverable failures
Now, for “unrecoverable” conditions, the failure should be an untyped (unchecked) exception. For an unrecoverable failure, we shouldn’t be polluting the calling code with concerns it can’t possibly do anything useful with. We want the failure to propagate quickly and transparently to some centralized, generic, infrastructure-level error handling.
Note that, since unchecked exceptions don’t appear in the signature of the operation, the caller doesn’t receive any kind of “fair warning” that they can occur. They represent a sort of designed-in “hole” in the type system.
When in doubt
But wait, you’re probably thinking, haven’t I left a huge question begging here?
What about failure that doesn’t fall cleanly into “recoverable” or “unrecoverable”?
Isn’t there a huge grey area there, filled with failures that are sometimes recoverable by the immediately-calling code?
Indeed there is. And I would say that, as a rule of thumb, treat these failures as recoverable.
Consider our parseJson()
function above. A syntax error in the given JSON text could easily be the result of a bug in our program, but, crucially, it’s not a bug in parseJson()
itself. The code that knows whether it’s a program bug or something else is the calling code, not the parseJson()
function.
And it’s always easy for the calling code to transform a recoverable failure into an unrecoverable failure. For example:
assert (is JsonObject result = parseJson(json)); //result must be a JsonObject here
Or:
value result = parseJson(json); if (is ParseError result) { throw AssertionError(result.message); } //result must be a JsonObject here
That is, when in doubt, we make the calling code explicitly document its assumptions.
Looking at this another way, we err on the side of typesafety, since having too many unchecked exceptions starts to undermine the whole value of the static type system.
Furthermore, it’s quite likely that the calling code is better placed to produce an error with more meaningful information than the code it’s calling (though I have not shown that in the snippets above).
Three things to consider
I promised a “partial” answer to my original question, because there are still a couple of questions that I’m not sure I have a completely bottled answer to, and there’s debate over these issues in the Ceylon community.
Is AssertionError
overused?
First, what kind of failures are legitimate uses of AssertionError
? Should every AssertionError
represent a bug in the program? Is it ever reasonable for a library to throw an AssertionError
when it encounters a situation it considers misuse of its API? Is it acceptable for generic exception handling code to recover from an AssertionError
, or should AssertionError
s be considered fatal?
My answers would be yes, yes, and yes. But perhaps that implies that it was a mistake to follow Java in making AssertionError
an Error
instead of a plain Exception
. (This leads to a larger debate about the role of Error
.)
Is Null
overused?
Second, the class Null
is a seductively convenient way to represent failure of functions which return a value. But are we overusing it? Would it have been better to make the return type of Map<Key,Item>.get()
be Item|NoItem<Key>
instead of the much more generic type Item?
, that is, Item|Null
?
Perhaps. In a sense, returning null
is like throwing Exception
: a little too generic. But since a null return value must be handled by the immediately calling code, which is much better placed to know what this instance of null
represents, it’s not as harmful as a generic Exception
which is handled far from its source.
Whether you agree with that or not, it still might be best to avoid operations where null
can result from multiple different failure conditions. I have broken this rule in the past, and I’m going to be more careful in future.
Functions with no useful return value
Third, for functions with no useful return value, that is, functions which are called only for their side-effect—where the calling code has the option of simply ignoring any return value representing failure—should we err on the side of throwing an exception?
Or, alternatively, should the language offer some way to force the caller to do something with the return value of non-void
function?
Ceylon doesn’t have (and won’t have) checked exceptions, but one could argue that this is the one situation where they would be most useful.
Conclusion
So, ultimately, there are some unanswered questions, and grey areas, but it seems to me that at least we have a rather strong conceptual framework in which to investigate these problems. And it’s clear that the combination of facilities—union types, together with unchecked exceptions—is a powerful foundation for robust failure handling.
Reference: | Modelling failure in Ceylon from our JCG partner Gavin King at the Ceylon Team blog blog. |