In the past, I had been tasked to update some Python web apps to have them send e-mails out whenever they run into an uncaught Exception in one of our endpoints. This way we could identify any runtime errors in production and fix them.
In Python it was pretty easy: our web framework, Flask, provides a way to catch any error that happens in any of your HTTP routes. Even if you didn't use Flask, you can set sys.excepthook to a function and catch exceptions globally whenever other code failed to.
But then we had an app written in Go and it needed to have this feature, too. If it were simply a web service, this wouldn't have been bad, but this particular service listened on both an HTTP port and a separate TCP port in another goroutine.
Catching HTTP panics in middleware (or the standard net/http library, which catches panics itself) doesn't do anything to help catch the panics thrown by the TCP server. So, like sys.excepthook, I needed to find a way to globally catch panics in my app.
This was surprisingly very hard to do.
tl;dr. code at the bottom
Go doesn't have the concept of Exceptions in the way that Python and other languages do. There is no try / catch syntax in Go to safely execute code that might throw an exception, because Go has no exceptions to catch.
Instead, Go code is expected to return an error value along with a result from a function. If you assign the function's results to variables, you must accept all of them, or it will raise a compile-time error and you can't build your program.
package main

import (
	"errors"
	"fmt"
	"os"
)

// Divide takes two numbers and divides them.
func Divide(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("cannot divide by zero")
	}
	return a / b, nil
}

// your code
func main() {
	// You must accept both the result and the error value when you call the
	// function.
	result, err := Divide(12, 4)
	if err != nil {
		fmt.Printf("Got an error: %s\n", err)
		os.Exit(1)
	}
	fmt.Printf("Result is: %f\n", result)

	// This would've been a compile-time error:
	// result := Divide(12, 4) // not taking all results from function
}
If you really don't care about the error, and don't want to check whether it's nil or anything, you can assign it to the special variable _, like:
result, _ := Divide(5, 0) // the error is accepted but thrown away with _
But when you type that _ character, you know you are deliberately choosing to ignore an error response. When you divide by zero, it will return 0 and you won't know there was an error because you threw it away.
Note: if a function returned an error, it's probably a good idea to treat the result variable as untrusted. In our example the function returns a result of 0 when there's an error. A lot of functions may return their zero-value, or they could return their object in an inconsistent state. Usually you'll do something with the error and then return up your stack and not try to use the result.
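As a rough sketch of that pattern, the function below stops at the first error, wraps it with some context, and returns up the stack without ever touching the result. ParsePort and its error messages are hypothetical names made up for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// ParsePort is a hypothetical helper: it converts a string to a TCP port
// number and returns an error instead of a half-valid result.
func ParsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		// Don't use n here; wrap the error and return up the stack.
		return 0, fmt.Errorf("parse port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d is out of range", n)
	}
	return n, nil
}

func main() {
	for _, input := range []string{"8080", "http", "70000"} {
		port, err := ParsePort(input)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```

Wrapping with %w keeps the original error reachable through errors.Is and errors.As, so callers further up can still inspect it.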
Compared to languages that have Exceptions, I prefer Go's approach. Errors are right there in the function signature and you have to deliberately accept or discard them, so you're much less likely to write code that suddenly blows up in production.
You might say "well don't write shitty Python code that panics in production." This is sometimes hard to avoid. For example, you could log some data the user sent in their request, but your logger didn't support Unicode and the user sent some, so you get a runtime exception that you didn't write a try/except for and your app crashes. Maybe you didn't even consider Unicode, and there are a billion moving parts that let bugs like this slip through the cracks.
Exceptions can hide in any little place in your code, and any place could raise an exception for any reason. Even the documentation may not help you: the author of a function might not even know that some dependency three call layers deep might raise an exception on leap years or something.
With Go the error values are part of every function signature and the compiler makes you deliberately accept or discard them, so a whole class of problems is largely eliminated in production applications.
The closest to raising an exception that Go has is the panic() function, but this is a nuclear option of last resort that should never be used in production code. Panics are very hard to handle. Besides users calling panic() themselves, some runtime errors will cause panics (nil pointer dereferences or out-of-bounds errors).
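Here's a small sketch of a runtime-generated panic, using recover() (covered further down) so the program survives to report it. The helper name tryIndex is made up for this example; the runtime.Error interface is what the Go runtime's own panic values implement.

```go
package main

import (
	"fmt"
	"runtime"
)

// tryIndex is a hypothetical helper that reads s[i] and reports whether the
// Go runtime panicked while doing so.
func tryIndex(s []int, i int) (v int, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			// Panics raised by the runtime itself carry a value that
			// implements runtime.Error.
			_, panicked = r.(runtime.Error)
		}
	}()
	return s[i], false
}

func main() {
	nums := []int{1, 2, 3}

	v, panicked := tryIndex(nums, 1)
	fmt.Println("in range:", v, panicked)

	v, panicked = tryIndex(nums, 5)
	fmt.Println("out of range:", v, panicked)
}
```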
When panic() is called, it will bubble up the call stack of the current goroutine, returning from every function along the way (and calling any deferred functions), until it reaches the top of that goroutine's stack. Goroutines don't have a parent to climb into: if the panic gets that far, the whole program exits with a stack trace, no matter which goroutine it started in.
If nothing recovers the panic along the way, it kills your application.
The only way to handle a panic is to do so in a deferred function. Deferred functions are run at the end of a function call, right before that function returns, and they're often used to clean up after yourself. Close open file handles, etc.
func LoadConfig(filename string) (*Config, error) {
	// open the config file
	fh, err := os.Open(filename)
	if err != nil { // handle those errors!
		return nil, err
	}

	// we may be doing a LOT of stuff in this function, but we want to make
	// sure we close the file when we're done, no matter when or how we
	// `return`, so defer it.
	defer fh.Close()

	// so if we want to make sure we handle any panics from this
	// function on down the call stack, we defer a recovery.
	defer func() {
		if r := recover(); r != nil {
			// if we're in here, we had a panic and have caught it
			fmt.Printf("we safely caught the panic: %v\n", r)
		}
	}()

	// do a bunch of stuff. whenever we `return` the file will be closed.
	// if anything raises a panic, we'll recover from it
	return &Config{}, nil // placeholder so the example compiles
}
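To watch the unwinding happen, here's a minimal sketch that panics two calls deep and records which deferred functions run on the way up. The function names (unwind, outer, inner) are invented for the demo.

```go
package main

import "fmt"

// unwind is a hypothetical demo: it recovers a panic raised two calls down
// and returns the order in which deferred functions ran.
func unwind() (order []string) {
	defer func() {
		if r := recover(); r != nil {
			order = append(order, fmt.Sprintf("recovered: %v", r))
		}
	}()
	outer(&order)
	return order
}

func outer(order *[]string) {
	defer func() { *order = append(*order, "outer deferred") }()
	inner(order)
	*order = append(*order, "outer finished") // skipped: inner panics
}

func inner(order *[]string) {
	defer func() { *order = append(*order, "inner deferred") }()
	panic("boom")
}

func main() {
	for _, step := range unwind() {
		fmt.Println(step)
	}
}
```

The deferred functions closest to the panic run first, and the code after the panicking call in outer never runs at all.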
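So in a normal application, if you wanted to catch all uncaught panics and prevent your app from crashing, you could probably install a recover() deferral somewhere near your main() function at the top of your call stack. That only covers panics on the main goroutine, though: a deferred recover() can't see a panic raised in any other goroutine.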
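Because a deferred recover() only sees panics on its own goroutine, each goroutine you start needs its own recovery. A minimal sketch, assuming a made-up helper called runInGoroutine that hands the recovered value back over a channel:

```go
package main

import "fmt"

// runInGoroutine is a hypothetical helper: it runs fn in a new goroutine,
// recovers any panic inside that goroutine, and returns the recovered value
// (nil if fn did not panic).
func runInGoroutine(fn func()) interface{} {
	done := make(chan interface{}, 1)
	go func() {
		// recover() must be called by a function deferred on the same
		// goroutine that panicked.
		defer func() { done <- recover() }()
		fn()
	}()
	return <-done
}

func main() {
	fmt.Println("clean run:", runInGoroutine(func() {}))
	fmt.Println("panicking run:", runInGoroutine(func() { panic("worker failed") }))
	fmt.Println("main is still running")
}
```

This helper waits for the goroutine to finish only to keep the demo short; the point is that the recover() lives inside the goroutine, not in main().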
But this doesn't work with Go web servers and it especially didn't work for my Go web/TCP server!
If you're making a web server in Go, I have good news: the standard library net/http has its own panic recovery, so if one of your endpoints raises a panic it doesn't take down your entire server.
This does present a problem, though, if you want a global panic catcher in your app. You can't register one in your main() function that will catch your HTTP panics, because net/http will already catch them and prevent them from bubbling.
You can't register your own HTTP panic catcher and re-raise the panic with another call to panic(), because net/http will always sit between your main function and your panic-catching middleware. You can either catch your HTTP panics earlier than net/http to do your own thing with them, or else net/http will always catch them.
So at the very least, my app would need two redundant panic catchers: an HTTP middleware and one outside the HTTP server, in my main function. This wasn't very ideal either.
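For the HTTP half, a panic-catching middleware might look roughly like the sketch below. The names recoverMiddleware, serveOnce and panicky are all invented for this example, and the report callback stands in for whatever you do with the panic (log it, email it, and so on).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// recoverMiddleware is a hypothetical wrapper: it recovers a panic raised by
// next before net/http sees it, hands the value to report, and replies 500.
func recoverMiddleware(next http.Handler, report func(interface{})) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				report(rec)
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// panicky is a handler that always panics, for demonstration.
var panicky = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	panic("handler exploded")
})

// serveOnce pushes one in-memory request through the middleware and returns
// the response status plus whatever panic value was reported.
func serveOnce(h http.Handler) (status int, reported interface{}) {
	wrapped := recoverMiddleware(h, func(v interface{}) { reported = v })
	rec := httptest.NewRecorder()
	wrapped.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Code, reported
}

func main() {
	status, reported := serveOnce(panicky)
	fmt.Println("status:", status, "reported:", reported)
}
```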
excepthook With Channels
I eventually thought of a solution that would act like Python's sys.excepthook and would let me centrally handle exceptions while keeping the boilerplate code that would need to be pasted around the codebase to a minimum.
Using channels in Go, I could make a generic panic recovery function that writes the error and stack trace into a channel, and then have one central function reading it, to then send out e-mails or whatever.
Then I could sprinkle around calls to defer errors.Defer() throughout my codebase in strategic positions to catch all sorts of panics. My web server had a panic middleware that forwarded them along this way, and I made clever use of it throughout the TCP server code, for example recovering panics as close to each TCP client as possible, so that the TCP server wasn't taken down whenever an error occurred.
// Package errors handles unexpected panics by emailing them out.
package errors

import (
	"log"
	"runtime"
)

// Signal is a channel that lets goroutines all throughout the app (outside of
// HTTP handler functions) catch their own panics and safely forward them to one
// central listener, who can then email them out.
var Signal = make(chan PanicInformation)

// PanicInformation sends a panic message and its stack trace up the Signal
// channel to be safely handled.
type PanicInformation struct {
	RecoveredPanic interface{}
	Stack          string
}

// Bubble sends panic information up the Signal channel. If the traceback is
// empty, this function will collect the stack information.
func Bubble(err interface{}, traceback ...string) {
	if len(traceback) == 0 {
		stack := make([]byte, 1024*8)
		stack = stack[:runtime.Stack(stack, false)]
		traceback = []string{string(stack)}
	}
	// Signal is unbuffered, so this blocks until the listener receives.
	Signal <- PanicInformation{
		RecoveredPanic: err,
		Stack:          traceback[0],
	}
}

// Defer is a deferred function that recovers from a panic and Bubbles it
// through the Signal channel.
func Defer() {
	if err := recover(); err != nil {
		log.Printf("errors.Defer(): caught a panic! %v", err)
		Bubble(err)
	}
}

// EXAMPLE FUNCTION to await panics from your main app

// AwaitPanics watches the errors.Signal channel for any panics caught by the
// sub-packages, to send the details via email.
func AwaitPanics() {
	for pi := range errors.Signal {
		log.Printf("AwaitPanics: saw a panic! %v", pi.RecoveredPanic)
		EmailPanicOrWhatever(pi.RecoveredPanic, pi.Stack)
	}
}
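To show how the pieces fit together, here is a single-file sketch of the same idea. It is not the package above: signal, deferPanic, handleClient and firstPanic are condensed stand-ins invented for the demo (playing the roles of errors.Signal and errors.Defer), and the "client" is a plain function rather than real TCP code.

```go
package main

import (
	"fmt"
	"runtime"
)

// PanicInformation mirrors the struct from the errors package.
type PanicInformation struct {
	RecoveredPanic interface{}
	Stack          string
}

// signal plays the role of errors.Signal.
var signal = make(chan PanicInformation)

// deferPanic plays the role of errors.Defer; use it as `defer deferPanic()`.
func deferPanic() {
	if r := recover(); r != nil {
		buf := make([]byte, 8*1024)
		buf = buf[:runtime.Stack(buf, false)]
		// Blocks until the central listener receives from the channel.
		signal <- PanicInformation{RecoveredPanic: r, Stack: string(buf)}
	}
}

// handleClient is a hypothetical stand-in for per-connection TCP code.
func handleClient(id int) {
	defer deferPanic()
	if id == 2 {
		panic(fmt.Sprintf("client %d sent bad data", id))
	}
}

// firstPanic starts n client goroutines (n must be at least 2, or nothing
// panics and this blocks forever) and returns the first panic that the
// central listener receives.
func firstPanic(n int) PanicInformation {
	for id := 1; id <= n; id++ {
		go handleClient(id)
	}
	return <-signal
}

func main() {
	pi := firstPanic(3)
	fmt.Println("central listener saw:", pi.RecoveredPanic)
}
```

Only the goroutine for client 2 panics, and the other clients are unaffected. Because the channel is unbuffered, a panicking goroutine waits in deferPanic until something reads from signal, so the listener needs to be running before panics can occur.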
There is 1 comment on this page.
Hello, thanks for the great post. Unfortunately in Go the sentence "So in a normal application, if you wanted to catch all uncaught panics and prevent your app from crashing, you could probably install a recover() deferral somewhere near your main() function at the top of your call stack." is not true in all cases. Please check this example and you can read the explanation here.
From my perspective, it's a drawback of Go, because I can't implement robust panic() handling (even just for logging) for the entire application in which goroutines are used. Of course, I can add a defer for every goroutine, but that is very inconvenient, and when a third-party library is imported which spawns goroutines that may panic(), it's impossible to write code that handles it properly.