2016-11-13
Licensed under: CC-BY-SA


Getting started with Bosun

Remarks

Bosun is an open-source, MIT-licensed monitoring and alerting system created by Stack Overflow. It has an expressive domain-specific language for evaluating alerts and creating detailed notifications. It also lets you test alerts against historical data, which speeds up development. More details at http://bosun.org/.

Bosun uses a config file to store all the system settings, macros, lookups, notifications, templates, and alert definitions. You specify the config file to use when starting the server, for example /opt/bosun/bosun -c /opt/bosun/config/prod.conf. Changes to the file do not take effect until Bosun is restarted, and it is highly recommended that you store the file in version control.

Versions

Version    Release Date
0.3.0      2015-06-13
0.4.0      2015-09-18
0.5.0      2016-03-15

Docker Quick Start

The quick start guide includes information about using Docker to stand up a Bosun instance.

$ docker run -d -p 4242:4242 -p 80:8070 stackexchange/bosun

This will create a new instance of Bosun which you can access by opening a browser to http://docker-server-ip. The Docker image includes HBase/OpenTSDB for storing time series data, the Bosun server, and Scollector for gathering metrics from inside the Bosun container. You can then point additional Scollector instances at the Bosun server and use Grafana to create dashboards of OpenTSDB or Bosun metrics.
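As a rough illustration of getting a custom metric into the test instance, the Python sketch below builds a datapoint in the JSON shape used by OpenTSDB's HTTP /api/put endpoint. It assumes that port 4242 from the docker run command above exposes that endpoint. The metric name, host tag, and timestamp are made up for the example, and nothing is sent unless you call send_datapoints yourself.

```python
import json
import time
import urllib.request


def build_datapoint(metric, value, tags, timestamp=None):
    """Return one datapoint in the JSON shape used by OpenTSDB's /api/put."""
    return {
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


def send_datapoints(base_url, datapoints, timeout=5):
    """POST datapoints as JSON to <base_url>/api/put and return the HTTP status."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/put",
        data=json.dumps(datapoints).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical metric and host tag, used only for illustration.
point = build_datapoint(
    "example.queue.length", 12, {"host": "web01"}, timestamp=1479000000
)
print(json.dumps(point, sort_keys=True))

# To actually send it (assumes the container from the command above is running):
# send_datapoints("http://docker-server-ip:4242", [point])
```

For regular host metrics, Scollector remains the simpler route; a script like this is only useful for ad hoc or application-specific values.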

The stackexchange/bosun image is designed only for testing. There are no alerts defined in the config file and the data will be deleted when the container is removed, but it is very helpful for getting a feel for how Bosun works. For details on creating a production instance of Bosun see http://bosun.org/resources

Sample Alert

Bosun alerts are defined in the config file using a custom DSL. They use functions to evaluate time series data and will generate alerts when the warn or crit expressions are non-zero. Alerts use templates to include additional information in the notifications, which are usually an email message and/or HTTP POST request.

template sample.template {
    body = `<p>Alert: {{.Alert.Name}} triggered on {{.Group.host}}
    <hr>
    <p><strong>Computation</strong>
    <table>
        {{range .Computations}}
            <tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
        {{end}}
    </table>
    <hr>
    {{ .Graph .Alert.Vars.metric }}`

    subject = {{.Last.Status}}: {{.Alert.Name}} cpu idle at {{.Alert.Vars.q | .E}}% on {{.Group.host}}
}

notification sample.notification {
    email = alerts@example.com
}

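alert sample.alert {
    template = sample.template
    # Idle CPU series per host for the last minute; the template graphs this
    $metric = q("sum:rate:linux.cpu{host=*,type=idle}", "1m", "")
    $q = avg($metric)
    # Critical when the average idle CPU is below 40%
    crit = $q < 40
    notification = sample.notification
}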

The alert would send an email with the subject Critical: sample.alert cpu idle at 25% on hostname for any host whose idle CPU has averaged less than 40% over the last minute. This example is a "host scoped" alert, but Bosun also supports cluster, datacenter, or globally scoped alerts (see the fundamentals video series for more details).
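To make the evaluation concrete, here is a minimal Python sketch of the logic the alert expresses. It is not Bosun code and does not use Bosun's expression engine. The host names and sample values are invented, and the subject format simply imitates the example subject above.

```python
from statistics import mean

# Hypothetical idle-CPU samples (percent) per host over the last minute.
samples = {
    "web01": [22.0, 25.0, 28.0],
    "web02": [80.0, 85.0, 90.0],
}

CRIT_THRESHOLD = 40  # mirrors "crit = $q < 40"


def evaluate(series_by_host, threshold=CRIT_THRESHOLD):
    """Return {host: (average, crit)} where crit is 1 if average < threshold, else 0."""
    results = {}
    for host, values in series_by_host.items():
        average = mean(values)              # one average per host group
        crit = int(average < threshold)     # the comparison yields 1 or 0
        results[host] = (average, crit)
    return results


def subjects(results, alert_name="sample.alert"):
    """Build a subject line for each host whose crit value is non-zero."""
    return [
        f"Critical: {alert_name} cpu idle at {average:g}% on {host}"
        for host, (average, crit) in results.items()
        if crit != 0
    ]


results = evaluate(samples)
for line in subjects(results):
    print(line)
```

Each host is evaluated independently, which is what "host scoped" means here: web01 triggers because its average is below the threshold, while web02 stays quiet.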

Sample Configuration File

Here is an example of a Bosun config file used in a development environment:

tsdbHost = localhost:4242
httpListen = :8070
smtpHost = localhost:25
emailFrom = bosun@example.org
timeAndDate = 202,75,179,136
ledisDir = ../ledis_data
checkFrequency = 5m

notification example.notification {
        email = alerts@example.org
        print = true
}

In this case the config file tells Bosun to connect to a local OpenTSDB instance on port 4242, listen for requests on port 8070 (on all IP addresses bound to the host), use the localhost SMTP server for email, display additional time zones, use the built-in Ledis instead of an external Redis for system state, and check alerts every 5 minutes by default.
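The global settings are plain key = value lines, so their meaning is easy to inspect with a small script. The following Python sketch is a toy reader written for this example, not Bosun's own parser. It handles only the top-level settings shown above and ignores sections such as notification blocks.

```python
SAMPLE = """\
tsdbHost = localhost:4242
httpListen = :8070
smtpHost = localhost:25
emailFrom = bosun@example.org
timeAndDate = 202,75,179,136
ledisDir = ../ledis_data
checkFrequency = 5m
"""


def parse_globals(text):
    """Collect top-level 'key = value' lines into a dict (toy parser)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def split_host_port(address):
    """Split 'host:port'; an empty host means all local addresses."""
    host, _, port = address.rpartition(":")
    return host, int(port)


settings = parse_globals(SAMPLE)
listen_host, listen_port = split_host_port(settings["httpListen"])
tsdb_host, tsdb_port = split_host_port(settings["tsdbHost"])

print("listen on:", listen_host or "(all addresses)", listen_port)
print("OpenTSDB at:", tsdb_host, tsdb_port)
print("check frequency:", settings["checkFrequency"])
```

The empty host part of httpListen is what makes Bosun listen on every address bound to the machine rather than on one interface.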

The config also defines an example.notification that can be assigned to alerts. Alert definitions would usually be included at the end of the config file (see the sample alert example).