Monitoring server resources using Munin
We will monitor and display the server’s health status using munin. First, we’ll focus on its general setup and subsequently on various useful plugins.
Munin-Master setup
The master will gather data from all nodes and render them using HTML.
Install munin
:
$ pacman -S munin
Then, instruct it to store HTML renders of the health-reports in /srv/http/munin
. This will make them easily accessible using a webserver later on.
First, prepare the directory:
$ mkdir -p /srv/http/munin
$ chown munin:munin /srv/http/munin
And then edit /etc/munin/munin.conf
:
htmldir /srv/http/munin
In order to generate graphs every 5 minutes, we will create a systemd-service which is going to be called from a systemd-timer.
The service (/etc/systemd/system/munin-cron.service
) itself will call munin-cron
and looks as follows:
[Unit]
Description=Survey monitored computers
After=network.target
[Service]
User=munin
ExecStart=/usr/bin/munin-cron
The timer /etc/systemd/system/munin-cron.timer
is then:
[Unit]
Description=Survey monitored computers every five minutes
[Timer]
OnCalendar=*-*-* *:00/5:00
[Install]
WantedBy=multi-user.target
Before enabling them, we can try a manual test-run by running munin-cron
as the munin user (remember that nothing will happen without enabling some plugins):
$ su - munin --shell=/bin/bash -c munin-cron
If we are sure that everything works, we can finally enable the timer:
$ systemctl daemon-reload
$ systemctl enable --now munin-cron.timer
And afterwards – of course – check the logs:
$ journalctl --unit munin-cron.service
Making the results available
To make the results accessible using a web-browser, we use lighttpd
.
This will automatically serve /srv/http/
on port 80 (make sure to checkout /etc/lighttpd/lighttpd.conf
):
$ pacman -S lighttpd
$ systemctl start lighttpd
The reports can then be accessed under <server ip>/munin/
.
In order to have interactive graphs, we will have to enable (fast) CGI
:
$ pacman -S fcgi
$ touch /var/log/munin/munin-cgi-graph.log
$ chmod 777 /var/log/munin/munin-cgi-graph.log
$ chmod 777 /var/lib/munin/cgi-tmp
Test if the CGI-executable is able to run:
$ /usr/bin/perl -T /usr/share/munin/cgi/munin-cgi-graph
Add the following to /etc/lighttpd/lighttpd.conf
:
server.modules += ("mod_fastcgi")
fastcgi.server += ("/munin-cgi/munin-cgi-graph" =>
((
"socket" => "/var/run/lighttpd/munin-cgi-graph.sock",
"bin-path" => "/usr/share/munin/cgi/munin-cgi-graph",
"check-local" => "disable"
))
)
Munin-Node setup
Each device for which health-summaries shall be reported needs to become a munin-node
(this is also the case for the master).
Luckily, this is rather trivial:
$ pacman -S munin-node
On the node itself, we need to allow communication with the master (in /etc/munin/munin-node.conf
):
host_name <my name>
allow ^<master ip>$
and start the node (don’t forget to add plugins though):
$ systemctl enable --now munin-node
On the master-server, we have to add a configuration entry per node to /etc/munin/munin.conf
:
[group_name;master-node]
address 127.0.0.1
[group_name;machine01]
address <node ip>
Plugins
Without plugins, munin
won’t be reporting much.
In the following, a few useful ones will be listed. More plugins can be found, e.g. by calling munin-node-configure --suggest
.
Each plugin can be installed by first copying them to /usr/lib/munin/plugins/
and then linking with /etc/munin/plugins/
.
Note that they must be executable (chmod a+x /usr/lib/munin/plugins/<plugin name>
):
$ ln -s /usr/lib/munin/plugins/<plugin name> /etc/munin/plugins/
As a general rule, each individual plugin can be tested in isolation using the following command:
$ munin-run <command name>
For the CPU-plugin <command name>
would be cpu
.
Remember, that a node needs to be restarted after changing its plugin configuration:
$ systemctl restart munin-node
Common plugins
The following are plugins providing generally useful statistics:
cpu
: CPU-speeddf
: disk space usageif_<interface>
: tx/rx rates on given interfaceprocesses
: overview of process numbersmemory
: RAM usage
My own custom plugins can be found here.
SMART-plugin
S.M.A.R.T.
provides a nice way of monitoring your disks (HDD, SSD, etc) health status.
Its basic usage if fairly straight-forward:
$ pacman -S smartmontools
$ smartctl -i /dev/sda # show device info
$ smartctl -t short /dev/sda # run a short test
$ smartctl -H /dev/sda # show test results
To interlink it with munin
, first configure the plugin by writing the following to /etc/munin/plugin-conf.d/munin-node
:
[smart_*]
user root
group disk
and secondly enable it:
$ ln -s /usr/lib/munin/plugins/smart_ /etc/munin/plugins/smart_sda # for disk /dev/sda
lm_sensors-plugin
lm_sensors
allow the tracking of temperatures, voltages and more.
First, set them up as you normally would:
$ pacman -S lm_sensors
$ sensors-detect # generate kernel-modules (always press enter)
$ sensors
Temperatures can then be monitored by adding the respective plugin:
$ ln -s /usr/lib/munin/plugins/sensors_ /etc/munin/plugins/sensors_temp
Troubleshooting
General tips
A manual connection to a node is possible, and useful for debugging:
$ netcat <node ip> 4949
One can then enter e.g. one of the following commands:
list
: list enabled pluginsfetch <plugin name>
: check output of given plugin
Furthermore, munin-cron
can be run with the --debug
option to show what is going on in more detail.
More information can be found here.
Corrupted database
The munin-databases can be found in /var/lib/munin/<group name>
. Delete them to reset all data.
Certain nodes cannot be reached
Check that their ip-address is set correctly in the master’s /etc/munin/munin.conf
.
Furthermore make sure that their own configuration (/etc/munin/munin-node.conf
) allows the master to connect (allow <master ip>
).