Linux servers have a reputation as workhorses. Since very early in the
development of Linux, its users have boasted of the stability of the
OS. In fact, it is not uncommon to hear of Linux-based servers running
for years without the need for a reboot. This raises the question: how
often should you reboot your Linux server?
Months and months of server uptime can be a good thing (and for some,
even cause for boasting), but is it wise to go such a long time without
rebooting? I would strongly argue that it is not. In fact, a wise
server recovery/contingency plan will include reboots as part of a
regular maintenance schedule. Below I outline some reasons why you
should reboot your server on a regular basis.
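Sticking to a schedule is easier when you can see how long a machine has been up. As a rough illustration, here is a minimal Python sketch that reads the uptime from `/proc/uptime` (a Linux-specific file whose first field is the number of seconds since boot) and flags a server that has gone longer than a chosen interval without a reboot. The 30-day interval and all function names are hypothetical placeholders, not a recommendation.

```python
from pathlib import Path

# Hypothetical placeholder interval; set it to whatever your maintenance plan specifies.
MAX_DAYS_BETWEEN_REBOOTS = 30
SECONDS_PER_DAY = 86_400


def parse_uptime(text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers; the first is the system uptime in seconds.
    """
    return float(text.split()[0])


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read the current uptime (Linux only, since /proc/uptime is Linux-specific)."""
    return parse_uptime(Path(path).read_text())


def reboot_overdue(uptime_seconds: float, max_days: int = MAX_DAYS_BETWEEN_REBOOTS) -> bool:
    """True if the machine has been up longer than the allowed interval."""
    return uptime_seconds > max_days * SECONDS_PER_DAY
```

A daily cron job could, for example, call `reboot_overdue(read_uptime_seconds())` and alert the admin when it returns `True`, a sign that the scheduled reboot has slipped.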
### Kernel Upgrades ###
The Linux kernel is under constant development. New drivers are always
being written, old ones are rewritten, bugs are patched, and security
holes are plugged. These upgrades generally result in a system that is
faster, safer, and more reliable. In most distributions, the package
manager upgrades the kernel as part of routine updates. But even if
your distribution doesn’t
automatically upgrade your kernel, for the aforementioned reasons you
should make it a point to do so periodically.
The upgraded kernel is not loaded until the system is
rebooted. Some distros notify the user when a reboot is required, but
it is ultimately the responsibility of the sysadmin to know what
software is being upgraded and what actions those upgrades require.
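One way to find out whether a reboot is pending is sketched below in Python. It assumes an Ubuntu-style system where packages that need a reboot create the flag file `/var/run/reboot-required`, and the usual layout in which each installed kernel has a directory named after its release string under `/lib/modules`. It also compares the running kernel (`platform.release()`, the same string `uname -r` prints) with the newest installed one. The function names are hypothetical, and other distributions signal pending reboots differently.

```python
import os
import platform
import re


def version_key(release: str) -> list:
    """Natural-sort key, so '5.15.0-101-generic' sorts after '5.15.0-91-generic'."""
    parts = re.split(r"(\d+)", release)
    # re.split with a capturing group alternates text and digit runs;
    # the odd indices are the digit runs, which we compare as integers.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def newest_kernel(installed: list[str]) -> str:
    """Return the highest kernel release string from the installed list."""
    return max(installed, key=version_key)


def reboot_flag_present(flag_path: str = "/var/run/reboot-required") -> bool:
    """True if the (assumed Ubuntu-style) reboot flag file exists."""
    return os.path.exists(flag_path)


def kernel_reboot_pending(running: str, installed: list[str]) -> bool:
    """True if the newest installed kernel is not the one currently running."""
    return bool(installed) and newest_kernel(installed) != running


def reboot_needed(modules_dir: str = "/lib/modules") -> bool:
    """Combine both signals; assumes /lib/modules holds one directory per kernel."""
    running = platform.release()  # same string as `uname -r`
    installed = os.listdir(modules_dir) if os.path.isdir(modules_dir) else []
    return reboot_flag_present() or kernel_reboot_pending(running, installed)
```

Checking both signals is a deliberate choice: the flag file also covers non-kernel packages that request a reboot, while the kernel comparison still works on systems that never create the flag.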
### Real-World Reliability Testing ###
Any sysadmin who has been at it for a while has experienced this
scenario. Something happens that forces the server to shut down: perhaps a
hardware addition/replacement, power loss, or the need to move the
machine. Once the interruption is over, the admin boots the server only
to find that things aren’t working as they should. Some critical
service failed to start properly. What happened? As software packages
are updated and new versions are released, many variables come into
play that affect normal operation of that software. A configuration
setting might become deprecated. A hack used to work around a bug in
an old version may break the new version. The list goes on.
As the time between reboots increases, so does the likelihood that some
service will not initialize properly. These errors take time to
diagnose and correct, which translates to unacceptable server downtime.
This problem is compounded when two or three issues occur on a single
reboot. Rebooting on a regular schedule allows the sysadmin to catch
these types of errors quickly. It also provides time to correct them
without bringing work to a halt, because users are told ahead of time
that the server will be down for maintenance.
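After a scheduled reboot, a quick way to spot services that did not come back is to ask the init system. The Python sketch below assumes a systemd-based distribution: it runs `systemctl list-units --state=failed --plain --no-legend` and takes the unit name from the first column of each line. The function names are hypothetical, and non-systemd systems would need a different check.

```python
import subprocess


def parse_failed_units(output: str) -> list[str]:
    """Extract unit names from `systemctl list-units --plain --no-legend` output.

    Each non-empty line starts with the unit name, followed by the LOAD,
    ACTIVE, and SUB columns and a free-text description.
    """
    units = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Defensively skip a leading status bullet, in case one is printed.
        if tokens[0] == "●" and len(tokens) > 1:
            tokens = tokens[1:]
        units.append(tokens[0])
    return units


def failed_units() -> list[str]:
    """Ask systemd for units in the 'failed' state (requires systemctl)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)
```

An empty list from `failed_units()` means no unit ended up in the failed state; any names it returns are the first places to look while the maintenance window is still open.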
While it is true that services can be restarted individually, nothing
can accurately simulate a full reboot. And the longer you wait between
reboots, the greater the chance of something going wrong. Remember:
*You will never experience a routine reboot until you implement a