Why perform health checks?
Automated monitoring provides an excellent foundation for monitoring your servers, but regular manual health checks give stronger assurance that your environment is operating normally. For example, an automated monitoring solution may establish that a required Aspera service is running on a server without revealing that a particular user cannot transfer any files because that user's file system permissions have been configured incorrectly.
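The permissions example can be turned into a check that exercises the operation a transfer actually needs, rather than only confirming that a process exists. The following is a minimal sketch, assuming the check is run under the transfer user's account and that `docroot` is the directory that user transfers into. The function name and the directory are hypothetical and are not part of any Aspera tooling.

```python
import os
import tempfile


def can_write_to(docroot):
    """Return True if the current account can create and remove a file in docroot."""
    try:
        # Creating a real file catches permission problems that a
        # "service is running" check cannot see.
        with tempfile.NamedTemporaryFile(dir=docroot, prefix=".healthcheck-"):
            pass  # the temporary file is deleted automatically on close
        return True
    except OSError:
        # Covers a missing directory as well as a permission error.
        return False
```

A result of `False` tells you only that the write failed. The person running the check still has to look at the directory's ownership and permissions to find out why.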
Health checks should be run in conjunction with automated monitoring of your systems. As a best practice, perform health checks at least once a day, although the right frequency depends on how business-critical your system is. A mixture of automated monitoring and manual health checks is the recommended way to ensure operational stability.
Devise a simple health check procedure that your staff can follow and execute. It does not have to be more than a page or two detailing the most important checks required for your system to operate normally. Check cycles can be separated into timed slots, with each slot containing a different set of checks if required. For example, a main check could be carried out every morning at 8 AM, with further, less detailed checks every 8 hours until 8 AM the next day.
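The timed-slot idea can be written down as a small schedule generator. This is only an illustrative sketch: the function name `check_slots` and the labels "main" and "reduced" are assumptions chosen for this example.

```python
from datetime import datetime, timedelta


def check_slots(first_check, interval_hours=8):
    """List the check slots in the 24 hours starting at first_check."""
    slots = [(first_check, "main")]
    end_of_cycle = first_check + timedelta(hours=24)
    next_slot = first_check + timedelta(hours=interval_hours)
    # Stop before the next day's main check, which starts a new cycle.
    while next_slot < end_of_cycle:
        slots.append((next_slot, "reduced"))
        next_slot += timedelta(hours=interval_hours)
    return slots


for when, kind in check_slots(datetime(2024, 1, 15, 8, 0)):
    print(when.strftime("%H:%M"), kind)
```

With an 8 AM start and an 8-hour interval, this yields a main check at 08:00 and reduced checks at 16:00 and 00:00.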
When devising these checks, think about the operations required to support the most important file transfer flows within your business. Which file transfer operations within your deployment are absolutely critical?
Suggested health check template
Draw up a daily health check checklist with one entry per check. Each entry should detail the check function, the steps involved, the expected result, and whether the check passed or failed.
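If you prefer to keep the checklist in a script rather than on paper, the same four fields could be captured as a small record type. The sketch below is one possible layout, not a prescribed format. The class name, field names, and the plain-text rendering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckItem:
    check_function: str            # what is being verified
    steps: List[str]               # how to perform the check
    expected_result: str           # what a healthy system shows
    passed: Optional[bool] = None  # None until the check has been run

    def status(self):
        if self.passed is None:
            return "NOT RUN"
        return "PASS" if self.passed else "FAIL"


def render(items):
    """Render the checklist as plain text, one block per check."""
    lines = []
    for number, item in enumerate(items, start=1):
        lines.append(f"{number}. {item.check_function} [{item.status()}]")
        for step in item.steps:
            lines.append(f"   - {step}")
        lines.append(f"   Expected: {item.expected_result}")
    return "\n".join(lines)
```

Leaving `passed` unset until a check has been performed distinguishes checks that were skipped from checks that failed.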
Devising a good health check checklist
- A good health check procedure should take no more than 15-20 minutes to complete.
- Ensure that all the information needed to perform the checks is available in a location accessible to everyone performing them.
- List each step required to perform the test as clearly as possible, using bullet points if you like.
- Split the checks into logical sections containing different tests, for example "Faspex delivery" and "Receive files from partners".
- Consider performing the full set of checks once a day, then a subset of those checks at regular intervals throughout the day, for example once at lunchtime and again at 5 PM.
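The guidelines on sections, time budget, and interim subsets can be sanity-checked with a few lines of code. In this sketch, the section names are the ones used as examples above. The individual check names, the minute estimates, and the choice of interim checks are hypothetical placeholders that you would replace with your own.

```python
# section name -> list of (check name, estimated minutes, run in interim checks?)
FULL_CHECKLIST = {
    "Faspex delivery": [
        ("Send a test package", 5, True),
        ("Confirm the notification email arrives", 3, False),
    ],
    "Receive files from partners": [
        ("Upload a test file as a partner account", 5, True),
        ("Verify the file lands in the expected folder", 4, False),
    ],
}

MAX_MINUTES = 20  # upper end of the 15-20 minute guideline


def total_minutes(checklist, interim_only=False):
    """Sum the estimated minutes, optionally counting interim checks only."""
    return sum(
        minutes
        for checks in checklist.values()
        for _name, minutes, interim in checks
        if interim or not interim_only
    )


def interim_subset(checklist):
    """Return the names of the checks to repeat during the day, by section."""
    return {
        section: [name for name, _minutes, interim in checks if interim]
        for section, checks in checklist.items()
    }


if total_minutes(FULL_CHECKLIST) > MAX_MINUTES:
    print("Full check exceeds the time budget; consider trimming it.")
```

Recording a time estimate next to each check makes it easier to notice when the daily procedure has grown beyond what staff can realistically complete.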
Below is an example of two different tests, using the checklist template above: