Wednesday, 22 October 2008

Resilience - flexibility to cope with system failures

What happened the last time you had a power cut? Were you able to work? The answer is probably no: without power your business cannot operate, and you will probably lose your lighting and heating as well. Fortunately, power cuts tend to be rare and short, so the damage is small, and most small businesses have no contingency plan for them. You are more likely to suffer a computer system failure that disrupts one or more of your business processes.

What happens if your PC stops, your internet connection fails, you cannot send or receive email, or your server breaks? All of these failures are predictable and can be mitigated in a number of ways so that business can continue. We like to design systems with levels of resilience built in, so that they can cope with the failure of some components. These measures can be quite simple. Can you log on to another computer and carry on working? Can you collect your email directly from the internet or use webmail? If your server stops, do you still have access to recent shared files? These are examples of resilience: measures put in place to enable people to continue working when elements of the system go wrong.

The level of resilience you need depends on the level of disruption you can tolerate, the likelihood of failure and the cost of measures to mitigate the failure. Can you survive without email for a day? What happens if your accounts program stops? What if the payroll needs to go out? Have you considered how resilient your systems are? Do things keep going wrong through no fault of your own?
We have found that, with a little thought and preparation, procedures can often be put in place at minimal cost that will reduce the impact of a fault. People can continue to work, which reduces stress and allows time for a fix to be made.
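
One rough way to weigh disruption, likelihood and cost against each other is to estimate what each failure costs you per year with and without a mitigation measure. The short Python sketch below is only an illustration. The class name, the field names and every figure in it are made-up assumptions, not measurements from any real business, and the simple "failures per year x days lost x cost per day" sum is just one possible approximation.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """One kind of system failure. All fields are illustrative estimates."""
    name: str
    failures_per_year: float         # how often you expect it to happen
    days_lost_per_failure: float     # disruption with no measure in place
    mitigated_days_lost: float       # disruption with the measure in place
    cost_per_day_lost: float         # what a lost working day costs you
    mitigation_cost_per_year: float  # yearly cost of the measure

    def expected_annual_loss(self, days_lost: float) -> float:
        # Likelihood x disruption x cost of that disruption.
        return self.failures_per_year * days_lost * self.cost_per_day_lost

    def annual_saving(self) -> float:
        # Loss avoided by the measure, minus what the measure costs.
        before = self.expected_annual_loss(self.days_lost_per_failure)
        after = self.expected_annual_loss(self.mitigated_days_lost)
        return before - after - self.mitigation_cost_per_year


def worth_mitigating(failures):
    """Return the failures whose measure pays for itself, best saving first."""
    candidates = [f for f in failures if f.annual_saving() > 0]
    return sorted(candidates, key=lambda f: f.annual_saving(), reverse=True)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    email = Failure("email outage", 2, 1, 0.25, 500, 100)
    server = Failure("server failure", 0.5, 2, 0.5, 1000, 1200)
    for f in worth_mitigating([email, server]):
        print(f.name, f.annual_saving())
```

Even with rough guesses for the inputs, a comparison like this helps show which cheap measures are worth putting in place first and which failures you can simply tolerate.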

Once you know what to do, you can practise for system failures by carrying out a “fire drill”, but that is a topic for another day.
