It is hard to build a fault-tolerant computer system without hardware redundancy, but some methods can improve system reliability using software redundancy alone. One of them is “recovery blocks”.
Elements of Recovery Blocks
Recovery blocks are a set of cooperating software elements that all perform the same task. Each software version additionally has “Acceptance Test” code. The “Acceptance Test” evaluates the task's result and reports whether the software execution failed. A software version together with its “Acceptance Test” forms a “Recovery block”. A “Recovery block” takes an input value, performs the task, and returns an output value along with information on whether the software failed.
It is better to use the “Either Pattern”: a block returns either an output value or an error, never both.
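A recovery block that returns an Either-style result could be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Ok`, `Err`, `make_recovery_block` and the integer-square-root task are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Ok:
    """Successful result: carries only the output value."""
    value: Any


@dataclass(frozen=True)
class Err:
    """Failed result: carries only the error description."""
    reason: str


Result = Union[Ok, Err]


def make_recovery_block(
    variant: Callable[[Any], Any],
    acceptance_test: Callable[[Any, Any], bool],
) -> Callable[[Any], Result]:
    """Pair one software version with its acceptance test."""

    def block(input_value: Any) -> Result:
        try:
            output = variant(input_value)
        except Exception as exc:  # a crash of the variant counts as a failure
            return Err(f"variant raised {exc!r}")
        if acceptance_test(input_value, output):
            return Ok(output)
        return Err("acceptance test rejected the output")

    return block


# Hypothetical task: integer square root of a non-negative integer.
def accepts_isqrt(n: int, r: int) -> bool:
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


good_block = make_recovery_block(lambda n: int(n ** 0.5), accepts_isqrt)
buggy_block = make_recovery_block(lambda n: n // 2, accepts_isqrt)

print(good_block(17))   # Ok(value=4)
print(buggy_block(17))  # Err(reason='acceptance test rejected the output')
```

The caller never has to inspect a value and a separate failure flag together; it gets exactly one of the two cases.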
An ordered set of recovery blocks has its outputs connected to the “Selection Logic”, which drives the execution of the blocks and selects the output of the whole system.
The selection logic takes the first block’s output and, if it is not an error, passes it on as the output of the whole system. If the block returns an error, the selection logic asks the second block in the set for its output; if that output is correct it is passed on, otherwise the next block is asked for a result. If all blocks return an error, the whole system returns an error.
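The selection procedure described above can be sketched roughly like this. The result types `Ok` and `Err`, the function name `selection_logic`, and the two example blocks are assumptions made for the sake of the example, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Union


@dataclass(frozen=True)
class Ok:
    value: Any


@dataclass(frozen=True)
class Err:
    reason: str


Result = Union[Ok, Err]
Block = Callable[[Any], Result]


def selection_logic(blocks: Sequence[Block], input_value: Any) -> Result:
    """Try the blocks in order and return the first non-error output."""
    reasons = []
    for index, block in enumerate(blocks):
        result = block(input_value)
        if isinstance(result, Ok):
            return result  # the first accepted output becomes the system output
        reasons.append(f"block {index}: {result.reason}")
    # Every block reported an error, so the whole system reports an error.
    return Err("all blocks failed (" + "; ".join(reasons) + ")")


# Hypothetical blocks: the primary always fails, the alternate succeeds.
def primary(x: int) -> Result:
    return Err("primary rejected by its acceptance test")


def alternate(x: int) -> Result:
    return Ok(x * 2)


print(selection_logic([primary, alternate], 21))  # Ok(value=42)
print(selection_logic([primary], 21))             # an Err listing the failure
```

Later blocks run only when earlier ones fail, so in the fault-free case the system costs no more than a single block.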
Input values latch
There is one element left: the input values latch. Within one cycle, every block has to compute on the same set of input values. This means that the input values must be held somewhere in the system and stay unchanged until the Selection Logic passes on an output (value or error).
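In software, one possible way to get this behaviour is to take a deep copy of the inputs at the start of the cycle and hand every block its own copy of that snapshot. The `InputLatch` class below is a hypothetical sketch of that idea, not a standard component.

```python
import copy
from typing import Any


class InputLatch:
    """Holds one snapshot of the inputs for the duration of a cycle."""

    def __init__(self) -> None:
        self._held: Any = None
        self._latched = False

    def capture(self, live_inputs: Any) -> None:
        """Freeze the current inputs; refuse to overwrite during a cycle."""
        if self._latched:
            raise RuntimeError("cycle still in progress; inputs are latched")
        self._held = copy.deepcopy(live_inputs)
        self._latched = True

    def read(self) -> Any:
        """Return a fresh copy so no block can alter what the next one sees."""
        if not self._latched:
            raise RuntimeError("no inputs latched")
        return copy.deepcopy(self._held)

    def release(self) -> None:
        """Called once the selection logic has passed on a value or an error."""
        self._held = None
        self._latched = False


live = {"temperature": [20.1, 20.3]}
latch = InputLatch()
latch.capture(live)

first_view = latch.read()
first_view["temperature"].append(99.9)  # a misbehaving block mutates its copy
live["temperature"].clear()             # the outside world changes meanwhile

second_view = latch.read()
print(second_view)  # {'temperature': [20.1, 20.3]}
latch.release()
```

Copying on every read costs time and memory, but it keeps a faulty block from corrupting the inputs of the blocks that run after it.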
Summary of Recovery Blocks
“Recovery blocks” is an example of a fault tolerance pattern that uses time redundancy: additional time is spent on redundant computations. At first glance it seems that software redundancy is also present, but this is not necessarily so, because the blocks may all be the same version of the software, each with a different configuration.
The pattern itself represents the basic idea of the solution, and it can be extended and modified slightly to use parallel output computation, N-version programming, or voting. I will explain these modifications in one of my next posts.
- Fault-Tolerance Techniques for Spacecraft Control Computers. Wiley, 2017.
- Patterns for Fault Tolerant Software. Wiley, 2007.