Dependable Computing and Fault Tolerance

The IEEE Technical Committee on Dependable Computing and Fault Tolerance
IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance

Dependable Computing and Fault Tolerance News

Submission Guidelines

  • E-mail submissions, with a specific request for inclusion, to Chuck Weinstock.
  • We now accept formatted entries. Please keep HTML tags to a minimum.
  • We may not be able to post your submission for at least a week, so please plan accordingly.


Submitting Items for Publication

To submit an item for publication on the FTTC mailing list, simply send it to fttc@dependability.org. Submissions are moderated; unless yours is rejected, it will be sent to the list within a day or so.

2017 Laprie Award Winners

In 2011, IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance created the award in honor of the late Jean-Claude Laprie. It recognizes outstanding papers that have significantly influenced the theory and/or practice of dependable computing. For 2017, the award committee has selected:

Kuang-Hua Huang and Jacob A. Abraham, "Algorithm Based Fault Tolerance for Matrix Operations", in IEEE Transactions on Computers, Vol. C-33, No. 6, pp. 518-528, June 1984.

Richard Schlichting and Fred Schneider, "Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems", in ACM Transactions on Computer Systems, Vol. 1, No. 3, pp. 222-238, August 1983.

Citations

Kuang-Hua Huang and Jacob A. Abraham’s "Algorithm Based Fault Tolerance for Matrix Operations" has formed the basis for an entirely new domain of research within dependable computing over the past three decades. It was the first paper to propose an effective mathematical technique for algorithm-level fault tolerance in matrix operations, showing that data could be encoded at very low overhead while still providing effective error detection and correction. Because matrices are at the heart of many computation-intensive operations, the technique has been widely used and has inspired significant research on algorithm-based fault tolerance in other computing domains and applications. This outstanding paper has not only stood the test of time, maintaining an extensive impact on dependable computing research and practice, but has also grown in relevance year after year, as attested by its steadily growing number of citations.
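The checksum encoding behind algorithm-based fault tolerance can be illustrated in a few lines. The Python below is a minimal sketch, not the paper's exact formulation. It assumes integer matrices (so checksum comparisons are exact) and at most one faulty element in the data part of the result, and the function names `matmul`, `column_checksum`, `row_checksum`, and `locate_and_correct` are hypothetical. Multiplying a column-checksum version of A (an extra row of column sums) by a row-checksum version of B (an extra column of row sums) yields a product whose last row and column are checksums of the result. A single corrupted element then shows up as exactly one inconsistent row and one inconsistent column, and the row's checksum difference gives the correction.

```python
# Minimal sketch of checksum-based ABFT for matrix multiplication.
# Assumes integer entries (exact comparisons) and at most one faulty
# element in the data part of the result. Names are hypothetical.

def matmul(a, b):
    """Plain matrix product of list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]

def column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]

def locate_and_correct(cf):
    """Check a full-checksum matrix and repair one faulty data element in place.

    Returns (row, col) of the repaired element, or None if all checksums agree.
    """
    n, m = len(cf) - 1, len(cf[0]) - 1  # size of the data part
    bad_rows = [i for i in range(n) if sum(cf[i][:m]) != cf[i][m]]
    bad_cols = [j for j in range(m)
                if sum(cf[i][j] for i in range(n)) != cf[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row checksum says how far off the faulty element is.
        cf[i][j] += cf[i][m] - sum(cf[i][:m])
        return (i, j)
    # Multiple errors, or a fault in a checksum element itself.
    raise ValueError("error pattern not handled by this single-error sketch")

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
cf = matmul(column_checksum(a), row_checksum(b))
cf[1][0] += 10                       # inject a fault into the result
print(locate_and_correct(cf))        # (1, 0)
print([row[:2] for row in cf[:2]])   # [[19, 22], [43, 50]]
```

With floating-point data the comparisons would need a rounding tolerance, which this sketch deliberately leaves out.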

Schlichting and Schneider’s outstanding paper proposed the fail-stop processor as an abstract failure model and defined a formal approach to building fault-tolerant software on top of this abstraction. The clean failure semantics of the fail-stop model simplify building fault-tolerant software for distributed systems: they raise the level of abstraction and greatly reduce the failure behavior the system designer must account for, replacing arbitrary failure models with the simple detection that a processor has ceased activity. The paper has significantly influenced subsequent research that applied the fail-stop model or proposed approximations of it, giving rise to variants of silent failure behavior such as fail-fast and fail-safe.
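As a rough illustration of these clean failure semantics, the Python below models a processor that either completes an action and commits its result to stable storage, or halts, discards its volatile state, and reports its failure to anyone who asks. It is a toy under stated assumptions, not the construction from the paper. Internal errors are assumed to surface as exceptions, a plain dictionary stands in for stable storage, and the class, method, and action names are hypothetical.

```python
class FailStopProcessor:
    """Toy fail-stop processor; all names are hypothetical."""

    def __init__(self, stable_storage):
        self.stable = stable_storage  # assumed to survive a failure
        self.volatile = {}            # lost when the processor halts
        self.failed = False

    def step(self, action):
        """Run one action; halt rather than commit a possibly bad state."""
        if self.failed:
            raise RuntimeError("processor has halted")
        try:
            # The action works on a copy of stable storage, so a fault
            # midway cannot leave a partial update behind.
            updates = action(dict(self.stable), self.volatile)
        except Exception:
            self.halt()
            return False
        self.stable.update(updates)  # commit only after the action succeeded
        return True

    def halt(self):
        self.failed = True
        self.volatile.clear()

    def has_failed(self):
        """Other processors can observe that this one has stopped."""
        return self.failed


def deposit(stable, volatile):
    volatile["previous"] = stable["balance"]
    return {"balance": stable["balance"] + 10}


def faulty(stable, volatile):
    raise ValueError("internal error detected")


disk = {"balance": 100}
p = FailStopProcessor(disk)
print(p.step(deposit))  # True
print(p.step(faulty))   # False: the processor halts
print(p.has_failed())   # True
print(disk)             # {'balance': 110}: stable state survives
```

The point of the design is that observers never see a half-finished or corrupted update: they see either the committed result or a halted processor whose last stable state a recovery procedure can pick up.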