|
Full title
Empirical research on spontaneous diversity in programs
keywords
Diversity, programming language, dependability, fault tolerance,
run-time checks, software, programming contest
summary
The University of Valladolid has a website with more than
1,500 specifications for programs. Everyone can write programs to these
specifications and send them to Valladolid over the internet. The website
automatically assesses every submission and gives feedback to the author
whether it is right or wrong. In total, over fifty thousand authors have
submitted more than three million programs!
With this large resource of diverse programs, it is possible
to perform empirical studies of the impact of a range of diversity techniques
on program reliability [1, 2, 3, 4]. Some of the results of these empirical
studies are shown below.
What does fault tolerance in 1-out-of-2 systems give us?
1-out-of-2 systems have been presented as a means to significantly
improve dependability. It is however hard to determine how much improvement
is likely because it is in general difficult to get a large number programs
written to the same specification. However, given the many thousands of
programs for each on-line contest specification, we can easily measure
the probability of undetected failure on demand averaged over all possible
pairs in the program population.
An example of our measurements is the following:

The horizontal axis gives the average probability
of failure on demand of a single program in the population. The average
pfd is reduced by removing the most unreliable programs from the population.
The vertical axis gives the reliability improvement a 1-out-of-2 pair
gives over a single program drawn from that population. We can observe
that for unreliable programs, the probability of failure of the pair is
approximately the product of the probabilities of the individual programs
(close to the independence assumption). We also observe that the reliability
improvement reaches a “plateau” at about a factor of a hundred.
This plateau has often been predicted, but has never been observed before!
Diverse programming languages
Forced diversity has been advocated as a means of increasing
program diversity (different development methods, languages, etc). In
this study we examined the impact of diverse languages (C, C++ and Pascal)
on the reliability improvement of a 1-out-of-2 pair by choosing different
languages for the two programs in the pairs.
An example of these measurements is shown below.

On the horizontal axis, we again have the
average probability of failure on demand of the program population, on
the vertical axis the reliability improvement of the pair of program over
a single version. It can be seen that the reliability improvement is always
better if the second program of the pair uses a different language from
the first (which is written in C). The three curves correspond to a C/C,
a C/C++ and a C/Pascal pair. We can see that the C/Pascal pair is significantly
better. Experiments on other contest problems confirm this, though the
improvement is often less than the example shown above.
Diverse run-time checks
A last example of our experiments concerns run-time checks.
Run-time checks are a diverse means of checking the output of a program
to detect program failure. Run-time checks evaluate one or more properties
of the program output and flag the output as incorrect if it fails the
check.
We have been able to measure the effectiveness of run-time
checks for various specifications. An example is:

The horizontal axis gives the average probability of failure
on demand of the program population, and the vertical axis gives the improvement
in undetected failures using various run-time checks. The box in the graphs
describes the various run-time checks in our analysis; we will not go
into detail about that here. We can observe a maximum improvement of about
a factor of three, but only in a limited range. The average improvement
is much lower than that. This indicates that run-time checks can give
some improvement, but the reliability improvement does not appear to be
large for the most reliable programs in the population.
links
papers
Bentley, J.G.W., Bishop, P.G., Meulen, M.J.P. van der, An
Empirical Exploration of the Difficulty Function, Safecomp, Potsdam, 21-24
September 2004, Potsdam, Germany, pp. 60-71, Springer Verlag, 2004.
Meulen, M.J.P. van der, Bishop, P.G., Revilla, M., An Exploration
of Software Faults and Failure Behaviour in a Large Population of Programs,
ISSRE 2004, Rennes, France, pp. 101-12, 2004.
Meulen, M.J.P. van der, Revilla, M., The effectiveness of
choice of programming language as a diversity seeking decision, EDCC 2005,
Budapest, pp.199-209.
Meulen, M.J.P. van der, Strigini, L., Revilla, M., On the
Effectiveness of Run-Time Checks, Safecomp 2005, Halden, Norway, to be
published.
author
Meine van der Meulen and Peter Bishop (City)
|