Monday, October 25, 2010

Reliability Testing

Reliability tests are designed to confirm that the software will work in its expected environment for an acceptable period of time without degradation. Performing reliability testing effectively is quite difficult, and it becomes even harder when clear reliability requirements are lacking.

What is the meaning of Reliability?

First of all, let us understand the meaning of reliability, because it is generally less well understood than other quality attributes such as functionality, performance, and security.

Reliability describes the ability of the software product to perform its required functions under stated conditions for a specified period of time or for a specified number of operations. Thus, when talking about reliability, we consider the following two important factors:

1) "Doing what?" (Stated conditions)

2) "For how long?" (Time or operations)

Reliability is measured by a specific failure intensity metric, such as the mean time between failures (MTBF). Software that fails on average once a week is considered less reliable than software that fails once a month. When comparing such figures, we also need to take into account the severity of those failures and the conditions under which the software was operating (the "doing what?" element of the reliability definition).
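The MTBF comparison above can be sketched in a few lines. This is a minimal illustration; the operating times and failure counts are made-up example numbers, not measurements from any real system.

```python
# Minimal sketch: estimating MTBF (mean time between failures)
# from observed operating time and failure count.

def mean_time_between_failures(total_operating_hours, failure_count):
    """MTBF = total operating time / number of failures observed."""
    if failure_count == 0:
        raise ValueError("no failures observed; MTBF cannot be estimated")
    return total_operating_hours / failure_count

# Software A fails once a week; software B fails once a month (30 days):
mtbf_a = mean_time_between_failures(24 * 7, 1)    # 168 hours
mtbf_b = mean_time_between_failures(24 * 30, 1)   # 720 hours
print(mtbf_a < mtbf_b)  # A is the less reliable of the two
```

The higher the MTBF, the more reliable the software is considered to be under those stated conditions.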

How can we increase the Software Reliability?

Software reliability can be improved by programming practices that "catch" error conditions as they occur and handle them in a defined manner.

For example, the software may generate an error message, take some alternative action, or use default values if calculated values are found to be incorrect in some way.

This ability of the software to maintain a specified level of performance, rather than break, when a failure or an unexpected event takes place is called "Fault Tolerance". It is also referred to as "Robustness".
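A minimal sketch of the practice described above: catching an error condition as it occurs and handling it in a defined manner, here by logging a warning and falling back to a default value. The function and the discount scenario are illustrative assumptions, not from any particular system.

```python
# Minimal fault-tolerance sketch: an out-of-range calculated value is
# detected, reported, and replaced with a safe default instead of
# being allowed to break the program.

import logging

DEFAULT_DISCOUNT = 0.0  # safe default used when the input is invalid

def safe_discount(discount):
    """Return the discount if it is plausible, else a safe default."""
    if not (0.0 <= discount <= 1.0):
        # Defined handling: generate a message and take alternative action.
        logging.warning("discount %r out of range; using default", discount)
        return DEFAULT_DISCOUNT
    return discount

print(safe_discount(0.2))  # valid value passes through: 0.2
print(safe_discount(7.5))  # incorrect value replaced by default: 0.0
```

The key point is that the failure mode is anticipated and the behavior on failure is specified in advance, rather than left to chance.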

An important aspect of reliability refers to the ability of software to reestablish a specified level of performance and recover any data directly affected by the failure.

The "Recoverability" of software can be considered from the following two aspects:

1) Fail-over capability: This refers to the ability to maintain continuous system operations even in the event of failure. In this case, the re-establishment of a specified level of performance may take place seamlessly, without being noticed by end users.

2) Restore capability: This refers to the ability to minimize the effects of a failure on the system's data. If recovery is required as a result of some catastrophic event, such as a fire or an earthquake, it is called "Disaster Recovery".
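The fail-over aspect above can be sketched as follows. `fetch_primary` and `fetch_standby` are hypothetical stand-ins for a real primary service and its standby; the simulated failure is for illustration only.

```python
# Minimal fail-over sketch: when the primary operation fails, the
# request is served by a standby, so operations continue and the
# end user ideally never notices the failure.

def fetch_primary():
    # Simulated failure of the primary node.
    raise ConnectionError("primary node down")

def fetch_standby():
    return "data from standby"

def fetch_with_failover():
    try:
        return fetch_primary()
    except ConnectionError:
        # Seamless re-establishment of service via the standby node.
        return fetch_standby()

print(fetch_with_failover())  # "data from standby"
```

In a real system the fail-over would be handled by infrastructure (clustering, load balancers, replicated databases), but the testing question is the same: does service continue, and does the user notice?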

While considering the recoverability aspects of reliability, we need to give due consideration to the impact of a failure or disruption:

1) The criticality of system failures

2) The consequences of interruptions in normal operations (whether planned or not)

3) The implications of any data losses resulting from failures

What are the activities of Reliability Test Planning?

Test planning covers all of the reliability attributes and involves the following primary activities:

1) Assessment of the risks associated with reliability

2) Definition of an appropriate testing approach to address those risks

3) Setting reliability goals

4) Scheduling the tests
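One of the activities above, setting reliability goals, can be made concrete by expressing the goal as a target MTBF and checking observed test results against it. This is a minimal sketch with illustrative numbers, not a prescribed planning method.

```python
# Minimal sketch: a reliability goal expressed as a target MTBF,
# checked against the results of a reliability test run.

def meets_reliability_goal(operating_hours, failures, target_mtbf_hours):
    """True if the observed MTBF meets or exceeds the stated goal."""
    observed_mtbf = operating_hours / max(failures, 1)
    return observed_mtbf >= target_mtbf_hours

# Goal: no more than one failure per 500 hours of operation.
print(meets_reliability_goal(1000, 1, 500))  # True  (observed MTBF 1000 h)
print(meets_reliability_goal(1000, 4, 500))  # False (observed MTBF 250 h)
```

Stating the goal numerically up front gives the test team an objective pass/fail criterion, which addresses the lack of clear requirements mentioned at the start of this article.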