Sunday 4 October 2015

Measuring success in Testing

I'm a strong believer in continual improvement of our practices.
Recently we've been focussing on re-invigorating this attitude in our testers.
Most of this has been explaining that...
  • "what we do" now has resulted in a continual evolution over many years.
  • we don't believe in 'best practice', as it implies you're not prepared to accept that another practice may be better.
  • we are open to trying new ideas and ways to do things.
When talking to test analysts about evolution and trying new things, I started to think: what are we aiming for? How do we know the evolution is positive? How do we know a new idea made a positive difference?
So, I asked a couple of Test Analysts, Business Analysts, and Product Owners: how do we measure the success of our testing?
Below is a digest of some scribbles I kept in my notebook of their answers, and the issues that they (or I) felt existed in each type of success measure. Then I've included my personal view of how I think the success of our testing should be measured.
I'd be keen to hear people's comments.
Scribbles digested
  1. Defect rate
    • Number of defects found during testing
      • Rationale - if we find lots of defects during the testing activities, we've stopped them making their way into production, meaning we have a better quality product
    • Number of bugs released to production
      • Rationale - if we don't release any bugs to production, we must be testing all the possible areas
    • No severity 1 defects released
      • Rationale - we make sure the worst bugs don't make it to production
    • Issues
      • How much do those defects you find matter? You can almost always find bugs in any system, but do they matter to the product?
      • "Severity" require a judgement call by someone. If you release no severity 1 defects, and the product fails and no one uses it, you probably weren't assessing severity properly. So was your testing successful?
      • Just because we don't see bugs doesn't mean they're not there.
        Alternatively, the code might not have been particularly buggy when it was written. So was the success that of the testing or the coding?
  2. Time
    • Time for testing to complete
      • Rationale - the faster something is deployed the better. So if we complete testing quickly, that's success
    • Time since last bug found
      • Rationale - if we haven't found any bugs recently, there must be no more to find
    • Issues
      • Fast testing is not always the smartest approach to testing.
      • Defect discovery does not obey a decay curve. Yes, you may have found all the obvious defects, but that doesn't mean you've found all the defects which will affect your product's quality.
  3. Coverage
    • Number of code statements covered by testing activities
      • Rationale - if we execute all the code, we're more likely to find any defects, e.g. via unit tests.
    • Number of acceptance criteria which have been verified
      • Rationale - we know how it should work. So if we test it works, we've built what we wanted.
    • Issues
      • This can lead you to 'pure verification' and not attempting to "push the code over" or try unexpected values/scenarios (see the sketch after this list).
      • We work on an ever-evolving and intertwined code base; focussing only on the new changes ignores regression testing and the fact that new functionality may break existing features.
  4. Amount of testing
    • Amount of testing done
      • Rationale - we've done lots of testing, the product must be good
    • Amount of testing not done
      • Rationale - we made the call not to test this
    • Issues
      • Doing lots of testing can be unnecessary and a poor use of time.
      • Removing testing requires a judgement call on what should and shouldn't be tested.
        There's always a risk involved when you make those judgements of more or less coverage, but perhaps the bigger 'social' risk is that you can
        introduce a bias or blindness in yourself. If you didn't test it last time, and the product didn't fail, are you going to test it next time? Or it could introduce a business bias: "we did lots of testing last time, and the product failed, so we need to do more testing this time."
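
Here's a minimal sketch of the 'pure verification' point under Coverage above (Python, standard unittest; apply_discount and its numbers are hypothetical, purely for illustration). The happy-path test alone executes every statement, so a statement-coverage tool would report 100%, yet only the second test, which tries an unexpected value, exposes the defect.

import unittest


def apply_discount(price, percent):
    # Hypothetical example: return price reduced by percent.
    # Defect: nothing stops percent being negative or over 100,
    # so apply_discount(200, 150) silently returns a negative price.
    return price - (price * percent / 100)


class TestApplyDiscount(unittest.TestCase):
    # This single happy-path check executes every statement in
    # apply_discount, so statement coverage reports 100%.
    def test_ten_percent_discount(self):
        self.assertEqual(apply_discount(200, 10), 180)

    # Trying an unexpected value adds nothing to statement coverage,
    # yet it is the test that actually fails and reveals the defect.
    def test_discount_over_one_hundred_percent(self):
        self.assertGreaterEqual(apply_discount(200, 150), 0)


if __name__ == "__main__":
    unittest.main()

Run only the first test under a coverage tool and every statement shows as executed, which is exactly why statement coverage on its own is a weak measure of testing success.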
My view: how should the success of testing be measured?
To me, we should consider the points above, but focus more on: did the testing activities help the delivery of a successful product?
If we delivered a successful product, then surely the testing we performed was successful?
But to make that conclusion you have to understand what factors make the product successful.
And those factors may not give you the immediate or granular level of feedback you need.
e.g. if success was that the product was delivered on time and under budget, can you tell how much your testing contributed to that time and budget saving?
So when I answer the question 'Did my testing activities help the delivery of a successful product?', I consider:
  • Was my testing 'just enough'?
    • Did I cover enough scenarios?
    • Did I deliver it fast enough?
  • Did this testing add value?
    • Have I done enough testing for the success of this change?
    • Can I get the same value by removing some testing activities?
  • Did I find defects in the product?
    • What bugs are critical to the project's success?
    • What bugs matter most to its stakeholders?
  • What does success look like for the project's stakeholders?
    • Zero bugs?
    • Fast Delivery/Turnaround?
    • Long term stability?
I haven't explicitly said that the success of testing should be measured by the quality of a product. To me it's covered by the third bullet point, "Did I find defects in the product?" - the measure of the product's quality comes when we consider those defects and the level to which the stakeholders feel they're detrimental to the product's success.
I really like Michael Bolton's 2009 article Three Kinds of Measurement and Two Ways to Use Them. It got me thinking about the different ways people approach measurement and made me think about how high level I need to be when giving people a measure of success.
I guess the main thing I've learnt from talking to people and digesting my thoughts is that you should be thinking about how you're measuring success. I don't think it's formulaic; maybe it's more heuristic, but it's always worth thinking about.

3 comments:

  1. Nice overview. I tend to agree. I have a recommendation, however - to complete the picture you should not just consider what "helps the delivery", but also
    things that interfere with or hinder the team and software delivery, for example:
    - bugs found late (when we could have found them earlier)
    Rationale: it is easier for a developer to fix code they have just broken compared to code untouched for weeks
    - bugs reported but never fixed for whatever reason (duplicate/not a bug/won't fix/low priority...)
    Rationale: we waste our time reporting them. Developers waste their time reading and analysing the bug. Not to mention the impact on morale/attitude.
    - bugs fixed that do not really matter to any stakeholder (but we, and even they, don't know about it)
    Rationale: we spent time fixing a bug that would never bug anyone in production, or at least they would be perfectly fine using the workarounds available

    Ainars.

    1. Thank you Ainars, I really appreciate your feedback.

      - bugs found late (when we could have found them earlier)
      * What is late?
      If you want a success criterion of "Find bugs in time", then you have to understand what 'in time' means in the context of your change.
      Is 'in time' dependent on the developer remembering the code they changed? or is 'in time' before the project is released?
      * Why were the bugs found late? and why/how could they be found earlier?
      Testing isn't just about 'exercising code'. Are the test analysts getting involved early? Are there discussions happening about requirements?
      To say that a bug was found too late, you have to understand the root cause of it. Was it bad requirements? bad understanding? bad coding?
      - Why is your code sitting around for weeks?
      * Ultimately it boils down to: When is the best time to find bugs for the success of the project?
      Probably as early as possible, right?
      But you also want the biggest bugs found as early as possible, which is why you have to ask the 'What bugs are critical to the project's success?' question.
      I'd suggest you could add another question under 'Did I find defects in the product?', along the lines of 'Did testing activities help to find product defects early?'

      - bugs reported but never fixed for whatever reason
      A test analyst will almost always be able to find some bug in a product if they look hard enough.
      Understanding which factors influence 'the project's success' helps to focus testing activities, and this inherently affects the defects that get raised. Defects raised should be those which will adversely affect the project's success.
      Who makes the call on what gets fixed in your project?
      Maybe instead of logging a bug, the test analyst should ask the stakeholder to review the bug they've found before it goes to the developer to fix?
      This will feed into the focusing of testing activities on finding defects which matter.
      This is where the question 'What bugs are critical to the project's success?' is powerful.

      - bugs fixed that does not really matter to any stakeholder
      If bugs are getting fixed that 'do not matter' - something is wrong with the defect prioritisation in the project.
      Just because we find a defect, doesn't mean it needs fixing.
      If it doesn't matter to any stakeholder, then the test analyst isn't asking the question: 'What bugs matter most to its stakeholders?'

      Thank you again for your feedback. I really appreciate it.

  2. Great article! I searched a lot of sites and finally found something that shows the metrics clearly. Congrats!
