Sunday 4 October 2015

Measuring success in Testing

I'm a strong believer in continual improvement of our practices.
Recently we've been focussing on re-invigorating this attitude in our testers.
Most of this has been explaining that...
  • "what we do" now has resulted in a continual evolution over many years.
  • we don't believe in 'best practise' - as it implies you're not prepared to accept that another practice may be better.
  • we are open to trying new ideas and ways to do things.
When talking to test analysts about evolution and trying new things, I started to think: what are we aiming for? How do we know the evolution is positive? How do we know a new idea made a positive difference?
So, I asked a couple of Test Analysts, Business Analysts, and Product Owners: how do we measure the success of our testing?
Below is a digest of some scribbles I kept in my notebook of their answers, and the issues that they (or I) felt existed in each type of success measure. Then I've included my personal view of how I think the success of our testing should be measured.
I'd be keen to hear people's comments.
Scribbles digested
  1. Defect rate
    • Number of defects found during testing
      • Rationale - if we find lots of defects during the testing activities, we've stopped them making their way into production, meaning we have a better quality product
    • Number of bugs released to production
      • Rationale - if we don't release any bugs to production, we must be testing all the possible areas
    • No severity 1 defects released
      • Rationale - we make sure the worst bugs don't make it to production
    • Issues
      • How much do the defects you find matter? You can almost always find bugs in any system, but do they matter to the product?
      • "Severity" requires a judgement call by someone. If you release no severity 1 defects, yet the product fails and no one uses it, you probably weren't assessing severity properly. So was your testing successful?
      • Just because we don't see bugs doesn't mean they're not there.
        Alternatively, the code might not have been particularly buggy when it was written. So was the success that of the testing or the coding? (A rough sketch of these rate metrics follows below.)
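    As a rough illustration of how these counts interact, here's a minimal sketch in Python. All figures are invented, and 'escape rate' is just a name I'm using here, not a standard metric definition:

      # Minimal sketch; the figures are invented and the helper name is ours.
      def escape_rate(found_in_testing: int, found_in_production: int) -> float:
          """Share of all known defects that escaped to production."""
          total = found_in_testing + found_in_production
          return found_in_production / total if total else 0.0

      # A release where testing caught most of what was there...
      print(escape_rate(found_in_testing=40, found_in_production=2))  # ~0.05
      # ...versus one where little was found but several bugs escaped.
      print(escape_rate(found_in_testing=5, found_in_production=4))   # ~0.44

    Note the number still says nothing about whether the escaped defects actually mattered to the product - which is exactly the issue above.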
  2. Time
    • Time for testing to complete
      • Rationale - the faster something is deployed, the better. So if we complete testing quickly, that's success
    • Time since last bug found
      • Rationale - if we haven't found any bugs recently, there must be no more to find
    • Issues
      • Fast testing is not always the smartest approach to testing.
      • Defect discovery does not obey a decay curve. Yes, you may have found all the obvious defects, but that doesn't mean you've found all the defects which will affect your product's quality (a toy sketch of this follows below).
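    To see why 'time since last bug found' is a weak stopping rule, here's a toy Python sketch with invented dates: quiet spells reflect which areas happened to be exercised, not how many defects remain.

      from datetime import date

      # Invented discovery dates: the fourth defect only surfaced once a
      # previously untested area was finally exercised.
      defect_dates = [date(2015, 9, 1), date(2015, 9, 2), date(2015, 9, 3),
                      date(2015, 9, 21)]

      def days_since_last_defect(today: date) -> int:
          found_so_far = [d for d in defect_dates if d <= today]
          return (today - max(found_so_far)).days

      # On 20 September the metric read 17 quiet days - yet a defect was
      # still sitting in an area the testing hadn't reached.
      print(days_since_last_defect(date(2015, 9, 20)))  # 17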
  3. Coverage
    • Proportion of code statements covered by testing activities
      • Rationale - if we execute all the code, we're more likely to find any defects, e.g. via unit tests.
    • Number of acceptance criteria which have been verified
      • Rationale - we know how it should work. So if we test it works, we've built what we wanted.
    • Issues
      • This can lead you to 'pure verification' and not attempting to "push code over" or try unexpected values/scenarios (see the sketch below this item)
      • We work on an ever-evolving, intertwined code base; focussing only on the new changes ignores regression testing and the fact that new functionality may break existing features.
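    A small hypothetical Python sketch of that first issue: the test below executes every statement - 100% coverage - and still misses a defect that an unexpected value exposes.

      def apply_discount(price: float, percent: float) -> float:
          # Bug: no guard against percent > 100, which yields a negative price.
          return price - price * percent / 100

      def test_apply_discount():
          # One happy-path check covers every statement above...
          assert apply_discount(100.0, 10.0) == 90.0

      # ...but an unexpected value still pushes the code over:
      # apply_discount(100.0, 150.0) returns -50.0, a price no one intended.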
  4. Amount of testing
    • Amount of testing done
      • Rationale - we've done lots of testing, so the product must be good
    • Amount of testing not done
      • Rationale - we made the call not to test this
    • Issues
      • Doing lots of testing can be unnecessary and a poor use of time. Removing testing requires a judgement call on what should and shouldn't be tested.
        There's always a risk involved when you make those judgements of more or less coverage, but perhaps the bigger 'social' risk is that you can
        introduce a bias or blindness in yourself. If you didn't test it last time, and the product didn't fail - are you going to test it next time? Or it could introduce a business bias - "we did lots of testing last time, and the product failed, so we need to do more testing this time."
My view: how should the success of testing be measured?
To me, we should consider the points above, but focus more on: did the testing activities help the delivery of a successful product?
If we delivered a successful product, then surely the testing we performed was successful?
But to make that conclusion you have to understand what factors make the product successful. And those factors may not give you the immediate or granular level of feedback you need.
For example, if success was that the product was delivered on time and under budget - can you tell how much your testing contributed to that time and budget saving?
So when I answer the question 'Did my testing activities help the delivery of a successful product?', I consider:
  • Was my testing 'just enough'?
    • Did I cover enough scenarios?
    • Did I deliver it fast enough?
  • Did this testing add value?
    • Have I done enough testing for the success of this change?
    • Can I get the same value by removing some testing activities?
  • Did I find defects in the product?
    • What bugs are critical to the project's success?
    • What bugs matter most to its stakeholders?
  • What does success look like for the project's stakeholders?
    • Zero bugs?
    • Fast delivery/turnaround?
    • Long-term stability?
I haven't explicitly said that the success of testing should be measured by the quality of a product. To me it's the third bullet point, "Did I find defects in the product?" - the measure of the product's quality comes when we consider those defects and the level to which the stakeholders feel they're detrimental to the product's success.
I really like Michael Bolton's 2009 article Three Kinds of Measurement and Two Ways to Use Them. It got me thinking about the different ways people approach measurement, and about how high-level I need to be when giving people a measure of success.
I guess the main thing I've learnt from talking to people and digesting my thoughts is that you should be thinking about how you measure success. I don't think it's formulaic; maybe it's more heuristic, but it's always worth thinking about.