Sunday, 4 October 2015

Measuring success in Testing

I'm a strong believer in continual improvement of our practices.
Recently we've been focussing on re-invigorating this attitude in our testers.
Most of this has involved explaining that...
  • "what we do now" is the result of continual evolution over many years.
  • we don't believe in 'best practice' - as the term implies you're not prepared to accept that another practice may be better.
  • we are open to trying new ideas and ways to do things.
When talking to test analysts about evolution and trying new things, I started to think: what are we aiming for? How do we know the evolution is positive? How do we know a new idea made a positive difference?
So, I asked a couple of Test Analysts, Business Analysts, and Product Owners: how do we measure the success of our testing?
Below is a digest of some scribbles I kept in my notebook of their answers, and the issues that they (or I) felt existed in each type of success measure. Then I've included my personal view of how I think the success of our testing should be measured.
I'd be keen to hear people's comments.
Scribbles digested
  1. Defect rate
    • Number of defects found during testing
      • Rationale - if we find lots of defects during testing, we've stopped them making their way into production, meaning we have a better quality product
    • Number of bugs released to production
      • Rationale - if we don't release any bugs to production, we must be testing all the possible areas
    • No severity 1 defects released
      • Rationale - we make sure the worst bugs don't make it to production
    • Issues
      • How much do the defects you find matter? You can almost always find bugs in any system, but do they matter to the product?
      • "Severity" requires a judgement call by someone. If you release no severity 1 defects, and the product fails and no one uses it, you probably weren't assessing severity properly. So was your testing successful?
      • Just because we don't see bugs, doesn't mean they're not there.
        Alternatively, the code might not have been particularly buggy when it was written. So was the success that of the testing or the coding?
  2. Time
    • Time for testing to complete
      • Rationale - the faster something is deployed the better. So if we complete testing quickly, that's success
    • Time since last bug found
      • Rationale - if we haven't found any bugs recently, there must be no more to find
    • Issues
      • Fast testing is not always the smartest approach to testing.
      • Defect discovery does not obey a decay curve. Yes, you may have found all the obvious defects, but that doesn't mean you've found all the defects which will affect your product's quality.
  3. Coverage
    • Proportion of code statements covered by testing activities
      • Rationale - if we execute all the code, we're more likely to find any defects. E.g. Unit tests.
    • Number of acceptance criteria which have been verified
      • Rationale - we know how it should work. So if we test it works, we've built what we wanted.
    • Issues
      • This can lead you to 'pure verification' and not attempting to "push code over" or try unexpected values/scenarios
      • We work on an ever-evolving and intertwined code base; focussing only on the new changes ignores regression testing and the fact that new functionality may break existing features.
  4. Amount of testing
    • Amount of testing done
      • Rationale - we've done lots of testing, the product must be good
    • Amount of testing not done
      • Rationale - we made the call not to test this
    • Issues
      • Doing lots of testing can be unnecessary and a poor use of time. Removing testing requires a judgement call on what should and shouldn't be tested.
        There's always a risk involved when you make those judgements of more or less coverage, but perhaps the bigger 'social' risk is that you can introduce a bias or blindness in yourself. If you didn't test it last time, and the product didn't fail - are you going to test it next time? Or it could introduce a business bias - "we did lots of testing last time, and the product failed, we need to do more testing this time."
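As a toy illustration of why the raw counts in measure 1 can mislead, here's a minimal Python sketch. All the data and names here are hypothetical - the point is just that the release which "wins" on defect volume isn't automatically the better-tested one.

```python
# Toy illustration: raw defect counts vs. defects that mattered.
# All data is hypothetical.

releases = {
    "release_a": [  # severity 1 = worst, 4 = trivial
        {"severity": 4}, {"severity": 4}, {"severity": 3},
        {"severity": 4}, {"severity": 3}, {"severity": 4},
    ],
    "release_b": [
        {"severity": 1}, {"severity": 2},
    ],
}

def summarise(defects):
    """Return (total defects found, how many were severity 1 or 2)."""
    serious = sum(1 for d in defects if d["severity"] <= 2)
    return len(defects), serious

for name, defects in releases.items():
    total, serious = summarise(defects)
    print(f"{name}: {total} defects found, {serious} serious")

# release_a "wins" on volume (6 vs 2) but found nothing serious;
# release_b found fewer bugs, yet both of them mattered.
```

If you only measured "defects found during testing", release_a's team looks more successful - which is exactly the judgement-call problem the issues above describe.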
My view: how should the success of testing be measured?
To me, we should consider the points above, but focus more on: Did the testing activities help the delivery of a successful product?
If we delivered a successful product, then surely the testing we performed was successful?
But to make that conclusion you have to understand what factors make the product successful.
And those factors may not give you the immediate or granular level of feedback you need.
e.g. if success was that the product was delivered on time and under budget - can you tell how much your testing contributed to that time and budget saving?
So when I answer the question of 'Did my testing activities help the delivery of a successful product?', I consider:
  • Was my testing 'just enough'?
    • Did I cover enough scenarios?
    • Did I deliver it fast enough?
  • Did this testing add value?
    • Have I done enough testing for the success of this change?
    • Can I get the same value by removing some testing activities?
  • Did I find defects in the product?
    • What bugs are critical to the project's success?
    • What bugs matter most to its stakeholders?
  • What does success look like for the project's stakeholders?
    • Zero bugs?
    • Fast Delivery/Turnaround?
    • Long term stability?
I haven't explicitly said that the success of testing should be measured by the quality of a product. To me it's the third bullet point - "Did I find defects in the product?" - the measure of the product's quality comes when we consider those defects and the level to which the stakeholders feel they're detrimental to the product's success.
I really like Michael Bolton's 2009 article Three Kinds of Measurement and Two Ways to Use Them. It got me thinking about the different ways people approach measurement and made me think about how high level I need to be when giving people a measure of success.
I guess the main thing I've learnt when talking to people, and digesting my thoughts is that you should be thinking about how you're measuring success. I don't think it's formulaic, maybe it's more heuristic, but it's always worth thinking about.

Monday, 24 August 2015

Position retrospective

In each of our seven business units, we have a 'Test Chapter Lead'. This was a new position created just on a year ago as part of a wider restructure, where business units took 'ownership' of their technology teams.
They are the lead representative of testing activities in their business area, often representing the test strategy and practices I outline as test manager. They also manage the HR needs of two to seven testers, which includes training and recruitment.

They share a common job description, and while they don't directly report to me, they have a 'dotted reporting line' to me representing the test strategy and practice relationship.

The Test Chapter Leads and I have a fortnightly catch up; the aim is for us to be aware of what's going on in each other's worlds. Each person takes a turn to tell the group what's happened, what's happening, or what's about to happen in their area.
As well as info sharing, these catch ups have helped to form some social bonds between us as a group.

Over time, I noticed that the reports at the catch up were starting to vary more and more from person to person.
Here's an example of the updates we might get in a meeting:
  • there are two upcoming releases people should be aware of, they'll have flow on effects to the core business rules.
  • still working on that long haul project. Nothing to update.
  • we've got a new system that we are trialling for automated testing
  • we're looking at working in BDD for the next two sprints, and will be releasing more often
  • we had a great training session last week on heuristics, and we are already seeing it being used by testers
  • a deploy went out last week and there were a lot of defects; the flow-on effects are still being felt, but I'm running comms and will update you at the next meeting
  • we are still interviewing, and have offers out to three candidates.
It started to become pretty clear that although the Test Chapter Leads shared a job title/description and common 'purpose' - the tasks they were doing on a daily basis were not the same, and were varied from week to week.
I wanted to see if they were ok with the disparity and variance. Was it what they had expected in their role? Were they happy? And were they aware of the differences across their group?
Also, I wanted to see if the group still had enough commonality to be supporting each other, and if they were still fulfilling the purpose that the role was created for.

So I thought we should try a 'position retrospective'.

What we did:
All the Test Chapter Leads convened at one location for the afternoon.
On a white board, I drew a line. At one end I put "I hardly do this"; at the other I put "I do this lots".
I asked each of the Test Chapter Leads to:
  • take a pile of post it notes
  • think about all the things you do.
  • write each thing down
  • put it on the line

The goal of this was to get them thinking of all the tasks and responsibilities they are taking on as part of their role in their business unit.
It worked really well, there was some solid thinking happening, and I was pleasantly surprised at how easily people 'vomited' tasks on to the board.
I let this take as much time as they needed, and once the flurry of activity naturally stopped - we moved on. Not putting a time box on it was important. I wanted people to take time and think and reflect, rather than rush and brain dump which happens when someone says "you've got 5 mins".

I had originally thought about getting each person to do their own line, and then merging them all together. I'm glad I didn't. As they read each other's notes, they were spurred on to put up more and more cards.
I don't think that this would have happened if I'd asked each person to do their own line.

Some put up lots of HR meta tasks : recruitment, training, and sorting HR issues. Some put things like testing, releasing, and agile ceremonies.
Each person had at least one thing on the line which no one else had. Some had four or more.
Unsurprisingly, the position on the line varied from person to person.
Hardly any tasks with the same description ended up in the same place on the line. Yes, it was a qualitative line - but there was a not insignificant variance between two people who put up the same task.

The next step was to think about the value these tasks were adding.
But, I didn't want to use the word 'value'. Surely any task that they were doing was adding value... or at least should be.
What I asked instead was:

  • Looking at the tasks you have put up, do you want to do that task more, or less?
    • Show this by adding a
      '^' to show you want to do it more
      'v' to show you want to do it less
      '-' to show you are ok doing it as much as you are.
    • Each card has to have an assessment.
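As a hypothetical sketch (the people and tasks here are invented, not our real data), the cards from the steps above can be modelled as simple tuples, which makes the contradictions we went looking for easy to tally:

```python
# Hypothetical sketch of the retrospective cards: each card records
# who wrote it, the task, and whether they want to do it more ('^'),
# less ('v'), or the same ('-').
from collections import defaultdict

cards = [
    ("lead_a", "recruitment", "^"),
    ("lead_b", "recruitment", "v"),
    ("lead_c", "training",    "^"),
    ("lead_d", "training",    "^"),
    ("lead_a", "releasing",   "-"),
]

def find_contradictions(cards):
    """Tasks where one person wants to do more and another wants less."""
    votes = defaultdict(set)
    for _person, task, assessment in cards:
        votes[task].add(assessment)
    # A task contradicts itself if both '^' and 'v' appear among its votes.
    return sorted(task for task, v in votes.items() if {"^", "v"} <= v)

print(find_contradictions(cards))  # ['recruitment']
```

In practice we did all of this on a whiteboard and paper, of course - but the same grouping (task by task, looking for mixed '^'/'v' assessments) is what drove the discussion in the final step.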

The final step was to (as a group) discuss what we saw, looking at patterns, contradictions, or outliers.
For example
  • if someone was doing a task no one else did - were they happy doing it? Did they think there was a place for the others to do it as well, or was it specific to their business? In other business units, was a different person picking up this task?
  • if someone wanted to increase/decrease the time spent on a task, were there any tools, training, or resourcing we could think of as a group to facilitate that change?
  • if there was a task that was being done a lot by many people, was it part of the position? was it transient? And if not, did it need to be communicated as a new responsibility for the position?
  • if two people contradicted each other, one wanting to do something more, and the other wanting to do the same task less, did anyone think that was a bad thing?
We ended up running out of time while doing this, and took the discussion offline.
We transferred the board into a paper form, and each person has had the chance to digest and muse on some differences I highlighted.

What would I change or do differently?
We needed more time. Taking the final step away from the group didn't produce the networking and discussions I was hoping for. While doing the activity, the conversations we managed to have face to face and off the cuff were much more constructive. You got immediate clarification, ideas, and insight - which meant a much quicker feedback loop on suggesting ideas and offering support to each other.

How did it go?
On the whole I think it worked really well and the feedback afterwards was good.
It was a good chance for everyone to analyse what they were doing, and also to step into the shoes of their peers.
People were able to share what the role of Test Chapter Lead was in their world, and see clearly what the role was for their peers.
When you share a job title with other people, you naturally compare yourself to those people. "Why hasn't Ms X picked up that task yet?" or "do I need to be doing that task that Mr Y is doing as well?"
This exercise set up a good discussion on this exact aspect, and cleared up any differences I (and the group) were seeing.
On the surface, you could have assumed that people weren't doing "the job" of a Test Chapter Lead. But, "the job" was actually a moving target based on the context that the person was operating in. The tasks are not static, and the fact that people were dynamic and adaptable is helping their team as a whole succeed.

What's left to do?
I want to bring the group back together for a discussion on the data, and see if we can help each other achieve the "do more" or "do less". Being a distributed group, it's hard to rapidly identify these things - but hopefully it will set up a network where people share the tasks and desired change outside of these sessions.

The visualisation and sharing became the best takeaway from the exercise.
I plan to redo the exercise next time we get together so that we have two data points to reflect on.
Why? I'm hoping that if any tasks haven't changed, and there is still a desired change - we'll be able to think of ways to get that change happening as a group.

Stay tuned...

Tuesday, 28 April 2015

Communication avenues to pairing opportunities

Recently I reread ‘The Power Of Pairing’ by Mike Talks (TestSheepNZ), as part of a ‘blog club’ I facilitated with the Trade Me testers.
It’s about the 5th time I've read this blog post (and it won’t be the last) as pairing is something I’m a huge fan of. I definitely exercised facilitator’s favoritism when picking this blog for blog club.

That said, traditional pairing is not something we 'actively do' as testers at Trade Me. As test manager, I don't tell testers they should be pairing. I tell them all the benefits and positive outcomes I think pairing would have, but let them decide if they're going to give it a go.
This said, I think that a lot of the interactions we have as Testers can be looked at as being pairing.
Without putting too rigid a definition on pairing, to me you're pairing if you're working alongside someone else in order to create something, exchanging information and learnings. The thing being created could be a project, an artifact, or an approach.
That exchange of information comes down to communication – and that is something we do actively encourage, facilitate, and push people to do by providing avenues for good communication.
I wanted to share my thoughts on some of these communication avenues, how they lead to pairing opportunities, and how I think it makes our testing (and development) better.

When I started at Trade Me, you sat with your functional group; meaning all of our developers sat together, and all of our testers sat together. This was great for within-discipline communication. Practices, processes, and technical understanding were easily shared and picked up when you joined the team.
The seating arrangement encouraged communication with fellow testers. It was really easy to lean over to a tester at the desk next to you and chat to them.

Most of the time you were looking for information, knowledge that tester has which you might not have. Testers would exchange information, affirm or discount assumptions, and bounce ideas around on test coverage and a test approach.
You could look at this as being a type of micro-pairing. You’re developing a test approach together, and drawing two minds onto an activity.

On the flip side, this seating arrangement did hinder the same type of casual communication with developers.
To communicate with the developer you either sent an IM or email asking for time to sit with them, or you walked over to their desk and hoped you were catching them at a good time.
This was before we adopted Agile and Scrum at Trade Me, but even then, ninety-nine percent of the time you would be testing solo on a change from a particular developer. These developers are giant oracles which testers should be tapping into, but often the effort of communication got in the way.

Developers work hard at their jobs and in my experience like to be perfectionists. If you did get a pairing session with a developer, it would often just be a walk-through of code which had been written.
As Mike says in his blog “If you're sitting with someone, and one of you is controlling all the conversation for the whole session, then you are not pairing.”
Walk-throughs can be great at giving a tester insight into what vulnerabilities might exist, or what type of data to focus on. They can still add value.
But, the developer will (hopefully) be confident in what they've built and won’t be naturally looking for gaps in their own knowledge.

We adopted Agile about two years ago, and with this we collocated our scrum teams (we call them ‘squads’). As you’d expect we saw a massive upswing in casual communication between testers and developers.
Because this communication avenue is now open throughout the development life cycle, there are consistent opportunities for Testers to get involved early.
I've seen this communication spawn pairing sessions between testers and developers. While the tester may not be able to pair-program, a session where both parties talk about what data exists, how it gets handled, and the way it could be tested counts as pairing in my book.
Developers have good feedback about the arrangement. They like having more information and perspectives on solutions. After all, their work going through test efficiently makes them look good too!

Initially testers were worried about being siloed and ‘far from’ other testers. The fear was that their easy communication avenue was being closed, and that it would be harder to learn from each other and get peer involvement.
This has happened to some degree, but not to the extremes we were worried about. We've been quite lucky with the seating arrangement. You’re never more than two or three desks away from another tester.

I consciously make sure I’m accepting of interruptions, and I actively encourage our testers to be the same.
No one knows everything at Trade Me and making yourself and others available for communication opportunities is really important if you want to make sure that people have the information and support they need to do a good job.

One of the biggest challenges we've faced is that new testers didn't know we were accepting of interruptions.
When you’re starting in a new role, it’s really hard to know who you can approach and who you can ask for help.
We've made sure that when we bring a new tester on board, we have a buddy available for them to ask questions. For the first few days that buddy should block out time to be free for interruptions, making them completely interruptible.
Encouraging new starters to interrupt you also means you get a great view on how quickly they’re picking things up, and where their knowledge gaps are.

These interruptions are great lead-ins for the micro-pairing I talked about above.
Just like Mike says in his blog, pairing has a great benefit to junior testers: "Putting them in the driving seat gives them the opportunity to take the initiative, and to explain their approach.  It allows you the opportunity to suggest additional tests they might want to consider, or give feedback about what tests might be more important to run first."

The Agile meetings and processes we follow are shining communication avenues filled with pairing and pairing opportunities.

Backlog grooming, planning poker and daily stand ups all give people the opportunity to share their perspectives, and give testers and developers the chance to work together to deliver a solution.

In a grooming session you’re pulling upcoming work to pieces, and digging into what’s involved in delivering it.
To me, this is pairing. People are in the right mind-set to learn, and are looking to exchange information. You’re not strictly pair-programming or pair-testing – but you’re working together to shape the coding and testing that will take place.

If you’re at a scrum stand-up, each person is communicating what they've done and plan to do. This communication leads you to great pairing opportunities!
“Hey, that sounds really interesting – I know some stuff about that from last time I tested it, can we pair together when you code the changes?” or “It sounds like you know about this area, can you pair with me when I’m planning the test approach?”

Feed-in, not Feedback.
No one person knows how every part of the Trade Me site works. There are definitely SMEs in the business, but there isn't one person or document which can give you all the information you’ll ever need.
Communication is one way to overcome this knowledge gap, and as a tester the more people you communicate with, the more perspectives you’ll have, and the higher the chance you’ll achieve a quality product.
From Mike's blog: “…having an extra perspective allowed the person to question or state things, to bring them to the table for discussion, and thus expand on one person's perspective.”
Getting these perspectives early means they feed-in to your testing activities.

In the last three years we've scaled heavily, increasing the number of testers at Trade Me by 400%. On top of this, some of those testers are geographically isolated.
Knowing the knowledge gaps, and who to communicate or pair with became harder.
We needed a quicker and more accessible way to feed-in to each other's work.

Enter test run reviews. Essentially, every test run a tester creates undergoes peer review.
(Worth noting: We write test runs, which for us are a collection of test conditions, a test approach, and a risk assessment)

This sounds like a normal bureaucratic test review process, but for us it’s far from a mindless box ticking exercise.
These test run reviews are a two way discussion which happens early in the life of the testing activities. The Tester can request a review as soon as they have an approach, mind map, or set of conditions.
The reviewer will question the approach, critique the test condition coverage, and look at the risk assessment. They’ll ask why things have been included or excluded, and make suggestions using their own perspectives.
The review itself is usually an email exchange, but I encourage reviews over the phone – or even better over the shoulder.

This communication and feed-in practice is a type of pairing. It lets experienced testers bring their perspectives to the table, promotes mutual learning, and means you’re working together to produce the testing activities that will take place.

We require that at least one review takes place by another tester, but there is no limit on the number of reviews or who reviews them. We encourage testers to seek out SMEs and get their input on approach and coverage. It can be as simple as asking “we’re changing this area, can you think of anything I might not have tested?”

One thing we’re looking at changing is how we treat new testers with regards to test run reviews and pairing.
At the moment, we don’t let testers do peer reviews until they've been at the company for six months. This isn't that we don’t think they’re good testers, it’s that we don’t want to add to the ‘system shock’ that comes from starting a new job in a new environment.
We’re missing the best type of perspectives – new perspectives.
I’m looking at doing is a “paired pairing”.  If a new starter wants to do a review in their first six months – they can, just make sure you pair with someone else while they do it.

How does it make our testing better?
So, at the top of this blog I said that I'd talk about how these communication avenues and pairing opportunities make our testing (and development) better.
The best example for me is that it removes so much fear of failure, which removes a blame culture.
Knowing that you've used those communication avenues and pairing opportunities, means you have confidence in your test coverage, development approach, and final product as you go into a deploy.
If something gets missed in development or testing, it’s not one person’s fault. You look for the missed communication avenues and pairing opportunities which could have helped, or, you create new ones.

The more avenues you open up, the more that will lead to pairing, and the better informed your perspectives become.

Wednesday, 28 January 2015

Learn about me, and this blog

About me.

I live in Wellington, New Zealand.
This hill-bound harbour city is our nation's capital, and home to many awesome cafes and bars. It's also home to many, many IT professionals building awesome things for government departments, banks, private companies, and the public.
I am one of those IT professionals, specifically - a software tester.
I've been in the software testing profession since 2006, and have been working at Trade Me since 2009. I'm the Test Manager, working with our delivery teams who are continuously making improvements to New Zealand's largest online marketplace and classified advertising platform.

Why I've decided to blog.

There are some great people already blogging, and already contributing to the wider test community. I have no intention of being the best blogger blogging about testing. I have no desire to be an 'evangelist tester' or 'celebrity tester' telling people the right way to do things, or the cool new trend they all should follow.
I decided to blog, because I've learnt A LOT as a tester, and I enjoy sharing what I've learnt - and what I see other people learning.
Actually, the one constant I've encountered in testing (as in life) is that we're always learning. Our days are full of learning experiences.
Our aim as testers is to gather information about systems (learning), and convey that information back to others (facilitating learning). e.g. We learn how something should work, how it actually works, how it doesn't work, and then we report that information to stakeholders.
On top of that, in our professional and personal lives we're always learning how to do new things, better ways to do old things, and the ways in which we'd rather not do things.

I facilitate and participate in learning opportunities constantly in my role; running training sessions as a test manager, mentoring other testers as an experienced test analyst, giving functionality demos and insights as an internal SME, and even just by encouraging people to look for ways to improve things.
I'm constantly learning from testers I work with, testers in the community, developers, business analysts, database engineers, and SMEs.

So the reason for, and the aim of my blog is to share learning experiences; both my own, and those of the people I encounter working as a Tester and Test Manager.

Let's see what we learn.

- Sean

p.s. Chippie Tester comes from a nickname I had as a child: 'Chip'. As in 'chip off the old block' - a hat tip to my father.
What significance does it have to my blog?... well, both my father and my mother spent their working careers as teachers. They love learning experiences and spent a combined 90 years teaching children through learning experiences.

p.p.s I am a fan of hot chips. I suppose if this doesn't work out, I could make a blog about testing the fried starch amazing-ness that Wellington restaurants have to offer...