Category Archives: Testing and checking

All testing is exploratory: change my mind

I’ve recently returned to Australia after several weeks in Europe, mainly for pleasure with a small amount of work along the way. Catching up on some of the testing-related chatter on my return, I spotted that Rex Black repeated his “Myths of Exploratory Testing” webinar in September. I respect the fact that he shares his free webinar content every month and, even though I often find myself disagreeing with his opinions, hearing what others think about software testing helps me to both question and cement my own thoughts and refine my arguments about what I believe good testing looks like.

Rex started off with his definition of exploratory testing (ET), viz.

A technique that uses knowledge, experience and skills to test software in a non-linear and investigatory fashion

He claimed that this is a “pretty widely shared definition of ET” but I don’t agree. The ISTQB Glossary uses the following definition:

An approach to testing whereby the testers dynamically design and execute tests based on their knowledge, exploration of the test item and the results of previous tests.

The definition I hear most often is something like the following James Bach/Michael Bolton effort (which they used until 2015):

An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project

They have since deprecated the term “exploratory testing” in favour of simply “testing” (from 2015), defining testing as:

Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.

Rex went on to say that the test basis and test oracles in ET “are primarily skills, knowledge and experience” and any such testing is referred to as “experience-based testing” (per the ISTQB definition, viz. “Testing based on the tester’s experience, knowledge and intuition.”). Experience-based testing that is investigatory is then deemed to be exploratory. I have several issues with this. There is an implication here that ET involves testing without using a range of oracles that might include specifications, user stories, or other more “formal” sources of what the software is meant to do. Rex reinforces this when he goes on to say that ET is a form of validation and “may tell us little or nothing about conformance to specification because the specification may not even be consulted by the tester”. Also, I can’t imagine any valuable testing that doesn’t rely on the tester’s skills, knowledge and experience so it seems to me that all testing would fall under this “experience-based testing” banner.

The first myth Rex discussed was the “origin myth”, that ET was invented in the 1990s in Silicon Valley, or at least that this was when a “name got hung on it” (e.g. by Cem Kaner). He argued instead that it was invented by whoever wrote the first program, that IBM were doing it in the 1960s, that the independent test teams in Fred Brooks’s 1975 book The Mythical Man-Month were using ET, and that “error guessing”, as introduced by Glenford Myers in the classic book The Art of Software Testing, sounds “a whole lot like a form of ET”. The History of Definitions of ET on James Bach’s blog is a good reference in this regard, in my opinion. While I agree that programmers have been performing some kind of investigatory or unscripted testing in their development and debugging activities for as long as programming has been a thing, it’s important that we define our testing activities in a way that makes the way we talk about what we do both accurate and credible. I see the argument for suggesting that error guessing is a form of ET, but it’s just one tactic that might be employed by a tester skilled in the much broader approach that is ET.

The next myth Rex discussed was the “completeness myth”, that “playing around” with the software is sufficient to test it. He mentioned that there is little education around testing in Software Engineering degrees, so people don’t understand what testing can and cannot do, which leads to myths like this. I agree that there is a general lack of understanding in our industry of how important structured ET is as part of a testing strategy, though I haven’t personally heard this myth being espoused anywhere recently.

Next up was the “sufficiency myth”, that some teams bring in a “mighty Jedi warrior of ET & this person has helped [them] to find every bug that can matter”. He mentioned a study from Microsoft in which testing groups for the same application were split, one using ET (and other reactive strategies) only, while the other used pre-designed tests (including automated tests) only. The sets of bugs found by the two groups were partially but not fully overlapping, hence proving that ET alone is not sufficient. I’m confident that even if the groups had been split and had done the same kind of testing (be it ET or pre-designed), their sets of bugs would also have been partially but not fully overlapping (there is some evidence to support this, albeit from a one-off small case study, from Aaron Hodder & James Bach in their article Test Cases Are Not Testing)! I’m not sure where this myth comes from; I’ve not heard it from anyone in the testing industry and haven’t seen a testing strategy that relies solely on ET. I do find that using ET as an approach can really help in focusing on finding bugs that matter, though, and that seems like a good thing to me.

Rex continued with the “irrelevance myth”, that we don’t have to worry about ET (or, indeed, any validation testing at all) because of the use of ATDD, BDD, or TDD. He argued that all of these approaches are verification rather than validation, so some validation is still relevant (and necessary). I’ve seen this particular myth and, if anything, it seems to be more prevalent over time especially in the CI/CD/DevOps world where automated checks (of various kinds) are viewed as sufficient gates to production deployment. Again, I see this as a lack of understanding of what value ET can add and that’s on us as a testing community to help people understand that value (and explain where ET fits into these newer, faster deployment approaches).

The final myth that Rex brought up was the “ET is not manageable” myth. In dispelling this myth, he mentioned the Rapid Reporter tool, timeboxed sessions, and scoping using charters (where a “charter is a set of one or more test conditions”). This was all quite reasonable, basically describing session-based test management (SBTM) without using that term. One of his recommendations seemed odd, though: “record planned session time versus actual [session] time” – sessions are strictly timeboxed in an SBTM situation, so planned and actual time are always the same. While, in my experience, sticking to the timebox is one of the more difficult aspects of SBTM for testers (at least initially), it is critical if ET is to be truly manageable.
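To make the SBTM framing a little more concrete, here’s a minimal sketch of what a session record might capture. This is my own illustration (not something from Rex’s webinar or any particular tool), and the field names are hypothetical; real session sheets vary by team.

from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class Session:
    # A hypothetical, minimal SBTM session record.
    charter: str                # the mission guiding this session
    timebox: timedelta          # fixed session length, agreed up front
    tester: str
    notes: list = field(default_factory=list)   # observations, questions, test ideas
    bugs: list = field(default_factory=list)    # problems worth reporting
    issues: list = field(default_factory=list)  # obstacles that blocked or slowed testing

session = Session(
    charter="Explore the invoice export with malformed customer data to discover handling errors",
    timebox=timedelta(minutes=90),
    tester="Lee",
)

# In strict SBTM the timebox is fixed, so there is no "planned vs actual" gap to record;
# if the charter isn't finished when time runs out, a follow-up session is chartered instead.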

Moving on from the myths, Rex talked about “reactive strategies” in general, suggesting they are suitable in agile environments but that we also need risk-based strategies and automation in addition to ET. He said that the reliance on skills and experience when using ET (in terms of the test basis and test oracle) means that heuristics are a good way of triggering test ideas, and he made the excellent point that all of our “traditional” test techniques still apply when using ET.

Rex’s conclusion was also sound: “I consider (the best practice of) ET to be essential but not sufficient by itself”. I have no issue with that (well, apart from his use of the term “best practice”) – and again I don’t see any credible voices in the testing community arguing otherwise.

The last twenty minutes of the webinar was devoted to Q&A from both the online and live audience (the webinar was delivered in person at the STPCon conference). An interesting question from the live audience was “Has ET finally become embedded in the software testing lifecycle?” Rex responded that the “religious warfare… in the late 2000s/early 2010s has abated, some of the more obstreperous voices of that era have kinda taken their show off the road for various reasons and aren’t off stirring the pot as much”. This was presumably in reference to the somewhat heated debate going on in the context-driven testing community in that timeframe, some of which was unhelpful but much of which helped to shape much clearer thinking around ET, SBTM and CDT in general in my opinion. I wouldn’t describe it as “religious warfare”, though.

Rex also mentioned in response to this question that he actually now sees the opposite problem in the DevOps world, with “people running around saying automate everything” and the belief that automated tests by themselves are sufficient to decide when software is worthy of deployment to production. In another reference to Bolton/Bach, he argued that the “checking” and “testing” distinction was counterproductive in pointing out the fallacy of “automate everything”. I found this a little ironic since Rex constantly seeks to make the distinction between validation and verification, which is very close to the distinction that testing and checking seeks to draw (albeit in much more lay terms as far as I’m concerned). I’ve actually found the “checking” and “testing” terminology extremely helpful in making exactly the point that there is “testing” (as commonly understood by those outside of our profession) that cannot be automated; it’s a great conversation starter in this area for me.

One of Rex’s closing comments was again directed to the “schism” of the past with the CDT community, “I’m relieved that we aren’t still stuck in these incredibly tedious religious wars we had for that ten year period of time”.

There was a lot of good content in Rex’s webinar and nothing too controversial. His way of talking about ET (even the definition he chooses to use) is different to what I’m more familiar with from the CDT community, but it’s good to hear him referring to ET as an essential part of a testing strategy. I’ve certainly seen an increased willingness to use ET as the mainstay of so-called “manual” testing efforts, and putting structure around it using SBTM adds a lot of credibility. For the most part in my teams across Quest, we now consider test efforts to be ET only if they are performed within the framework of SBTM, so that we have the accountability and structure in place for the various stakeholders to treat this approach as credible and worthy of their investment.

So, finally getting to the reason for the title of this post: whether by Rex’s (I would argue unusual) definition, by the ISTQB’s definition, or by what I would argue is the more widely accepted definition (Bach/Bolton above), it seems to me that all testing is exploratory. I’m open to your arguments to change my mind!

(For reference, Rex publishes all his webinars on the RBCS website at http://rbcs-us.com/resources/webinars/. The one I refer to in this blog post has not appeared there as yet, but the audio is available via https://rbcs-us.com/resources/podcast/.)

How to win the war? Follow the script

In mid-2002, the US armed forces ran one of the largest and most expensive war game experiments in history, known as the “Millennium Challenge 2002”. It was designed to test new technologies for network-centric warfare, intended to give better command and control over both current and future weaponry and tactics.

The scenario was that a crazed but cunning (and strongly anti-American) military commander had broken away from his government somewhere in the Persian Gulf. Religious and ethnic loyalty gave him power and strong links to terrorist organizations made him even more dangerous. War was imminent.

The US side, known as the “Blue” team (as they always are in such military exercises apparently), were pitted against the “Red” team – with the rogue commander being played by retired Marine Corps Lieutenant General, Paul Van Riper.

It’s worth a quick note on the character of Van Riper at this point. His forty-year military career included Vietnam and, reading about him (especially the words of those he led), it is clear that he was a straight-talking leader who inspired his teams to work for him even in the most dangerous and difficult of circumstances. By the time of this war game, he was retired and in his mid-60s – with no real need to be circumspect.

What actually happened during the running of the war game is described well in [1]:

In the first few days of the exercise, using surprise and unorthodox tactics, the wily 64-year-old Vietnam veteran sank most of the US expeditionary fleet in the Persian Gulf, bringing the US assault to a halt.

What happened next will be familiar to anyone who ever played soldiers in the playground. Faced with an abrupt and embarrassing end to the most expensive and sophisticated military exercise in US history, the Pentagon top brass simply pretended the whole thing had not happened. They ordered their dead troops back to life and “refloated” the sunken fleet. Then they instructed the enemy forces to look the other way as their marines performed amphibious landings. Eventually, Van Riper got so fed up with all this cheating that he refused to play any more. Instead, he sat on the sidelines making abrasive remarks until the three-week war game – grandiosely entitled Millennium Challenge – staggered to a star-spangled conclusion on August 15, with a US “victory”.

Van Riper very publicly aired his opinions on how ridiculously the game had been played and strongly criticized the idea that the ultimate “Blue” victory validated anything about the technology and approach the game was designed to test. In [2], he says:

There were accusations that Millennium Challenge was rigged. I can tell you it was not. It started out as a free-play exercise, in which both Red and Blue had the opportunity to win the game. However, about the third or fourth day, when the concepts that the command was testing failed to live up to their expectations, the command then began to script the exercise in order to prove these concepts.

This was my critical complaint. You might say, “Well, why didn’t these concepts live up to the expectations?” I think they were fundamentally flawed in that they leaned heavily on technology. They leaned heavily on systems analysis of decision-making.

It would seem that the skills and experience of Paul Van Riper and his ability to react quickly to what he observed gave him a significant advantage over the scripted, process-driven approach of his enemy. Yet, rather than making any effort to incorporate his alternative strategies, it was deemed better to constrain his actions to allow the script to play out the way it was “meant to”.

The analogy with scripted vs exploratory testing is a strong one, I think, so perhaps next time you’re locked in battle with a factory-schooled commander of scripted testing, take up the fight and demonstrate your superior powers of testing. Even if your testing war game ends up the same way as the Millennium Challenge, at least you might have won the battle – and won some supporters for your exploratory testing cause along the way.

For reference…

[1] “Wake-up Call” (The Guardian, UK): http://www.theguardian.com/world/2002/sep/06/usa.iraq

[2] “The Immutable Danger of War” (Scott Willis interview with Van Riper) http://www.pbs.org/wgbh/nova/military/immutable-nature-war.html

(You can read more about the Millennium Challenge (2002) on Wikipedia.)

Inspiration for this post came from reading about this war game in the fascinating book Blink by Malcolm Gladwell (and the same book provided inspiration for my previous post, Maier’s Two Cord Puzzle and Testing Heuristics).

Getting our message across about what “testing” really is

A recent Tweet about the BugBuster product again made me realise what a long journey we have as a community to educate the wider populace about what “testing” actually is (and is not).

The BugBuster website, for example, says this on its “Features” page:

Who said testing meant writing and endlessly maintaining test cases? BugBuster runs smart software agents that explore and test your website automatically. That’s right, no need to write test cases! The agents … test the various elements of the web app as if it was done by a human being.

The emphasis on the tool doing the same thing as humans reflects such a common perception of what testing can be reduced to; the “checking”* mentality is everywhere. I have no issue with using tools to help with testing, with automation to perform mundane checking, or to help speed up development (not testing). But I do take issue with the idea that testing can be dehumanized.
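To illustrate what I mean by “checking” here, the sketch below shows a check in its simplest form: an algorithmic comparison of an output against a predefined expected result. The function and values are entirely hypothetical (nothing to do with BugBuster); the point is only that this kind of assertion is what tools can run for us, while the judgement around it cannot be automated.

# A check in its simplest form: a machine-decidable assertion against an expected result.
# Deciding whether this rounding behaviour is actually right for users - and what else
# might be worth exploring - is testing, and that judgement stays with a human.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

def test_ten_percent_discount():
    assert apply_discount(100.00, 10) == 90.00

test_ten_percent_discount()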

They raise a good question here: “who said testing meant writing and endlessly maintaining test cases?” I spent too long thinking this was my job too, and it’s almost unbelievable to look back at that time and think that I was adding any value to anything. It took me too long to reach the realization that testing really isn’t this, but is in fact intellectually challenging and can add incredible value to the process of delivering great software for our users; at least I got there in the end (thanks to Michael Bolton and the life-changing experience that was his Rapid Software Testing course back in 2007).

How do we help others in this industry come to the same realization when they are bombarded with messages that dehumanize what “testing” really is? The context-driven testing community is full of great thinkers and their ideas about how to do great testing, but how do we in that community get our message across to the masses? While we do already have organizations like AST and ISST flying the CDT flag, what else can we do to broaden the wider community’s knowledge of what “testing” really is?

* Want to know more about the “Testing vs. checking” distinction? Start here with this Michael Bolton blog post.