Category Archives: Testing and checking

ER: “Ask Me Anything” session on Exploratory Testing

I took part in my first “Ask Me Anything” session on 22nd March, answering questions on the topic of “Exploratory Testing” as part of the AMA series organized by The Test Tribe.

Presenting an AMA was a different experience in terms of preparation compared to a more traditional slide-driven talk. I didn’t need to prepare very much, although I made sure to refamiliarize myself with the ET definitions I make use of and some of the most helpful resources so they’d all be front of mind if and when I needed them to answer questions arising during the AMA.

The live event was run using Airmeet.com and I successfully connected about ten minutes before the start of the AMA. The system was easy to use and it was good to spend a few minutes chatting with my host, Sandeep Garg, to go over the nuts and bolts of how the session would be facilitated.

We kicked off a few minutes after the scheduled start time and Sandeep opened with a couple of questions while the attendees started to submit their questions into Airmeet.

The audience provided lots of great questions and we managed to get through them all, in just over an hour. I appreciated the wide-ranging questions which demonstrated a spectrum of existing understanding about exploratory testing. There is so much poor quality content on this topic that it’s unsurprising many testers are confused. I hope my small contribution via this AMA helped to dispel some myths around exploratory testing and inspired some testers to take it more seriously and start to see the benefits of more exploratory approaches in their day-to-day testing work.

Thanks to The Test Tribe for organizing and promoting this AMA, giving me my first opportunity of presenting in this format. Thanks also to the participants for their many questions, I hope I provided useful responses based on my experience of adopting an exploratory approach to testing over the last 15 years or so!

The full “Ask Me Anything” session can be viewed on The Test Tribe’s YouTube channel:

The power of the pause

2 Replies

While writing my last blog post, a review of Cal Newport’s “Deep Work” book, I reminded myself of a topic I’ve been meaning to blog about for a while, viz. the power of the pause.

Coming at this from a software development perspective, I mentioned in the last blog post that:

“There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed.”

I often see this goal of deploying every change directly (and automatically) to production without the goal being accompanied by compelling reasons for doing so – apart from maybe “it’s what <insert big name tech company here> does”, even though you’re likely nothing like those companies in most other important ways. What’s the rush? While there are some cases where a very quick deployment to production is of course important, the idea that every change needs to be deployed in the same way is questionable for most organizations I’ve worked with.

Automated deployment pipelines can be great mechanisms for de-risking the process of getting updated software into production, removing opportunities for human error and making such deployments less of a drama when they’re required. But, just because you have this mechanism at your disposal, it doesn’t mean you need to use it for each and every change made to the software.

I’ve seen a lot of power in pausing along the deployment pipeline to give humans the opportunity to interact with the software before customers are exposed to the changes. I don’t believe we can automate our way out of the need for human interaction for software designed for use by humans, but I’m also coming to appreciate that this is increasingly seen as a contrarian position (and one I’m happy to hold). I’d ask you to consider whether there is a genuine need for automated deployment of every change to production in your organization and whether you’re removing the opportunity to find important problems by removing humans from the process.

Taking a completely different perspective, I’ve been practicing mindfulness meditation for a while now and haven’t missed a daily practice since finishing up full-time employment back in August 2020. One of the most valuable things I’ve learned from this practice is the idea of putting space between stimulus and response – being deliberate in taking pause.

Exploring the work of Gerry Hussey has been very helpful in this regard and he says:

The things and situations that we encounter in our outer world are the stimulus, and the way in which we interpret and respond mentally and emotionally to that stimulus is our response.
Consciousness enables us to create a gap between stimulus and response, and when we expand that gap, we are no longer operating as conditioned reflexes. By creating a gap between stimulus and response, we create an opportunity to choose our response. It is in this gap between stimulus and response that our ability to grow and develop exists. The more we expand this gap, the less we are conditioned by reflexes and the more we grow our ability to be defined not by happens to us but how we choose to respond.
Awaken Your Power Within: Let Go of Fear. Discover Your Infinite Potential. Become Your True Self (Gerry Hussey)

I’ve found this idea really helpful in both my professional and personal lives. It’s helped with listening, to focus on understanding rather than an eagerness to simply respond. The power of the pause in this sense has been especially helpful in my consulting work as it has a great side effect of lowering the chances of jumping into solution mode before fully understanding the problem at hand. Accepting the fact that things will happen outside my control in my day to day life but that I have the choice in how to respond to whatever happens has been transformational.

Inevitably, there are still times where my response to stimuli is quick, conditioned and primitive (with system 1 thinking doing its job) – and sometimes not kind. But I now at least recognize when this has happened and bring myself back to what I’ve learned from regular practice so as to continue improving.

So, whether it’s thinking specifically about software delivery pipelines or my interactions with the world around me, I’m seeing great power in the pause – and maybe you can too.

Is talking about “scaling” human testing missing the point?

3 Replies

I recently came across an article from Adam Piskorek about the way Google tests its software.

While I was already familiar with the book How Google Tests Software (by James Whittaker, Jason Arbon et al, 2012), Adam’s article introduced another newer book about how Google approaches software engineering more generally, Software Engineering at Google: Lessons Learned from Programming Over Time (by Titus Winters, Tom Manshreck & Hyrum Wright, 2020).

The following quote in Adam’s article is lifted from this newer book and made me want to dive deeper into the book’s broader content around testing*:

Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clean answer: automation.
Chapter 11 (Testing Overview), p210 (Adam Bender)

I was stunned by this quote from the book. It felt like they were saying that development simply goes too quickly for adequate testing to be performed and also that automation is seen as the silver bullet to moving as fast as they desire while maintaining quality, without those pesky slow humans interacting with the software they’re pushing out.

But, in the interests of fairness, I decided to study the four main chapters of the book devoted to testing to more fully understand how they arrived at the conclusion in this quote – Chapter 11 which offers an overview of the testing approach at Google, chapter 12 devoted to unit testing, chapter 13 on test doubles and chapter 14 on “Larger Testing”. The book is, perhaps unsurprisingly, available to read freely on Google Books.

I didn’t find anything too controversial in chapter 12, rather mostly sensible advice around unit testing. The following quote from this chapter is worth noting, though, as it highlights that “testing” generally means automated checks in their world view:

After preventing bugs, the most important purpose of a test is to improve engineers’ productivity. Compared to broader-scoped tests, unit tests have many properties that make them an excellent way to optimize productivity.

Chapter 13 on test doubles was similarly straightforward, covering the challenges of mocking and giving decent advice around when to opt for faking, stubbing and interaction testing as approaches in this area. Chapter 14 dealt with the challenges of authoring tests of greater scope and I again wasn’t too surprised by what I read there.

It is chapter 11 of this book, Testing Overview (written by Adam Bender), that contains the most interesting content in my opinion and the remainder of this blog post looks in detail at this chapter.

The author says:

since the early 2000s, the software industry’s approach to testing has evolved dramatically to cope with the size and complexity of modern software systems. Central to that evolution has been the practice of developer-driven, automated testing.

I agree that the general industry approach to testing has changed a great deal in the last twenty years. These changes have been driven in part by changes in technology and the ways in which software is delivered to users. They’ve also been driven to some extent by the desire to cut cost and it seems to me that focusing more on automation has been seen (misguidedly) as a way to reduce the overall cost of delivering software solutions. This focus has led to a reduction in the investment in humans to assess what we’re building and I think we all too often experience the results of that reduced level of investment.

Automated testing can prevent bugs from escaping into the wild and affecting your users. The later in the development cycle a bug is caught, the more expensive it is; exponentially so in many cases.

Given the perception of Google as a leader in IT, I was very surprised to see this nonsense about the cost of defects being regurgitated here. This idea is “almost entirely anecdotal” according to Laurent Bossavit in his excellent The Leprechauns of Software Engineering book and he has an entire chapter devoted to this particular mythology. I would imagine that fixing bugs in production for Google is actually inexpensive given the ease with which they can go from code change to delivery into the customer’s hands.

Much ink has been spilled about the subject of testing software, and for good reason: for such an important practice, doing it well still seems to be a mysterious craft to many.

I find the choice of words here particularly interesting, describing testing as “a mysterious craft”. While I think of software testing as a craft, I don’t think it’s mysterious although my experience suggests that it’s very difficult to perform well. I’m not sure whether the wording is a subtle dig at parts of the testing industry in which testing is discussed in terms of it being a craft (e.g. the context-driven testing community) or whether they are genuinely trying to clear up some of the perceived mystery by explaining in some detail how Google approaches testing in this book.

The ability for humans to manually validate every behavior in a system has been unable to keep pace with the explosion of features and platforms in most software. Imagine what it would take to manually test all of the functionality of Google Search, like finding flights, movie times, relevant images, and of course web search results… Even if you can determine how to solve that problem, you then need to multiply that workload by every language, country, and device Google Search must support, and don’t forget to check for things like accessibility and security. Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation
(note: bold emphasis is mine)

We then come to the source of the quote that first piqued my interest. I find it interesting that they seem to be suggesting the need to “test everything” and using that as a justification for saying that using humans to interact with “everything” isn’t scalable. I’d have liked to see some acknowledgement here that the intent is not to attempt to test everything, but rather to make skilled, risk-based judgements about what’s important to test in a particular context for a particular mission (i.e. what are we trying to find out about the system?). The subset of the entire problem space that’s important to us is something we can potentially still ask humans to interact with in valuable ways. The “one clear answer” for testing being “automation” makes little sense to me, given the well-documented shortcomings of automated checks (some of which are acknowledged in this same book) and the different information we should be looking to gather from human interactions with the software compared to that from algorithmic automated checks.

Unlike the QA processes of yore, in which rooms of dedicated software testers pored over new versions of a system, exercising every possible behavior, the engineers who build systems today play an active and integral role in writing and running automated tests for their own code. Even in companies where QA is a prominent organization, developer-written tests are commonplace. At the speed and scale that today’s systems are being developed, the only way to keep up is by sharing the development of tests around the entire engineering staff.
Of course, writing tests is different from writing good tests. It can be quite difficult to train tens of thousands of engineers to write good tests. We will discuss what we have learned about writing good tests in the chapters that follow.

I think it’s great that developers are more involved in testing than they were in the days of yore. Well-written automated checks provide some safety around changing product code and help to prevent a skilled tester from wasting their time on known “broken” builds. But, again, the only discussion that follows in this particular book (as promised in the last sentence above) is about automation and not skilled human testing.

Fast, high-quality releases
With a healthy automated test suite, teams can release new versions of their application with confidence. Many projects at Google release a new version to production every day—even large projects with hundreds of engineers and thousands of code changes submitted every day. This would not be possible without automated testing.

The ability to get code changes to production safely and quickly is appealing and having good automated checks in place can certainly help to increase the safety of doing so. “Confidence” is an interesting choice of word to use around this (and is used frequently in this book), though – the Oxford dictionary definition of “confidence” is “a feeling or belief that one can have faith in or rely on someone or something”, so the “healthy automated test suite” referred to here appears to be one that these engineers feel comfortable to rely on enough to say whether new code should go to production or not.

The other interesting point here is about the need to release new versions so frequently. While it makes sense to have deployment pipelines and systems in place that enable releasing to production to be smooth and uneventful, the desire to push out changes to customers very frequently seems like an end in itself these days. For most testers in most organizations, there is probably no need or desire for such frequent production changes so deciding testing strategy on the perceived need for these frequent changes could lead to goal displacement – and potentially take an important aspect of assessing those changes (viz. human testers) out of the picture altogether.

If test flakiness continues to grows you will experience something much worse than lost productivity: a loss of confidence in the tests. It doesn’t take needing to investigate many flakes before a team loses trust in the test suite, After that happens, engineers will stop reacting to test failures, eliminating any value the test suite provided. Our experience suggests that as you approach 1% flakiness, the tests begin to lose value. At Google, our flaky rate hovers around 0.15%, which implies thousands of flakes every day. We fight hard to keep flakes in check, including actively investing engineering hours to fix them.

It’s good to see this acknowledgement of the issues around automated check stability and the propensity for unstable checks to lead to a collapse in trust in the entire suite. I’m interested to know how they go about categorizing failing checks as “flaky” to be included in their overall 0.15% “flaky rate”, no doubt there’s some additional human effort involved there too.

Just as we encourage tests of smaller size, at Google, we also encourage engineers to write tests of narrower scope. As a very rough guideline, we tend to aim to have a mix of around 80% of our tests being narrow-scoped unit tests that validate the majority of our business logic; 15% medium-scoped integration tests that validate the interactions between two or more components; and 5% end-to-end tests that validate the entire system. Figure 11-3 depicts how we can visualize this as a pyramid.

It was inevitable during coverage of automation that some kind of “test pyramid” would make an appearance! In this case, they use the classic Mike Cohn automated test pyramid but I was shocked to see them labelling the three different layers with percentages based on test case count. By their own reasoning, the tests in the different layers are of different scope (that’s why they’re in different layers, right?!) so counting them against each other really makes no sense at all.

Our recommended mix of tests is determined by our two primary goals: engineering productivity and product confidence. Favoring unit tests gives us high confidence quickly, and early in the development process. Larger tests act as sanity checks as the product develops; they should not be viewed as a primary method for catching bugs.

The concept of “confidence” being afforded by particular kinds of checks arises again and it’s also clear that automated checks are viewed as enablers of productivity.

Trying to answer the question “do we have enough tests?” with a single number ignores a lot of context and is unlikely to be useful. Code coverage can provide some insight into untested code, but it is not a substitute for thinking critically about how well your system is tested.

It’s good to see context being mentioned and also the shortcomings of focusing on coverage numbers alone. What I didn’t really find anywhere in what I read in this book was the critical thinking that would lead to an understanding that humans interacting with what’s been built is also a necessary part of assessing whether we’ve got what we wanted. The closest they get to talking about humans experiencing the software in earnest comes from their thoughts around “exploratory testing”:

Exploratory Testing is a fundamentally creative endeavor in which someone treats the application under test as a puzzle to be broken, maybe by executing an unexpected set of steps or by inserting unexpected data. When conducting an exploratory test, the specific problems to be found are unknown at the start. They are gradually uncovered by probing commonly overlooked code paths or unusual responses from the application. As with the detection of security vulnerabilities, as soon as an exploratory test discovers an issue, an automated test should be added to prevent future regressions.
Using automated testing to cover well-understood behaviors enables the expensive and qualitative efforts of human testers to focus on the parts of your products for which they can provide the most value – and avoid boring them to tears in the process.

This description of what exploratory testing is and what it’s best suited to are completely unfamiliar to me, as a practitioner of exploratory testing for fifteen years or so. I don’t treat the software “as a puzzle to be broken” and I’m not even sure what it would mean to do so. It also doesn’t make sense to me to say “the specific problems to be found are unknown at the start”, surely this applies to any type of testing? If we already know what the problems are, we wouldn’t need to test to discover them. My exploratory testing efforts are not focused on “commonly overlooked code paths” either, in fact I’m rarely interested in the code but rather the behaviour of the software experienced by the end user. Given that “exploratory testing” as an approach has been formally defined for such a long time (and refined over that time), it concerns me to see such a different notion being labelled as “exploratory testing” in this book.

TL;DRs
Automated testing is foundational to enabling software to change.
For tests to scale, they must be automated.
A balanced test suite is necessary for maintaining healthy test coverage.
“If you liked it, you should have put a test on it.”
Changing the testing culture in organizations takes time.

In wrapping up chapter 11 of the book, the focus is again on automated checks with essentially no mention of human testing. The scaling issue is highlighted here also, but thinking solely in terms of scale is missing the point, I think.

The chapters of this book devoted to ‘testing” in some way cover a lot of ground, but the vast majority of that journey is devoted to automated checks of various kinds. Given Google’s reputation and perceived leadership status in IT, I was really surprised to see mention of the “cost of change curve” and the test automation pyramid, but not surprised by the lack of focus on human exploratory testing.

Circling back to that triggering quote I saw in Adam’s blog (“Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale”), I didn’t find an explanation of how they do in fact assess product quality – at least in the chapters I read. I was encouraged that they used the term “assess” rather than “measure” when talking about quality (on which James Bach wrote the excellent blog post, Assess Quality, Don’t Measure It), but I only read about their various approaches to using automated checks to build “confidence”, etc. rather than how they actually assess the quality of what they’re building.

I think it’s also important to consider your own context before taking Google’s ideas as a model for your own organization. The vast majority of testers don’t operate in organizations of Google’s scale and so don’t need to copy their solutions to these scaling problems. It seems we’re very fond of taking models, processes, methodologies, etc. from one organization and trying to copy the practices in an entirely different one (the widespread adoption of the so-called “Spotify model” is a perfect example of this problem).

Context is incredibly important and, in this particular case, I’d encourage anyone reading about Google’s approach to testing to be mindful of how different their scale is and not use the argument from the original quote that inspired this post to argue against the need for humans to assess the quality of the software we build.

* It would be remiss of me not to mention a brilliant response to this same quote from Michael Bolton – in the form of his 47-part Twitter thread (yes, 47!).

All testing is exploratory: change my mind

3 Replies

I’ve recently returned to Australia after several weeks in Europe, mainly for pleasure with a small amount of work along the way. Catching up on some of the testing-related chatter on my return, I spotted that Rex Black repeated his “Myths of Exploratory Testing” webinar in September. I respect the fact that he shares his free webinar content every month and, even though I often find myself disagreeing with his opinions, hearing what others think about software testing helps me to both question and cement my own thoughts and refine my arguments about what I believe good testing looks like.

Rex started off with his definition of exploratory testing (ET), viz.

A technique that uses knowledge, experience and skills to test software in a non-linear and investigatory fashion

He claimed that this is a “pretty widely shared definition of ET” but I don’t agree. The ISTQB Glossary uses the following definition:

An approach to testing whereby the testers dynamically design and execute tests based on their knowledge, exploration of the test item and the results of previous tests.

The definition I hear most often is something like the following James Bach/Michael Bolton effort (their edit of Cem Kaner’s suggestion, which they used until 2015):

An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project

They have since deprecated the term “exploratory testing” in favour of simply “testing” (from 2015), defining testing as:

Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.

Rex went on to say that the test basis and test oracles in ET “are primarily skills, knowledge and experience” and any such testing is referred to as “experience-based testing” (per the ISTQB definition, viz. “Testing based on the tester’s experience, knowledge and intuition.”). Experience-based testing that is investigatory is then deemed to be exploratory. I have several issues with this. There is an implication here that ET involves testing without using a range of oracles that might include specifications, user stories, or other more “formal” sources of what the software is meant to do. Rex reinforces this when he goes on to say that ET is a form of validation and “may tell us little or nothing about conformance to specification because the specification may not even be consulted by the tester”. Also, I can’t imagine any valuable testing that doesn’t rely on the tester’s skills, knowledge and experience so it seems to me that all testing would fall under this “experience-based testing” banner.

The first myth Rex discussed was the “origin myth”, that ET was invented in the 1990s in Silicon Valley or at least that was when a “name got hung on it” (e.g. Cem Kaner). He argued instead that it was invented by whoever wrote the first program, that IBM were doing it in the 1960s, that the independent test teams in Fred Brooks’s 1975 book Mythical Man Month were using ET, and “error guessing” as introduced by Glenford Myers in the classic book Art of Software Testing sounds “a whole lot like a form of ET”. The History of Definitions of ET on James Bach’s blog is a good reference in this regard, in my opinion. While I agree that programmers have been performing some kind of investigatory or unscripted testing in their development and debugging activities as long as programming has been a thing, it’s important that we define our testing activities in a way that makes the way we talk about what we do both accurate and credible. I see the argument for suggesting that error guessing is a form of ET, but it’s just one tactic that might be employed by a tester skilled in the much broader approach that is ET.

The next myth Rex discussed was the “completeness myth”, that “playing around” with the software is sufficient to test it. He mentioned that there is little education around testing in degrees in Software Engineering so people don’t understand what testing can and cannot do, which leads to myths like this. I agree that there is a general lack of understanding in our industry of how important structured ET is as part of a testing strategy, I haven’t personally heard this myth being espoused anywhere recently though.

Next up was the “sufficiency myth”, that some teams bring in a “mighty Jedi warrior of ET & this person has helped [them] to find every bug that can matter”. He mentioned a study from Microsoft where they split their testing groups for the same application, with one using ET (and other reactive strategies) only, while the other used pre-designed tests (including automated tests) only. The sets of bugs found by these two teams was partially but not fully overlapping, hence proving that ET alone is not sufficient. I’m confident that even if the groups had been divided up and did the same kind of testing (be it ET or pre-designed), then the sets of bugs from the two teams would also have been partially but not fully overlapping (there is some evidence to support this, albeit from a one-off small case study, from Aaron Hodder & James Bach in their article Test Cases Are Not Testing)! I’m not sure where this myth comes from, I’ve not heard it from anyone in the testing industry and haven’t seen a testing strategy that relies solely on ET. I do find that using ET as an approach can really help in focusing on finding bugs that matter, though, and that seems like a good thing to me.

Rex continued with the “irrelevance myth”, that we don’t have to worry about ET (or, indeed, any validation testing at all) because of the use of ATDD, BDD, or TDD. He argued that all of these approaches are verification rather than validation, so some validation is still relevant (and necessary). I’ve seen this particular myth and, if anything, it seems to be more prevalent over time especially in the CI/CD/DevOps world where automated checks (of various kinds) are viewed as sufficient gates to production deployment. Again, I see this as a lack of understanding of what value ET can add and that’s on us as a testing community to help people understand that value (and explain where ET fits into these newer, faster deployment approaches).

The final myth that Rex brought up was the “ET is not manageable myth”. In dispelling this myth, he mentioned the Rapid Reporter tool, timeboxed sessions, and scoping using charters (where a “charter is a set of one or more test conditions”). This was all quite reasonable, basically referring to session-based test management (SBTM) without using that term. One of his recommendations seemed odd, though: “record planned session time versus actual [session] time” – sessions are strictly timeboxed in an SBTM situation so planned and actual time are always the same. While this seems to be one of the more difficult aspects of SBTM at least initially for testers in my experience, sticking to the timebox is critical if ET is to be truly manageable.

Moving on from the myths, Rex talked about “reactive strategies” in general, suggesting they were suitable in agile environments but that we also need risk-based strategies and automation in addition to ET. He said that the reliance on skills and experience when using ET (in terms of the test basis and test oracle) mean that heuristics are a good way of triggering test ideas and he made the excellent point that all of our “traditional” test techniques still apply when using ET.

Rex’s conclusion was also sound, “I consider (the best practice of) ET to be essential but not sufficient by itself” and I have no issue with that (well, apart from his use of the term “best practice”) – and again don’t see any credible voices in the testing community arguing otherwise.

The last twenty minutes of the webinar was devoted to Q&A from both the online and live audience (the webinar was delivered in person at the STPCon conference). An interesting question from the live audience was “Has ET finally become embedded in the software testing lifecycle?” Rex responded that the “religious warfare… in the late 2000s/early 2010s has abated, some of the more obstreperous voices of that era have kinda taken their show off the road for various reasons and aren’t off stirring the pot as much”. This was presumably in reference to the somewhat heated debate going on in the context-driven testing community in that timeframe, some of which was unhelpful but much of which helped to shape much clearer thinking around ET, SBTM and CDT in general in my opinion. I wouldn’t describe it as “religious warfare”, though.

Rex also mentioned in response to this question that he actually now sees the opposite problem in the DevOps world, with “people running around saying automate everything” and the belief that automated tests by themselves are sufficient to decide when software is worthy of deployment to production. In another reference to Bolton/Bach, he argued that the “checking” and “testing” distinction was counterproductive in pointing out the fallacy of “automate everything”. I found this a little ironic since Rex constantly seeks to make the distinction between validation and verification, which is very close to the distinction that testing and checking seeks to draw (albeit in much more lay terms as far as I’m concerned). I’ve actually found the “checking” and “testing” terminology extremely helpful in making exactly the point that there is “testing” (as commonly understood by those outside of our profession) that cannot be automated, it’s a great conversation starter in this area for me.

One of Rex’s closing comments was again directed to the “schism” of the past with the CDT community, “I’m relieved that we aren’t still stuck in these incredibly tedious religious wars we had for that ten year period of time”.

There was a lot of good content in Rex’s webinar and nothing too controversial. His way of talking about ET (even the definition he chooses to use) is different to what I’m more familiar with from the CDT community but it’s good to hear him referring to ET as an essential part of a testing strategy. I’ve certainly seen an increased willingness to use ET as the mainstay of so-called “manual” testing efforts and putting structure around it using SBTM adds a lot of credibility. For the most part in my teams across Quest, we now consider test efforts to be considered ET only if they are performed within the framework of SBTM so that we have that accountability and structure in place for the various stakeholders to treat this approach as credible and worthy of their investment.

So, finally getting to the reason for the title of this post, both by Rex’s (I would argue unusual) definition (and even the ISTQB’s definition) or by what I would argue is the more widely accepted definition (Bach/Bolton above), it seems to me that all testing is exploratory. I’m open to your arguments to change my mind!

(For reference, Rex publishes all his webinars on the RBCS website at http://rbcs-us.com/resources/webinars/ The one I refer to in this blog post has not appeared there as yet, but the audio is available via https://rbcs-us.com/resources/podcast/)

How to win the war? Follow the script

1 Reply

In mid-2002, the US armed forces ran one of the largest and most expensive war game experiments in history, known as the “Millennium Challenge 2002”. It was designed to be a test of new technologies to enable network-centric warfare to give better command and control over both current and future weaponry and tactics.

The scenario was that a crazed but cunning (and strongly anti-American) military commander had broken away from his government somewhere in the Persian Gulf. Religious and ethnic loyalty gave him power and strong links to terrorist organizations made him even more dangerous. War was imminent.

The US side, known as the “Blue” team (as they always are in such military exercises apparently), were pitted against the “Red” team – with the rogue commander being played by retired Marine Corps Lieutenant General, Paul Van Riper.

It’s worth a quick note on the character of Van Riper at this point. His forty year military career included Vietnam and reading about him (especially from the words of those he led) it is clear that he was a straight-talking leader who inspired his teams to work for him even in the most dangerous and difficult of circumstances. By the time of this war game, he was retired and in his mid-60s – with no real need to be circumspect.

What actually happened during the running of the war game is described well in [1]:

In the first few days of the exercise, using surprise and unorthodox tactics, the wily 64-year-old Vietnam veteran sank most of the US expeditionary fleet in the Persian Gulf, bringing the US assault to a halt.

What happened next will be familiar to anyone who ever played soldiers in the playground. Faced with an abrupt and embarrassing end to the most expensive and sophisticated military exercise in US history, the Pentagon top brass simply pretended the whole thing had not happened. They ordered their dead troops back to life and “refloated” the sunken fleet. Then they instructed the enemy forces to look the other way as their marines performed amphibious landings. Eventually, Van Riper got so fed up with all this cheating that he refused to play any more. Instead, he sat on the sidelines making abrasive remarks until the three-week war game – grandiosely entitled Millennium Challenge – staggered to a star-spangled conclusion on August 15, with a US “victory”.

Van Riper very publicly aired his opinions on how ridiculously the game had been played and strongly criticized the idea that the ultimate “Blue” victory validated anything about the technology and approach the game was designed to test. In [2], he says:

There were accusations that Millennium Challenge was rigged. I can tell you it was not. It started out as a free-play exercise, in which both Red and Blue had the opportunity to win the game. However, about the third or fourth day, when the concepts that the command was testing failed to live up to their expectations, the command then began to script the exercise in order to prove these concepts.

This was my critical complaint. You might say, “Well, why didn’t these concepts live up to the expectations?” I think they were fundamentally flawed in that they leaned heavily on technology. They leaned heavily on systems analysis of decision-making.

It would seem that the skills and experience of Paul Van Riper and his ability to react quickly to what he observed gave him a significant advantage over the scripted, process-driven approach of his enemy. Yet, rather than making any effort to incorporate his alternative strategies, it was deemed better to constrain his actions to allow the script to play out the way it was “meant to”.

The analogy with scripted vs exploratory tests is very strong I think, so perhaps next time you’re locked in battle with a factory schooled commander of scripted testing, take up the battle and demonstrate your superior powers of testing. Even if your testing war game ends up the same way as the Millenium Challenge, at least you might have won the battle – and won some supporters for your exploratory testing cause along the way.

For reference…

[1] “Wake-up Call” (The Guardian, UK): http://www.theguardian.com/world/2002/sep/06/usa.iraq

[2] “The Immutable Danger of War” (Scott Willis interview with Van Riper) http://www.pbs.org/wgbh/nova/military/immutable-nature-war.html

(You can read more about the Millenium Challenge (2002) on wikipedia.)

Inspiration for this post came from reading about this war game in the fascinating book Blink by Malcolm Gladwell (and the same book provided inspiration for my previous post, Maier’s Two Cord Puzzle and Testing Heuristics).

Getting our message across about what “testing” really is

3 Replies

A recent Tweet about the BugBuster product again made me realise what a long journey we have as a community to educate the wider populous about what “testing” actually is (and is not).

The BugBuster website, for example, says this on its “Features” page:

Who said testing meant writing and endlessly maintaining test cases? BugBuster runs smart software agents that explore and test your website automatically. That’s right, no need to write test cases! The agents … test the various elements of the web app as if it was done by a human being.

The emphasis on the tool doing the same thing as humans is such a common perception of what testing can be reduced to, the “checking”* mentality is everywhere. I have no issues with using tools to help with testing, with automation to perform mundane checking, to help speed up development (not testing). But I do take issue with the idea that testing is dehumanizable.

They raise a good question here: “who said testing meant writing and endlessly maintaining test cases?” I spent too long thinking this was my job too and it’s almost unbelievable to look back at that time and think that I was adding any value to anything. The realization that testing really isn’t this but is in fact intellectually challenging and can add incredible value to the process of delivering great software for our users took me too long to reach, but at least I got there (thanks to Michael Bolton and the life changing experience that was his Rapid Software Testing course back in 2007).

How do we help others in this industry come to the same realization when they are bombarded with messages that dehumanize what “testing” really is? The context-driven testing community is full of great thinkers and their ideas about how to do great testing, but how do we in that community get our message across to the masses? While we do already have organizations like AST and ISST flying the CDT flag, what else can we do to broaden the wider community’s knowledge of what “testing” really is?

* Want to know more about the “Testing vs. checking” distinction? Start here with this Michael Bolton blog post.

Rockin' and Testing All Over The World

The place where Lee Hawkins shares his thoughts on software testing

Category Archives: Testing and checking

ER: “Ask Me Anything” session on Exploratory Testing

The power of the pause

Is talking about “scaling” human testing missing the point?

All testing is exploratory: change my mind

How to win the war? Follow the script

Getting our message across about what “testing” really is