Category Archives: Books

4 testing takeaways from “Meltdown” (Chris Clearfield & András Tilcsik)

I recently read “Meltdown” by Chris Clearfield & András Tilcsik. It was an engaging and enjoyable read, illustrated by many excellent real-world examples of failure. As is often the case, I found that much of the book’s content resonated closely with testing and I’ll share four of the more obvious cases in this blog post, viz:

  1. The dangers of alarm storms
  2. Systems and complexity
  3. Safety systems become a cause of failure
  4. The value of pre-mortems

1. The dangers of alarm storms

Discussing the failures around the infamous accident at the Three Mile Island nuclear facility in 1979, the book looks at the situation faced by operators in the control room:

An indicator light in the control room led operators to believe that the valve was closed. But in reality, the light showed only that the valve had been told to close, not that it had closed. And there were no instruments directly showing the water level in the core so operators relied on a different measurement: the water level in a part of the system called the pressurizer. But as water escaped through the stuck-open valve, water in the pressurizer appeared to be rising even as it was falling in the core. So the operators assumed that there was too much water, when in fact they had the opposite problem. When an emergency cooling system turned on automatically and forced water into the core, they all but shut it off. The core began to melt.

The operators knew something was wrong, but they didn’t know what, and it took them hours to figure out that water was being lost. The avalanche of alarms was unnerving. With all the sirens, klaxon horns, and flashing lights, it was hard to tell trivial warnings from vital alarms.

Meltdown, p18 (emphasis is mine)

I often see a similar problem with the results reported from large so-called “automated test suites”. As such suites get more and more tests added to them over time (it’s rare for me to see folks removing tests; doing so is seen as heresy even when those tests may well be redundant), the number of failing tests tends to increase and normalization of test failure sets in. Amongst the many failures, there could be important problems, but the emergent noise makes it increasingly hard to pick those out.

I often question the value of such suites (i.e. those that have multiple failed tests on every run) but there still seems to be a preference for “coverage” (meaning “more tests”, not actually more coverage) over stability. Suites of tests that tell you nothing different whether they all pass or some fail are to me pointless and pure waste.
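To make that concrete, here’s a minimal pytest sketch (the product functions, test names and bug reference are all hypothetical) of one way to keep a suite’s signal clean: quarantine known failures explicitly, so that a red run always means something new has broken.

```python
import pytest


# Hypothetical stand-ins for real product code.
def authenticate(user: str, password: str) -> bool:
    return password == "correct-password"


def export_report(timeout_seconds: int) -> str:
    raise TimeoutError("intermittent export timeout")


def test_login_succeeds_with_valid_credentials():
    # A stable check: a failure here is real signal worth investigating.
    assert authenticate("alice", "correct-password")


@pytest.mark.xfail(reason="BUG-1234: intermittent export timeout, under investigation")
def test_report_export_completes():
    # A known failure, quarantined with a tracked reason rather than left
    # to fail noisily on every run and drown out new problems.
    assert export_report(timeout_seconds=5) == "done"
```

Run this way, the suite stays green until something genuinely new goes wrong, rather than training everyone to ignore red builds.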

So, are you in control of your automated test suites and what are they really telling you? Are they in fact misleading you about the state of your product?

2. Systems and complexity

The book focuses on complex systems and how they are different when it comes to diagnosing problems and predicting failures. On this:

Here was one of the worst nuclear accidents in history, but it couldn’t be blamed on obvious human errors or a big external shock. It somehow just emerged from small mishaps that came together in a weird way.

In Perrow’s view, the accident was not a freak occurrence, but a fundamental feature of the nuclear power plant as a system. The failure was driven by the connections between different parts, rather than the parts themselves. The moisture that got into the air system wouldn’t have been a problem on its own. But through its connection to pumps and the steam generator, a host of valves, and the reactor, it had a big impact.

For years, Perrow and his team of students trudged through the details of hundreds of accidents, from airplane crashes to chemical plant explosions. And the same pattern showed up over and over again. Different parts of a system unexpectedly interacted with one another, small failures combined in unanticipated ways, and people didn’t understand what was happening.

Perrow’s theory was that two factors make systems susceptible to these kinds of failures. If we understand those factors, we can figure out which systems are most vulnerable.

The first factor has to do with how the different parts of the system interact with one another. Some systems are linear: they are like an assembly line in a car factory where things proceed through an easily predictable sequence. Each car goes from the first station to the second to the third and so on, with different parts installed at each step. And if a station breaks down, it will be immediately obvious which one failed. It’s also clear what the consequences will be: cars won’t reach the next station and might pile up at the previous one. In systems like these, the different parts interact in mostly visible and predictable ways.

Other systems, like nuclear power plants, are more complex: their parts are more likely to interact in hidden and unexpected ways. Complex systems are more like an elaborate web than an assembly line. Many of their parts are intricately linked and can easily affect one another. Even seemingly unrelated parts might be connected indirectly, and some subsystems are linked to many parts of the system. So when something goes wrong, problems pop up everywhere, and it’s hard to figure out what’s going on.

In a complex system, we can’t go in to take a look at what’s happening in the belly of the beast. We need to rely on indirect indicators to assess most situations. In a nuclear power plant, for example, we can’t just send someone to see what’s happening in the core. We need to piece together a full picture from small slivers – pressure indications, water flow measurements, and the like. We see some things but not everything. So our diagnoses can easily turn out to be wrong.

Perrow argued something similar: we simply can’t understand enough about complex systems to predict all the possible consequences of even a small failure.

Meltdown, p22-24 (emphasis is mine)

I think this discussion of the reality of failure in complex systems makes it clear that trying to rigidly script out tests to be performed against such systems is unlikely to help us reveal these potential failures. Some of these problems are emergent from the “elaborate web” and so our approach to testing these systems needs to be flexible and experimental enough to navigate this web with some degree of effectiveness.

It also makes clear that skills in risk analysis are very important in testing complex systems (see also point 4 in this blog post) and that critical thinking is essential.

3. Safety systems become a cause of failure

On safety systems:

Charles Perrow once wrote that “safety systems are the biggest single source of catastrophic failure in complex, tightly coupled systems.” He was referring to nuclear power plants, chemical refineries, and airplanes. But he could have been analyzing the Oscars. Without the extra envelopes, the Oscars fiasco would have never happened.

Despite Perrow’s warning, safety features have an obvious allure. They prevent some foreseeable errors, so it’s tempting to use as many of them as possible. But safety features themselves become part of the system – and that adds complexity. As complexity grows, we’re more likely to encounter failure from unexpected sources.

Meltdown, p85 (Oscars fiasco link added, emphasis is mine)

Some years ago, I owned a BMW and, it turns out, it was packed full of sensors designed to detect all manner of problems. I only found out about some of them when they started to go wrong – and they did so much more frequently than the underlying problems they were meant to detect. Sensor failure was becoming an everyday event, while the car generally ran fine. I solved the problem by selling the car.

I’ve often pitched good automation as a way to help development (not testing) move faster with more safety. Putting solid automated checks in place at various levels can provide excellent change detection, allowing mis-steps during development to be caught soon after they are introduced. But the authors’ point is well made – we run the risk of adding so many automated checks (“safety features”) that they themselves become the more likely source of failure – and then we’re back to point 1 of this post!
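As a minimal sketch of what I mean by change detection (the function and values here are invented for illustration), a unit-level check simply pins the currently agreed behaviour, so an accidental change made later fails fast and close to its source:

```python
def apply_discount(price: float, percent: float) -> float:
    # Hypothetical production code under check.
    return round(price * (1 - percent / 100), 2)


def test_discount_behaviour_is_unchanged():
    # These expectations encode today's agreed behaviour; if a refactor
    # alters any of them, the check flags the mis-step immediately.
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99
    assert apply_discount(50.0, 100) == 0.0
```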

I’ve also seen similar issues with adding excessive amounts of monitoring and logging, especially in cloud-based systems, “just because we can”. Not only can these give rise to bill shock, but they also become potential sources of failure in themselves and thereby start to erode the benefits they were designed to bring in diagnosing failures with the system itself.

4. The value of pre-mortems

The “premortem” comes up in this book and I welcomed the handy reminder of the concept. The idea is simple and feels like it would work well from a testing perspective:

Of course, it’s easy to be smart in hindsight. The rearview mirror, as Warren Buffett once supposedly said, is always clearer than the windshield. And hindsight always comes too late – or so it seems. But what if there was a way to harness the power of hindsight before a meltdown happened? What if we could benefit from hindsight in advance?

This question was based on a clever method called the premortem. Here’s Gary Klein, the researcher who invented it:

If a project goes poorly, there will be a lessons-learned session that looks at what went wrong and why the project failed – like a medical postmortem. Why don’t we do that up front? Before a project starts, we should say, “We’re looking in a crystal ball, and this project has failed; it’s a fiasco. Now, everybody, take two minutes and write down all the reasons why you think the project failed.”

Then everyone announces what they came up with – and they suggest solutions to the risks on the group’s collective list.

The premortem method is based on something psychologists call prospective hindsight – hindsight that comes from imagining that an event has already occurred. A landmark 1989 study showed that prospective hindsight boosts our ability to identify reasons why an outcome might occur. When research subjects used prospective hindsight, they came up with many more reasons – and those reasons tended to be more concrete and precise – than when they didn’t imagine the outcome. It’s a trick that makes hindsight work for us, not against us.

If an outcome is certain, we come up with more concrete explanations for it – and that’s the tendency the premortem exploits. It reframes how we think about causes, even if we just imagine the outcome. And the premortem also affects our motivation. “The logic is that instead of showing people that you are smart because you can come up with a good plan, you show you’re smart by thinking of insightful reasons this project might go south,” says Gary Klein. “The whole dynamic changes from trying to avoid anything that might disrupt harmony to trying to surface potential problems.”

Meltdown, p114-118

I’ve facilitated risk analysis workshops and found them to be useful in generating a bunch of diverse ideas about what might go wrong (whether that be for an individual story, a feature or even a whole release). The premortem idea could be used to drive these workshops slightly differently, by asking the participants to imagine that a bad outcome has already occurred and then to come up with ways that it could have happened. This might deliver the benefit of prospective hindsight as mentioned above. I think this is worth a try and will look for an opportunity to give it a go.

In conclusion

I really enjoyed reading “Meltdown” and it gave me plenty of food for thought from a testing perspective. I hope the few examples I’ve written about in this post are of interest to my testing audience!

Lessons for testing changemakers in “This Is Marketing” (Seth Godin)

I recently read This Is Marketing by Seth Godin and found it interesting and well-written, as I’d expected. But I didn’t expect this book to have some worthwhile lessons for testing folks who might be trying to change the way testing is thought about and performed within their teams and organizations.

I don’t think I’d previously considered marketing in these terms, but Seth says “If you want to spread your ideas, make an impact, or improve something, you are marketing”. If we’re trying to influence changes in testing, then one of our key skills is marketing the changes we want to make. The following quote from the book (and, no, I didn’t choose this quote simply because it mentions “status quo”!) is revealing:

How the status quo got that way

The dominant narrative, the market share leader, the policies and procedures that rule the day – they all exist for a reason.

They’re good at resisting efforts by insurgents like you.

If all it took to upend the status quo was the truth, we would have changed a long time ago.

If all we were waiting for was a better idea, a simpler solution, or a more efficient procedure, we would have shifted away from the status quo a year or a decade or a century ago.

The status quo doesn’t shift because you’re right. It shifts because the culture changes.

And the engine of culture is status.

I certainly recognise this in some of my advocacy efforts over the years when I was focused on repeating my “truth” about the way things should be from a testing perspective, but less tuned in to the fact that the status quo wasn’t going to shift simply by bombarding people with facts or evidence.

Seth also talks about “The myth of rational choice”:

Microeconomics is based on a demonstrably false assertion. “The rational agent is assumed to take account of available information, probabilities of events, and potential costs and benefits in determining preferences, and to act consistently in choosing the self-determined best choice of action,” says Wikipedia.

Of course not.

Perhaps if we average up a large enough group of people, it is possible that, in some ways, on average, we might see glimmers of this behavior. But it’s not something I’d want you to bet on.

In fact, the bet you’re better off making is: “When in doubt, assume that people will act according to their current irrational urges, ignoring information that runs counter to their beliefs, trading long-term for short-term benefits and most of all, being influenced by the culture they identify with.”

You can make two mistakes here:

1. Assume that the people you’re seeking to serve are well-informed, rational, independent, long-term choice makers.

2. Assume that everyone is like you, knows what you know, wants what you want.

I’m not rational and neither are you.

(Emphasis is mine)

I’m sure that any of us who’ve tried to instigate changes to the way testing gets done in an organization can relate to this! People will often ignore information that doesn’t support their existing beliefs (confirmation bias) and team/organizational culture is hugely influential. It’s almost as though the context in which we attempt to move the needle on testing is important.

I think there are good lessons for testing changemakers in these couple of short passages from Seth’s book, but I would recommend reading the book in its entirety even if you don’t think marketing is your thing – you might just get some unexpected insights like I did.

Testing & “Exponential Organizations” (Salim Ismail, Michael S. Malone & Yuri Van Geest)

I’m not sure how I came across the book Exponential Organizations (by Salim Ismail, Michael S. Malone & Yuri Van Geest) but it ended up on my library reservation list and was a fairly quick read. The book’s central theme is that new technologies allow for a new type of organization – the “ExO” (Exponential Organization) – that can out-achieve more traditional styles of company. The authors claim that:

An ExO can eliminate the incremental, linear way traditional companies get bigger, leveraging assets like community, big data, algorithms, and new technology into achieving performance benchmarks ten times better than its peers.

This blog post isn’t intended to be an in-depth review of the book which, although I found it interesting, was at times far too laden with buzzwords to be an enjoyable (or even credible) read. The content hasn’t aged well, as you might expect when it contains case studies of these hyper-growth companies – many of which went on to implode. That said, a second edition of the book, based on a new 2021 study of ExOs, is coming later in 2022.

The motivation for this blog post arose from the following quote (which appears on page 140 of the 2014 paperback edition):

One of the reasons Facebook has been so successful is the inherent trust that the company has placed in its people. At most software companies (and certainly the larger ones), a new software release goes through layers upon layers of unit testing, system testing and integration testing, usually administered by separate quality assurance departments. At Facebook, however, development teams enjoy the full trust of management. Any team can release new code onto the live site without oversight. As a management style, it seems counterintuitive, but with individual reputations at stake – and no-one else to catch shoddy coding – Facebook teams end up working that much harder to ensure there are no errors. The result is that Facebook has been able to release code of unimaginable complexity faster than any other company in Silicon Valley history. In the process, it has seriously raised the bar.

I acknowledge that the authors of this book are not well versed in software testing and that the focus of their book is not software development. Having said that, writing about testing as they’ve done here is potentially damaging in a broader context: those tangential to software development might well be misled by such claims about testing. Let’s unpack this a little more.

The idea that “separate quality assurance departments” were still the norm when this book was written (2014) doesn’t feel quite right to me. The agile train was already rolling along by then and the move to having testers embedded within development teams was well underway. What they’re describing at Facebook sounds more in line with what Microsoft, as an example, were doing around this time with their move to SDETs (Software Development Engineers in Test) as a different model to having embedded testers focused on the more human aspects of testing.

The idea that “development teams enjoy the full trust of management” and “Any team can release new code onto the live site without oversight” is interesting with the benefit of hindsight, given the many public issues Facebook has had around some of the features and capabilities included within its platform. There have been many questions raised around the ethics of Facebook’s algorithms and data management (e.g. the Cambridge Analytica scandal), perhaps unintended consequences of the free rein that has resulted from this level of trust in the developers.

It’s a surprisingly common claim that developers will do a better job of their own testing when there is no obvious safety net provided by dedicated testers. I’ve not seen evidence to support this, but acknowledge that there might be some truth to the claim for some developers. As a general argument, though, it doesn’t feel as strong to me as arguing that people specializing in testing can both help developers to improve their own testing game and add their expertise in human testing at a higher level. And, of course, it’s nonsense to suggest that any amount of hard work – by developers, testers or anybody else – can “ensure there are no errors”.

While I can’t comment on the validity of the claim that Facebook has released complex software “faster than any other company in Silicon Valley history”, it doesn’t seem to me like a claim that has much relevance even if it’s true. The claim of “unimaginable complexity”, though, is much more believable, given the benefit of hindsight and the evidence suggesting they probably don’t fully understand what they’ve built either (and we know that there are emergent behaviours inherent in complex software products, as covered expertly by James Christie in his many blog posts on this topic).

The closing sentence claiming that Facebook has “seriously raised the bar” doesn’t provide any context, so what might the authors be referring to here? Has Facebook raised the bar in testing practice? Or in frequently releasing well-considered, ethically-responsible features to its users? Or in some other way? I don’t consider Facebook to be high quality software or a case study of what great testing looks like, but maybe the authors had a different bar in mind that has been raised by Facebook in the area of software development/delivery/testing.

In wrapping up this short post, it was timely that Michael Bolton posted on LinkedIn about the subject matter that is so often lacking in any discussion or writing around software testing today – and his observations cover this paragraph on testing at Facebook perfectly. I encourage you to read his LinkedIn post.

Deep testing and “Deep Work” (Cal Newport)

I’ve just finished reading Deep Work by Cal Newport and I found it engaging, interesting and applicable. While reading it, there were many reminders for me of the work of Michael Bolton and James Bach around “deep testing”.

Cal defines “Deep Work” as:

Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate.

while “Shallow Work” is:

Non-cognitively demanding, logistical-style tasks, often performed while distracted. These efforts tend to not create much new value in the world and are easy to replicate.

He argues that:

In an age of network tools… knowledge workers increasingly replace deep work with the shallow alternative – constantly sending and receiving email messages like human network routers, with frequent breaks for quick hits of distraction. Larger efforts that would be well served by deep thinking…get fragmented into distracted dashes that produce muted quality.

I’m sure that anyone who has worked in an office environment in the IT industry over the last decade will agree that their time has been impacted by distractions and a larger proportion of the working day has become occupied by shallow work. As if open plan offices weren’t bad enough on their own, the constant stream of pulls on your attention from email, Slack and social media notifications has resulted in a very distracted state becoming the norm.

One of the key messages Cal delivers in the book is that deep work is rare, valuable and meaningful:

The Deep Work Hypothesis: The ability to perform deep work is becoming increasingly rare at exactly the same time it is becoming increasingly valuable in our economy. As a consequence, the few who cultivate this skill, and then make it the core of their working life, will thrive.

He makes the observation that, even in knowledge work, there is still a tendency to focus on “busyness”:

Busyness as a Proxy for Productivity: In the absence of clear indicators of what it means to be productive and valuable in their jobs, many knowledge workers turn back toward an industrial indicator of productivity: doing lots of stuff in a visible manner.

I’ve seen this as a real problem for testers in many organizations. When there is poor understanding of what good testing looks like (the norm, unfortunately), it’s all too common for testers to be tracked and measured by test case counts, bug counts, etc. These proxies for productivity really are measures of busyness and not reflections of true value being added by the tester. There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed. (I may blog separately on the “power of the pause” soon.)

On the subject of how much more meaningful deep work is, Cal refers to Dreyfus & Kelly’s All Things Shining book and its focus on craftsmanship:

A … potential for craftsmanship can be found in most skilled jobs in the information economy. Whether you’re a writer, marketer, consultant, or lawyer: Your work is craft, and if you hone your ability and apply it with respect and care, then like the skilled wheelwright [as example from the Dreyfus & Kelly book] you can generate meaning in the daily efforts of your professional life.

Cultivating craftsmanship is necessarily a deep task and therefore requires a commitment to deep work.

I have referred to software testing as a craft since I first heard it described as such by Michael Bolton during the RST course I attended back in 2007. Talking about testing in this way is important to me and, as Cal mentions, treating it as a craft that you can become skilled in and take pride in all helps to make life as a tester much more meaningful.

The second part of Cal’s book focuses on four rules to help achieve deep work in practice, viz. work deeply, embrace boredom, quit social media, and drain the shallows. I won’t go into detail on the rules here (in the interests of brevity and to encourage you to read the book for yourself to learn these practical tips), but this quote from the “drain the shallows” rule resonated strongly and feels like something we should all be trying to bring to the attention of the organizations we work with:

The shallow work that increasingly dominates the time and attention of knowledge workers is less vital than it often seems in the moment. For most businesses, if you eliminated significant amounts of this shallowness, their bottom line would likely remain unaffected. And, as Jason Fried [co-founder of software company 37signals] discovered, if you not only eliminate shallow work, but also replace this recovered time with more of the deep alternative, not only will the business continue to function; it can become more successful.

Coming back to Cal’s definition of “deep work”:

Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate.

When I read this definition, it immediately brought to mind session-based test management (SBTM) in which timeboxed periods of uninterrupted testing are the unit of work. I’ve seen the huge difference that adoption of SBTM can make in terms of encouraging deeper testing and improving testing skills. Thinking about “deep testing”, Michael Bolton and James Bach have described it as follows:

Testing is deep to the degree that it has a probability of finding rare, subtle, or hidden problems that matter.

Deep testing requires substantial skill, effort, preparation, time, or tooling, and reliably and comprehensively fulfills its mission.

By contrast, shallow testing does not require much skill, effort, preparation, time, or tooling, and cannot reliably and comprehensively fulfill its mission.

Blog post by Michael Bolton: https://www.developsense.com/blog/2017/03/deeper-testing-1-verify-and-challenge/

The parallels between Cal’s idea of “deep work” and Michael & James’s “deep testing” are clear. Being mindful of the difference between such deep testing and the more common shallow testing I see in many teams is important, as is the ability to clearly communicate this difference to stakeholders (especially when testing is squeezed under time pressures or being seen as optional in the frantic pace of continuous delivery environments).

I think “Deep Work” is a book worth reading for testers, not just for the parallels with deep testing I’ve tried to outline above but also for the useful tips around reducing distractions and freeing up your capacity for deeper work.

“Calling Bullsh*t” (Carl T. Bergstrom and Jevin D. West)

It was thanks to a recommendation from Michael Bolton that I came across the book Calling Bullsh*t by Carl T. Bergstrom and Jevin D. West. While it’s not a book specifically about software testing, there are some excellent takeaways for testers as I’ll point to in the following review of the book. This book is a must read for software testers in my opinion.

The authors’ definition of bullshit (BS) is important to note before digging into the content (appearing on page 40):

Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade or impress an audience by distracting, overwhelming, or intimidating them with a blatant disregard for truth, logical coherence, or what information is actually being conveyed.

I was amazed to read that the authors already run a course at a US university on the same topic as this book:

We have devoted our careers to teaching students how to think logically and quantitatively about data. This book emerged from a course we teach at the University of Washington, also titled “Calling Bullshit”. We hope it will show you that you do not need to be a professional statistician or econometrician or data scientist to think critically about quantitative arguments, nor do you need extensive data sets and weeks of effort to see through bullshit. It is often sufficient to apply basic logical reasoning to a problem and, where needed, augment that with information readily discovered via search engine.

The rise of the internet and particularly social media are noted as ways that BS has proliferated in more recent times, spreading both misinformation (claims that are false but not deliberately designed to deceive) and disinformation (deliberate falsehoods).

…the algorithms driving social media content are bullshitters. They don’t care about the messages they carry. They just want our attention and will tell us whatever works to capture it.

Bullshit spreads more easily in a massively networked, click-driven social media world than in any previous social environment. We have to be alert for bullshit in everything we read.

As testers, we tend to have a critical thinking mindset and are hopefully alert to stuff that just doesn’t seem right, whether that’s the way a feature works in a product or a claim made about some software. It seems to me that testers should naturally be good spotters of BS more generally and this book provides a lot of great tips both for spotting BS and learning how to credibly refute it.

Looking at black boxes (e.g. statistical procedures or data science algorithms), the authors make the crucial point that understanding the inner workings of the black box is not required in order to spot problems:

The central theme of this book is that you usually don’t have to open the analytic black box in order to call bullshit on the claims that come out of it. Any black box used to generate bullshit has to take in data and spit results out.

Most often, bullshit arises either because there are biases in the data that get fed into the black box, or because there are obvious problems with the results that come out. Occasionally the technical details of the black box matter, but in our experience such cases are uncommon. This is fortunate, because you don’t need a lot of technical expertise to spot problems with the data or results. You just need to think clearly and practice spotting the sort of thing that can go wrong.

The first big topic of consideration looks at associations, correlations and causes, and at spotting claims that mistake one for the other. The authors provide excellent examples in this chapter of the book, and a common instance of this confusion in the testing arena is covered by Theresa Neate’s blog post, Testing and Quality: Correlation does not equal Causation. (I’ve also noted the confusion between correlation and causality very frequently when looking at big ag-funded “studies” used as ammunition against veganism.)

The chapter titled “Numbers and Nonsense” covers the various ways in which numbers are used in misleading and confusing ways. The authors make the valid point that:

…although numbers may seem to be pure facts that exist independently from any human judgment, they are heavily laden with context and shaped by decisions – from how they are calculated to the units in which they are expressed.

It is all too common in the testing industry for people to hang numbers on things that make little or no sense to look at quantitatively – counting “test cases” comes to mind. The book covers various ways in which numbers turn into nonsense, including summary statistics, percentages and percentage points. Goodhart’s Law is mentioned (in its rephrased form by Marilyn Strathern):

When a measure becomes a target, it ceases to be a good measure

I’m sure many of us are familiar with this law in action when we’re forced into “metrics programmes” around testing, in which gaming becomes the focus rather than the improvement our organizations were looking for. The authors also introduce the idea of mathiness: “mathiness refers to formulas and expressions that may look and feel like math – even as they disregard the logical coherence and formal rigour of actual mathematics”. Testing is not immune from mathiness either – e.g. “Tested = Checked + Explored”, commonly quoted from Elisabeth Hendrickson’s (excellent) Explore It! book. Another concept that will be very familiar to testers (and others in the IT industry) is zombie statistics, viz.

…numbers that are cited badly out of context, are sorely outdated, or were entirely made up in the first place – but they are quoted so often that they simply won’t die.

There are many examples of such zombie statistics in our industry – Boehm’s so-called cost of change curve (claiming that the cost of changes later in the development cycle is orders of magnitude higher than earlier in the cycle) being a prime example, and one of the examples covered beautifully in Laurent Bossavit’s excellent book, The Leprechauns of Software Engineering.

The next statistical concept introduced in the book is selection bias and I was less familiar with this concept (at least under this name):

Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study.

This sort of non-random sampling leads to statistical analyses failing or becoming misleading, and there are again some well-considered examples to explain and illustrate this bias. Reading this chapter brought to mind my recent critique of the Capgemini World Quality Report, in which I noted that both the size of organizations and the roles of participants in the survey were problematic. (I again note from my vegan research that many big ag-funded studies suffer from this bias too.)
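As a toy illustration of the concept (all numbers invented), consider estimating customer satisfaction from survey responses when unhappy customers are far more motivated to respond – the individuals sampled differ systematically from the population we actually care about:

```python
import random

random.seed(42)

# Hypothetical population: satisfaction scores from 1 (unhappy) to 10 (happy).
population = [random.randint(1, 10) for _ in range(10_000)]


def responds(score: int) -> bool:
    # Unhappy customers answer the survey 80% of the time, happy ones only 10%.
    return random.random() < (0.8 if score <= 3 else 0.1)


sample = [score for score in population if responds(score)]

print(f"Population mean:    {sum(population) / len(population):.2f}")  # ~5.5
print(f"Biased sample mean: {sum(sample) / len(sample):.2f}")          # much lower
```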

A hefty chapter is devoted to data visualization, with the authors noting the relatively recent proliferation of charts and data graphics in the media due to the technology becoming available to more easily produce them. The treatment of the various ways that charts can be misleading is again excellent with sound examples (including axis scaling, axis starting values, and the “binning” of axis values). I loved the idea of glass slippers here, viz.

Glass slippers take one type of data and shoehorn it into a visual form designed to display another. In doing so, they trade on the authority of good visualizations to appear authoritative themselves. They are to data visualizations what mathiness is to mathematical equations.

The misuse of the periodic table visualization is cited as an example and, of course, the testing industry has its own glass slippers in this area, for example Santhosh Tuppad’s Heuristic Table of Testing! This chapter also discusses visualizations that look like Venn diagrams but aren’t, and highlights the dangers of 3-D bar graphs, line graphs and pie charts. A new concept for me in this chapter was the principle of proportional ink:

Edward Tufte…in his classic book The Visual Display of Quantitative Information…states that “the representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.” The principle of proportional ink applies this rule to how shading is used on graphs.

The illustration of this principle by well-chosen examples is again very effective here.
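To see the axis tricks in action, here’s a small matplotlib sketch (fabricated numbers) that plots the same two values twice – once with a truncated y-axis that makes a trivial difference look dramatic, and once zero-based so the ink is proportional to the quantities represented:

```python
import matplotlib.pyplot as plt

labels = ["Product A", "Product B"]
values = [102, 105]  # barely different in reality

fig, (misleading, honest) = plt.subplots(1, 2, figsize=(8, 3))

misleading.bar(labels, values)
misleading.set_ylim(100, 106)  # truncated axis: B appears to tower over A
misleading.set_title("Truncated axis (misleading)")

honest.bar(labels, values)
honest.set_ylim(0, 110)  # zero-based axis: bar ink proportional to the values
honest.set_title("Zero-based axis (honest)")

plt.tight_layout()
plt.show()
```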

It’s great to see some sensible commentary on the subject of big data in the next chapter. The authors say “We want to provide an antidote to [the] hype” and they certainly achieve this aim. They discuss AI & ML and the critical topic of how training data influences outcomes. They also note how machine learning algorithms perpetuate human biases.

The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself.

The topics of Big Data, AI and ML are certainly hot in the testing industry at the moment, with tool vendors and big consultancies all extolling the virtues of these technologies to change the world of testing. These claims have been made for quite some time now and, as I noted in my critique of the Capgemini World Quality Report recently, the reality has yet to catch up with the hype. I commend the authors here for their reality check in this over-hyped area.

In the chapter titled “The Susceptibility of Science”, the authors discuss the scientific method and how statistical significance (p-values) is often manipulated to aid with getting research papers published in journals. Their explanation of the base rate fallacy is excellent and a worthy inclusion, as it is such a common mistake. While the publication of dodgy papers and misleading statistics are acknowledged, the authors’ belief is that “science just plain works” – and I agree with them. (From my experience in vegan research, I’ve read so many dubious studies funded by big ag but these don’t undermine my faith in science, rather my faith in human nature sometimes!) In closing:

Empirically, science is successful. Individual papers may be wrong and individual studies misreported in the popular press, but the institution as a whole is strong. We should keep this in perspective when we compare science to much of the other human knowledge – and human bullshit – that is out there.

In the penultimate chapter, “Spotting Bullshit”, the discussion of the various means by which BS arises (covered throughout the book) is split out into six ways of spotting it, viz.

  • Question the source of information
  • Beware of unfair comparisons
  • If it seems too good or bad to be true…
  • Think in orders of magnitude
  • Avoid confirmation bias
  • Consider multiple hypotheses

These ways of spotting BS act as a handy checklist, I think, and will certainly be helpful to me in refining my skills in this area. While I was still reading this book, I listened to a testing panel session online in which one of the panelists was from the testing tool vendor Applitools. He briefly mentioned some claims about their visual AI-powered test automation tool. These claims piqued my interest and I managed to find the same statistics on their website:

[Image: Applitools’ claims about their visual AI-powered test automation tool]

I’ll leave it as an exercise for the reader to decide if any of the above falls under the various ways BS manifests itself according to this book!

The final chapter, “Refuting Bullshit”, is really a call to action:

…a solution to the ongoing bullshit epidemic is going to require more than just an ability to see it for what it is. We need to shine a light on bullshit where it occurs, and demand better from those who promulgate it.

The authors provide some methods to refute BS, as they themselves use throughout the book in the many well-chosen examples used to illustrate their points:

  • Use reductio ad absurdum
  • Be memorable
  • Find counterexamples
  • Provide analogies
  • Redraw figures
  • Deploy a null model

They also “conclude with a few thoughts about how to [call BS] in an ethical and constructive manner”, viz.

  • Be correct
  • Be charitable
  • Admit fault
  • Be clear
  • Be pertinent

In summary, this book is highly recommended reading for all testers to help them become more skilled spotters of BS; be that from vendors, testing consultants or others presenting information about testing. This skill will also come in very handy in spotting BS in claims made about the products you work on in your own organization!

The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it.

Alberto Brandolini (Italian software engineer, 2014)

After reading this book, you should have the skills to spot BS and I actively encourage you to then find inventive ways to refute it publicly so that others might not get fooled by the same BS.

Our industry needs those of us who genuinely care about testing to call out BS when we see it – I’m hoping to see more of this in our community! (My critique of the Capgemini World Quality Report and review of a blog post by Cigniti are examples of my own work in this area as I learn and refine these skills.)

Publishing my first testing book, “An Exploration of Testers”

As I mentioned in my last blog post, I’ve been working on a testing book for the last year-or-so. With more free time since leaving full-time employment back in August, I’m delighted to have now published my first e-book on testing, called An Exploration of Testers.

The book is formed of contributions from various testers around the world, with seventeen contributions in the first edition. Each tester answered the same set of eleven questions designed to tease out testing, career and life lessons. I was humbled by how much time and effort went into the contributions and also by how willing the community was to engage with the project, with almost every tester I invited to contribute then committing to doing so. A number of contributions will be added in the coming months (and additional versions of the book are free after your initial purchase, so don’t be afraid to buy now!).

My experience of using LeanPub as the publishing platform has been generally very good. When I was researching ways to self-publish, LeanPub seemed to get good reviews and it was free to try, so I gave it a go and ended up sticking with it. I’m still on the free plan and it suffices for now for this project. The platform makes most aspects of creating, publishing and selling a book really straightforward, and the markdown language used for writing the manuscript is easy to learn (though it comes with some frustrating limitations on the control of layout). I would recommend LeanPub to others looking to write their first book.

At the very start of the project, I decided that any proceeds from sales of the book would be ploughed back into the testing community and this fact seemed to encourage participation in the project. I will be transparent about the money received from book sales (with the only expenses being those taken by LeanPub as the publishing & sales platform) and also where I decide to invest it back into our community. It seems only fair to give back to the community that has been so generous to me over the years and also generated the content for the book.

For more details and to buy a copy, please visit https://leanpub.com/anexplorationoftesters

Solitude

One of the joys of reading is the books you come across by accident. Reading a couple of Tim Wu’s excellent books (viz. “The Attention Merchants” and “The Master Switch”) led me to books on solitude, including “Solitude: In Pursuit of a Singular Life in a Crowded World” by Michael Harris.

https://www.amazon.com/Solitude-Pursuit-Singular-Crowded-World/dp/B06ZYXJH8V

It seemed timely to read on this topic, as I’ve been implementing a “digital declutter” after recently reading Digital Minimalism by Cal Newport. I’m fortunate to live in a beautiful and peaceful location, so I’m being much more mindful of making the most of the spot to deliberately separate myself from technology sometimes and take in the simple pleasures of time spent watching the ocean and listening to the birds.

The inspiration for writing “Solitude” came from the author reading about Dr Edith Bone. Hers is a remarkable story (and worth reading about in itself) of seven years spent in solitary confinement.

A little reading – and a hero in Dr. Bone – had turned malaise into a mission. I wanted to become acquainted again with the still night, with my own hapless daydreaming, with the bare self I had (for how long?) been running from. I kept asking myself: why am I so afraid of my own quiet company? This book is the closest I’ve come to an answer.

Aligning closely with Wu’s work, Harris discusses the rise of social media and the “connectedness” it was designed to create. But we all know by now that the “likes” and sharing are highly addictive, triggering small but frequent dopamine hits. This has had a devastating impact on our ability to find solitude:

We’re given opportunities to practise being alone every day, almost every hour. Go on a drive. Sit on a lawn. Stick your phone in a drawer. Once we start looking, we find solitude is always just below the surface of things. I thought at first that solitude was a lost art. Now I know that’s too pretty a term, too soft a metaphor.

Solitude has become a resource.

Like all resources, it can be harvested and hoarded, taken up by powerful forces without permission or inquiry, and then transformed into private wealth, until the fields of empty space we once took for granted first dwindle, then disappear.

Harris goes on to ask the question: what is solitude for? He comes up with three answers: the formulation of fresh ideas, self-knowledge, and (paradoxically) bonding with others.

Taken together, these three ingredients build a rich interior life. It turns out that merely escaping crowds was never the point of solitude at all: rather, solitude is a resource – an ecological niche – inside of which these benefits can be reaped. And so it matters enormously when that resource is under attack.

Our modern, hyperconnected, “always on” world sees solitude under constant threat and it takes a determined effort to find it in our lives:

Our online crowds are so insistent, so omnipresent, that we must now actively elbow out the forces that encroach on solitude’s borders, or else forfeit to them a large portion of our mental landscape.

It turns out that some research has already been done around daydreaming. MRI scanning reveals that daydreaming “constitutes an intense and heterogeneous set of brain functions” and:

…this industrious activity plays out while the conscious mind remains utterly unaware of the work – so our thoughts (sometimes really great thoughts) emerge without our anticipation or understanding. They emerge from the blue. Daydreaming thoughts may look like “pointless fantasizing” or “complex planning” or “the generation of creative ideas”. But, whatever their utility, they arrive unbidden.

Einstein believed that “the daydreaming mind’s ability to link things is, in fact, our only path toward fresh ideas.” Harris describes his own attempts to daydream during a three-hour wander and he says of this experience:

I start to see time-devouring apps like Candy Crush as pacifiers for a culture unwilling or unable to experience a finer, adult form of leisure. We believed those who told us that the devil loves idle hands. And so we gave our hands over for safekeeping. We long for constant proof of our effectiveness, our accomplishments. And perhaps it’s this longing for proof, for glittering external validation, that makes our solitude so vulnerable to those who would harvest it.

The addictive nature of social media (see ludic loops) has seen us giving up what few moments of spare time we have:

To a media baron looking for short-term profits, a daydreaming mind must look like an awful waste. All that time and attention left to wander, directionless! Making use of the blank spaces in a person’s life – draining the well of reverie – has become one of the missions of modernity.

But we do need to break out of this cycle and, bizarrely, doing so is seen as an odd and disruptive thing to do (e.g. I see the disbelief every time I mention to someone that I’m not, and never have been, “on Facebook”):

Choosing a mental solitude, then, is a disruptive act, a true sabotage of the schemes of ludic loop engineers and social media barons. Choosing solitude is a gorgeous waste.

Harris then discusses how we’ve all become part of the crowd and true marks of individualism are being eroded as a result:

…today we need to safeguard our inner weirdo, seal it off and protect it from being buffeted. Learn an old torch song that nobody knows; read a musty out-of-print detective novel; photograph a honey-perfect sunset and show it to no-one. We may need to build new and stronger weirdo cocoons, in which to entertain our private selves. Beyond the sharing, the commenting, the constant thumbs-upping, beyond all that distracting gilt, there are stranger things to be loved.

Harris explores the impact that technologies like Google Maps have had on our ability to truly lose ourselves and wander freely in nature, activities that have historically yielded great insights but are much more difficult to achieve in our hyper-connected and increasingly urban lives. He goes on to look at reading and writing – and the socialization of those activities. Proust once defined reading as “that fruitful miracle of a communication in the midst of solitude” but even this is under threat:

But that solitary reading experience is now endangered, and so is the empathy it fosters. Our stories are going social. We can assume that, in thirty years, readers and writers will use platform technologies to constantly interact with and shape each other, for better or worse. Authors will enlist crowd-sourcing and artificial intelligence to help them write their stories.

In his final chapter, Harris tells the story of his seven-day experience of solitude in a cabin in the woods, offline and alone:

Near the end of this lonely week my thoughts stop floating so much and return to the problem of solitude in a digital culture. Only now, out on the meditative trail I’ve been hiking before and after my crackers-and-apple lunch, I’m thinking about it differently, more expansively. Things here call for wide lenses.

From this dirt vantage, all that clicking and sharing and liking and posting looks like a pile of iron shackles. We are the ones creating the content, yet we’re never compensated with anything but the tremulous, fast-evaporating pleasures that social grooming delivers. Validation and self-expression, we are told, are far greater prizes than the measly cash that flows upward to platform owners…. [these] systems we live by can expropriate no value from solitude, and so they abhor it.

I enjoyed reading this book; it’s written in a very approachable style with many personal anecdotes (which you may or may not find interesting in themselves). I took this read as a reminder to make room for “daydreaming”, be that looking out over the ocean or simply not pulling out my phone during a short tram ride. Nicholas Carr says it well in the Foreword of the book:

Solitude is refreshing. It strengthens memory, sharpens awareness, and spurs creativity. It makes us calmer, more attentive, clearer headed. Most important of all, it relieves the pressure of conformity. It gives us the space we need to discover the deepest sources of passion, enjoyment, and fulfillment in our lives. Being alone frees us to be ourselves – and that makes us better company when we rejoin the crowd.

I also recently read another book on the same topic, but given a much more serious treatment by Raymond Kethledge & Mike Erwin, in the shape of “Lead Yourself First: Inspiring Leadership Through Solitude” – I highly recommend this book.

On AI

I’ve read a number of books on similar topics this year around artificial intelligence, machine learning, algorithms, etc. Coming to this topic with little in the way of prior knowledge, I feel like I’ve learned a great deal.

Our increasing reliance on decisions made by machines instead of humans is having significant – and sometimes truly frightening – consequences. Despite the supposed objectivity of algorithmic decision making, there is plenty of evidence of human biases encoded into these algorithms, and the proprietary nature of some of these systems means that many are left powerless in their search for explanations about the decisions made by these algorithms.

Each of these books tackles the subject from a different perspective and I recommend them all.

It feels like “AI in testing” is becoming a thing, with my feeds populated by articles, blog posts and ads about the increasingly large role AI is playing or will play in software testing. It strikes me that we would be wise to learn from the mistakes discussed in these books before trying to fully replace human decision making in testing with decisions made by machines. The biases encoded into these algorithms should also be acknowledged – confirmatory biases seem likely in a testing context – and we neglect the power of human ingenuity and exploration at our peril when it comes to delivering software that both solves problems for and makes sense to (dare I say “delights”) our customers.

“Range: Why Generalists Triumph in a Specialized World” (David Epstein)

I’m a sucker for the airport bookshop and I’ve blogged before on books acquired from these venerable establishments. On a recent trip to the US, a book stood out to me as I browsed, because of its subtitle: “Why generalists triumph in a specialized world”. It immediately triggered memories of the “generalizing specialists” idea that seemed so popular in the agile community maybe ten years ago (but hasn’t been so hot recently, at least not in what I’ve been reading around agile). And so it was that Range: Why Generalists Triumph in a Specialized World by David Epstein accompanied me on my travels, giving me a fascinating read along the way.

David’s opening gambit is a comparison of the journeys of two well-known sportsmen, viz. Roger Federer and Tiger Woods. While Woods was singularly focused on becoming excellent at golf from a very young age, Federer tried many different sports before eventually becoming the best male tennis player the world has ever seen. While Woods went for early specialization, Federer opted for breadth and a range of sports before realizing where he truly wanted to specialize and excel. David notes:

The challenge we all face is how to maintain the benefits of breadth, diverse experience, interdisciplinary thinking, and delayed concentration in a world that increasingly incentivizes, even demands, hyperspecialization. While it is undoubtedly true that there are areas that require individuals with Tiger’s precocity and clarity of purpose, as complexity increases – and technology spins the world into vaster webs of interconnected systems in which each individual sees only a small part – we also need more Rogers: people who start broad and embrace diverse experiences and perspectives while they progress. People with range.

Chapter 1 – “The Cult of the Head Start” – uses the example of chess grand masters, similarly to golf, where early specialization works well. David makes an interesting observation here around AI, a topic which seems to be finding its way into more and more conversations in software testing, and, in my opinion, the last line of this quote applies well to the very real challenges involved in thinking about AI as a replacement for human testers:

The progress of AI in the closed and orderly world of chess, with instant feedback and bottomless data, has been exponential. In the rule-bound but messier world of driving, AI has made tremendous progress, but challenges remain. In a truly open-world problem devoid of rigid rules and reams of perfect historical data, AI has been disastrous. IBM’s Watson destroyed at Jeopardy! and was subsequently pitched as a revolution in cancer care, where it flopped so spectacularly that several AI experts told me they worried its reputation would taint AI research in health-related fields. As one oncologist put it, “The difference between winning at Jeopardy! and curing all cancer is that we know the answer to Jeopardy! questions.” With cancer, we’re still working on posing the right questions in the first place.

In Chapter 2 – “How the Wicked World Was Made” – David shares some interesting stories around IQ testing and notes that:

…society, and particularly higher education, has responded to the broadening of the mind by pushing specialization, rather than focusing early training on conceptual, transferable knowledge.

I see the same pattern in software testing, with people choosing to specialize in one particular automation tool rather than learning more broadly about good testing, risk analysis, critical thinking and so on – skills that can be applied more generally (and are less prone to redundancy as technology changes). In closing out the chapter, David makes the following observation, which again rings very true in testing:

The more constrained and repetitive a challenge, the more likely it will be automated, while great rewards will accrue to those who can take conceptual knowledge from one problem or domain and apply it in an entirely new one.

A fascinating – and new to me – story about early Venetian music opens Chapter 3 – “When Less of the Same Is More”. In discussing how musicians learn and apply their skills across genres, David’s conclusion again makes poignant reading for testers, especially those with a desire to become excellent exploratory testers:

[This] is in line with a classic research finding that is not specific to music: breadth of training predicts breadth of transfer. That is, the more contexts in which something is learned, the more the learner creates abstract models, and the less they rely on any particular example. Learners become better at applying their knowledge to a situation they’ve never seen before, which is the essence of creativity.

Chapter 4 – “Learning, Fast and Slow” – takes its title from a nod to Daniel Kahneman and looks at the difficulty of making teaching and training more broadly applicable than the case directly under instruction, using examples from maths students and naval officers:

Knowledge with enduring utility must be very flexible, composed of mental schemes that can be matched to new problems. The virtual naval officers in the air defense simulation and the math students who engaged in interleaved practice were learning to recognize deep structural similarities in types of problems. They could not rely on the same type of problem repeating, so they had to identify underlying conceptual connections in simulated battle threats, or math problems, that had never actually been seen before. They then matched a strategy to each new problem. When a knowledge structure is so flexible that it can be applied effectively even in new domains or extremely novel situations, it is called “far transfer.”

I think we face similar challenges in software testing. We’re usually testing something different from what we’ve tested before; we’re generally not (hopefully) testing the same thing over and over again. Thinking about how we’ve faced similar testing challenges in the past, and applying appropriate learnings to new testing situations, is a key skill – it helps us develop a toolbox of ideas, strategies and techniques from which to draw when faced with a new situation. This “range” and the ability to make conceptual connections are also very important in performing good risk analysis, another key testing skill.

In Chapter 5 – “Thinking Outside Experience” – David tells the story of Kepler, who reached new insights in astronomy by reasoning through analogies from very disparate areas, leading to his invention of astrophysics. He was a fastidious note taker too, just like a good tester:

Before he began his tortuous march of analogies toward reimagining the universe, Kepler had to get very confused on his homework. Unlike Galileo and Isaac Newton, he documented his confusion. “What matters to me,” Kepler wrote, “is not merely to impart to the reader what I have to say, but above all to convey to him the reasons, subterfuges, and lucky hazards which led to my discoveries.”

Chapter 6 – “The Trouble with Too Much Grit” – starts by telling the story of Van Gogh, noting:

It would be easy enough to cherry-pick stories of exceptional late developers overcoming the odds. But they aren’t exceptions by virtue of their late starts, and those late starts did not stack the odds against them. Their late starts were integral to their eventual success.

David also shares a story about a major retention issue experienced by a select part of the US Army, concluding:

In the industrial era, or the “company man era”…“firms were highly specialized,” with employees generally tackling the same suite of challenges repeatedly. Both the culture of the time – pensions were pervasive and job switching might be viewed as disloyal – and specialization were barriers to worker mobility outside of the company. Plus, there was little incentive for companies to recruit from outside when employees regularly faced kind learning environments, the type where repetitive experience alone leads to improvement. By the 1980s, corporate culture was changing. The knowledge economy created “overwhelming demand for…employees with talents for conceptualization and knowledge creation.” Broad conceptual skills now helped in an array of jobs, and suddenly control over career trajectory shifted from the employer, who looked inward at a ladder of opportunity, to the employee, who peered out at a vast web of possibility. In the private sector, an efficient talent market rapidly emerged as workers shuffled around in pursuit of match quality [the degree of fit between the work someone does and who they are – their abilities and proclivities]. While the world changed, the Army stuck with the industrial-era ladder.

In Chapter 7 – “Flirting with Your Possible Selves” – David shares the amazing career story of Frances Hesselbein as an example of changing tack many times, rather than choosing an early specialization and sticking with it, and of the many successes such a path can yield along the journey. He cites:

[computational neuroscientist Ogi Ogas] uses the shorthand “standardization covenant” for the cultural notion that it is rational to trade a winding path of self-exploration for a rigid goal with a head start because it ensures stability. “The people we study who are fulfilled do pursue a long-term goal, but they only formulate it after a period of discovery,” he told me. “Obviously, there’s nothing wrong with getting a law or medical degree or PhD. But it’s actually riskier to make that commitment before you know how it fits you. And don’t consider the path fixed. People realize things about themselves halfway through medical school.” Charles Darwin, for example.

Chapter 8 – “The Outsider Advantage” – talks about the benefits of bringing diverse skills and experiences to bear in problem solving:

[Alph] Bingham had noticed that established companies tended to approach problems with so-called local search, that is, using specialists from a single domain, and trying solutions that worked before. Meanwhile, his invitation to outsiders worked so well that it was spun off as an entirely different company. Named InnoCentive, it facilitates entities in any field acting as “seekers” paying to post “challenges” and rewards for outside “solvers.” A little more than one-third of challenges were completely solved, a remarkable portion given that InnoCentive selected for problems that had stumped the specialists who posted them. Along the way, InnoCentive realized it could help seekers tailor their posts to make a solution more likely. The trick: to frame the challenge so that it attracted a diverse array of solvers. The more likely a challenge was to appeal not just to scientists but also to attorneys and dentists and mechanics, the more likely it was to be solved.

Bingham calls it “outside-in” thinking: finding solutions in experiences far outside of focused training for the problem itself. History is littered with world-changing examples.

This sounds like the overused “think outside the box” concept, but there’s a lot of validity here – the fact is that InnoCentive works:

…as specialists become more narrowly focused, “the box” is more like Russian nesting dolls. Specialists divide into subspecialties, which soon divide into sub-subspecialties. Even if they get outside the small doll, they may get stuck inside the next, slightly larger one. 

In Chapter 9 – “Lateral Thinking with Withered Technology” – David tells the fascinating story of Nintendo and how the Game Boy was such a huge success despite being built using older (“withered”) technology. Out of this story, he mentions the idea of “frogs” and “birds” from physicist and mathematician Freeman Dyson:

…Dyson styled it this way: we need both focused frogs and visionary birds. “Birds fly high in the air and survey broad vistas of mathematics out to the far horizon,” Dyson wrote in 2009. “They delight in concepts that unify our thinking and bring together diverse problems from different parts of the landscape. Frogs live in the mud below and only see the flowers that grow nearby. They delight in the details of particular objects, and they solve problems one at a time.” As a mathematician, Dyson labeled himself a frog but contended, “It is stupid to claim that birds are better than frogs because they see farther, or that frogs are better than birds because they see deeper.” The world, he wrote, is both broad and deep. “We need birds and frogs working together to explore it.” Dyson’s concern was that science is increasingly overflowing with frogs, trained only in a narrow specialty and unable to change as science itself does. “This is a hazardous situation,” he warned, “for the young people and also for the future of science.”

I like this frog and bird analogy and can picture examples from working with teams where excellent testing arose from a combination of frogs and birds working together to produce the kind of product information neither would have provided alone.

David makes the observation that communication technology and our increasingly easy access to vast amounts of information are also playing a part in reducing our need for specialists:

…narrowly focused specialists in technical fields…are still absolutely critical, it’s just that their work is widely accessible, so fewer suffice

An interesting study on patents further reinforces the benefits of “range”:

In low-uncertainty domains, teams of specialists were more likely to author useful patents. In high-uncertainty domains – where the fruitful questions themselves were less obvious – teams that included individuals who had worked on a wide variety of technologies were more likely to make a splash. The higher the domain uncertainty, the more important it was to have a high-breadth team member… When the going got uncertain, breadth made the difference.

In Chapter 10 – “Fooled by Expertise” – David looks at how poorly “experts” are able to predict the future and discusses work from psychologist and political scientist Philip Tetlock:

Tetlock conferred nicknames …that became famous throughout the psychology and intelligence-gathering communities: the narrow view hedgehogs, who “know one big thing” and the integrator foxes, who “know many little things.”

Hedgehog experts were deep but narrow. Some had spent their careers studying a single problem. Like [Paul] Ehrlich and [Julian] Simon, they fashioned tidy theories of how the world works through the single lens of their specialty, and then bent every event to fit them. The hedgehogs, according to Tetlock, “toil devotedly” within one tradition of their specialty, “and reach for formulaic solutions to ill-defined problems.” Outcomes did not matter; they were proven right by both successes and failures, and burrowed further into their ideas. It made them outstanding at predicting the past, but dart-throwing chimps at predicting the future. The foxes, meanwhile, “draw from an eclectic array of traditions, and accept ambiguity and contradiction,” Tetlock wrote. Where hedgehogs represented narrowness, foxes ranged outside a single discipline or theory and embodied breadth.

David’s observations on this later in the chapter reminded me of some testers I’ve worked with over the years who are unwilling to see beyond the binary “pass” or “fail” outcome of a test:

Beneath complexity, hedgehogs tend to see simple, deterministic rules of cause and effect framed by their area of expertise, like repeating patterns on a chessboard. Foxes see complexity in what others mistake for simple cause and effect. They understand that most cause and effect relationships are probabilistic, not deterministic. There are unknowns, and luck, and even when history apparently repeats, it does not do so precisely. They recognize that they are operating in the very definition of a wicked learning environment, where it can be very hard to learn, from either wins or losses.
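That fox-like comfort with ambiguity can extend to how we report testing too. As a purely illustrative sketch (all names hypothetical, not any real tool’s API), here’s one shape a test outcome might take if we let it carry more than a boolean:

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(Enum):
    PASSED = "passed"
    FAILED = "failed"
    INCONCLUSIVE = "inconclusive"  # sometimes the honest, fox-like answer

@dataclass
class Outcome:
    verdict: Verdict
    observations: list[str] = field(default_factory=list)  # what we actually saw
    questions: list[str] = field(default_factory=list)     # what we still don't know

# A result that a bare boolean would have flattened into "pass"
result = Outcome(
    verdict=Verdict.INCONCLUSIVE,
    observations=["Response time doubled under load but stayed within the SLA"],
    questions=["Is the slowdown environmental, or an early sign of a regression?"],
)
print(result.verdict.value, result.questions)
```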

Chapter 11 – “Learning to Drop Your Familiar Tools” – starts by telling the story of the Challenger space shuttle disaster and how, even though some people knew about the potential for the problem that caused it, existing practices and culture within NASA got in the way of that knowledge being heard. The “Carter Racing” Harvard Business School case study mimics the Challenger disaster, but participants have to make a race/no-race decision on whether to run a racing car with some known potential problems. Part of this story reminded me very much of the infamous Dice Game so favoured by the context-driven testing community:

“Okay…here comes a quantitative question,” the professor says. “How many times did I say yesterday if you want additional information let me know?” Muffled gasps spread across the room. “Four times,” the professor answers himself. “Four times I said if you want additional information let me know.” Not one student asked for the missing data [they needed to make a good decision].

A fascinating story about the behaviour of firefighters in bushfire situations was very revealing – many of those who perished were found still weighed down with heavy equipment, when they could have ditched their tools and probably run to safety:

Rather than adapting to unfamiliar situations, whether airline accidents or fire tragedies, [psychologist and organizational behaviour expert Karl] Weick saw that experienced groups became rigid under pressure and “regress to what they know best.” They behaved like a collective hedgehog, bending an unfamiliar situation to a familiar comfort zone, as if trying to will it to become something they had actually experienced before. For wildland firefighters, their tools are what they know best. “Firefighting tools define the firefighter’s group membership, they are the firefighter’s reason for being deployed in the first place,” Weick wrote. “Given the central role of tools in defining the essence of a firefighter, it’s not surprising that dropping one’s tools creates an existential crisis.” As Maclean succinctly put it, “When a firefighter is told to drop his firefighting tools, he is told to forget he is a firefighter.”

This reminded me of some testers who hang on to test management tools or a particular automation tool as though it defines them and their work. We should be thinking more broadly and using tools to aid us, not define us:

There are fundamentals – scales and chords – that every member must overlearn, but those are just tools for sensemaking in a dynamic environment. There are no tools that cannot be dropped, reimagined, or repurposed in order to navigate an unfamiliar challenge. Even the most sacred tools. Even the tools so taken for granted they become invisible.

Chapter 12 – “Deliberate Amateurs” – wraps up the main content of the book. I love this idea:

They [amateurs] embrace what Max Delbruck, a Nobel laureate who studied the intersection of physics and biology, called “the principle of limited sloppiness.” Be careful not to be too careful, Delbruck warned, or you will unconsciously limit your exploration.

This note on the global financial crisis rings true in testing too – all too often we see testing compartmentalized and systemic issues going undetected:

While I was researching this book, an official with the US Securities and Exchange Commission learned I was writing about specialization and contacted me to make sure I knew that specialization had played a critical role in the 2008 global financial crisis. “Insurance regulators regulated insurance, bank regulators regulated banks, securities regulators regulated securities, and consumer regulators regulated consumers,” the official told me. “But the provision of credit goes across all those markets. So we specialized products, we specialized regulation, and the question is, ‘Who looks across those markets?’ The specialized approach to regulation missed systemic issues.”

We can also learn something from this observation about team structures, especially in the world of microservices and so on:

In professional networks that acted as fertile soil for successful groups, individuals moved easily between teams, crossing organizational and disciplinary boundaries and finding new collaborators. Networks that spawned unsuccessful teams, conversely, were broken into small, isolated clusters in which the same people collaborated over and over. Efficient and comfortable, perhaps, but apparently not a creative engine.

In his Conclusion, David offers some good advice:

Approach your personal voyage and projects like Michelangelo approached a block of marble, willing to learn and adjust as you go, and even to abandon a previous goal and change directions entirely should the need arise. Research on creators in domains from technological innovation to comic books shows that a diverse group of specialists cannot fully replace the contributions of broad individuals. Even when you move on from an area of work or an entire domain, that experience is not wasted.

Finally, remember that there is nothing inherently wrong with specialization. We all specialize to one degree or another, at some point or other.

I thoroughly enjoyed reading “Range”. David’s easy writing style illustrates his points with good stories and examples, making this a very accessible and comprehensible book. There were many connections to what we see in the world of software testing; hopefully I’ve managed to illuminate some of them in this post.

This is recommended reading for anyone involved in technology, and testers in particular will, I think, gain a lot of insight from this book. And, remember, “Be careful not to be too careful”!

“Essentialism: The Disciplined Pursuit of Less” (Greg McKeown)

After seeing several recommendations for the book Essentialism: The Disciplined Pursuit of Less, I borrowed a copy from the Melbourne Library Service recently – and then read it cover to cover in just a couple of sittings. That alone is a sign of how much I enjoyed it: the book’s messages resonated strongly with me, on both a personal and professional level. The parallels between what Greg McKeown writes about here and the Agile movement in software development are also (perhaps surprisingly) strong, which made the book even more contextually significant for me.

The fundamental idea here is “Less but better.”

The way of the Essentialist is the relentless pursuit of less but better… Essentialism is not about how to get more things done; it’s about how to get the right things done. It doesn’t mean just doing less for the sake of less either. It is about making the wisest possible investment of your time and energy in order to operate at your highest point of contribution by doing only what is essential.

Greg argues that we have forgotten our ability to choose, feeling compelled instead to “do it all” and say yes to everything:

The ability to choose cannot be taken away or even given away – it can only be forgotten… When we forget our ability to choose, we learn to be helpless. Drip by drip we allow our power to be taken away until we end up becoming a function of other people’s choices – or even a function of our own past choices.

It’s all too easy in our busy, hyper-connected lives to think that almost everything is essential and that the opportunities coming our way are all roughly equal. The Essentialist, by contrast, thinks almost everything is non-essential and “distinguishes the vital few from the trivial many.”

Greg makes an important point about trade-offs, something that is again all too easy to forget as we over-commit, trying to do everything asked of us or take on every opportunity that comes our way:

Essentialists see trade-offs as an inherent part of life, not as an inherently negative part of life. Instead of asking “What do I have to give up?”, they ask “What do I want to go big on?” The cumulative impact of this small change in thinking can be profound.

The trap of “busyness” means we don’t spend the time we should reflecting on what’s really important.

Essentialists spend as much time as possible exploring, listening, debating, questioning, and thinking. But their exploration is not an end in itself. The purpose of their exploration is to discern the vital few from the trivial many.

The topic of sleep comes next – a hot one right now. A non-Essentialist thinks “One hour less of sleep equals one more hour of productivity”, while the Essentialist thinks “One more hour of sleep equals several more hours of much higher productivity.” This protection of the asset that is sleep is increasingly being shown to be important, not only for productivity but also for mental health.

Our highest priority is to protect our ability to prioritize.

Prioritizing which opportunities to take on is a challenge for many of us; I’ve certainly taken on too much at times. Greg’s advice when selecting opportunities is simple:

If it isn’t a clear yes, then it’s a clear no

Of course, actually saying “no” can be difficult, and a non-Essentialist will avoid it to escape the social awkwardness and pressure, instead saying “yes” to everything. An Essentialist, meanwhile, “dares to say no firmly, resolutely and gracefully” and says “yes” only to things that really matter. This feels like great advice, and thankfully Greg offers a few tips for how to say “no” gracefully:

  • Separate the decision from the relationship
  • Saying “no” gracefully doesn’t have to mean using the word no
  • Focus on the trade-off
  • Remind yourself that everyone is selling something
  • Make your peace with the fact that saying “no” often requires trading popularity for respect
  • Remember that a clear “no” can be more graceful than a vague or non-committal “yes”

The section on subtracting (removing obstacles to bring forth more) resonated strongly with my experiences in software development:

Essentialists don’t default to Band-Aid solutions. Instead of looking for the most obvious or immediate obstacles, they look for the ones slowing down progress. They ask “What is getting in the way of achieving what is essential?” While the non-Essentialist is busy applying more and more pressure and piling on more and more solutions, the Essentialist simply makes a one-time investment in removing obstacles. This approach goes beyond just solving problems, it’s a method of reducing your efforts to maximize your results.

Similarly, when looking at progress, there are obvious parallels with the way agile teams think and work:

A non-Essentialist starts with a big goal and gets small results and they go for the flashiest wins. An Essentialist starts small and gets big results and they celebrate small acts of progress.

The benefits of routine are also highlighted, for “without routine, the pull of non-essential distractions will overpower us”, and I see the value in the routines of Scrum, for example, as a way of keeping distractions at bay and helping team execution appear more effortless.

This relatively short book is packed with great stories and useful takeaways. As we all lead more connected and busy lives, where the division between work and not-work has become so blurred for so many of us, the ideas in this book offer practical ways to focus on what really matters. I’m certainly motivated now to focus on a smaller number of projects, especially outside of work – a decision I’d already taken before reading this book, but one that reading it validated, while also giving me good ways of dealing with whatever opportunities arise and truly prioritizing the ones that matter.