While writing my last blog post, a review of Cal Newport’s “Deep Work” book, I reminded myself of a topic I’ve been meaning to blog about for a while, viz. the power of the pause.
Coming at this from a software development perspective, I mentioned in the last blog post that:
“There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed.”
I often see this goal of deploying every change directly (and automatically) to production adopted without any compelling reasons for doing so – apart from maybe “it’s what <insert big name tech company here> does”, even though your organization is likely nothing like those companies in most other important ways. What’s the rush? While there are of course some cases where a very quick deployment to production is important, the idea that every change needs to be deployed in the same way is questionable for most organizations I’ve worked with.
Automated deployment pipelines can be great mechanisms for de-risking the process of getting updated software into production, removing opportunities for human error and making such deployments less of a drama when they’re required. But just because you have this mechanism at your disposal doesn’t mean you need to use it for each and every change made to the software.
I’ve seen a lot of power in pausing along the deployment pipeline to give humans the opportunity to interact with the software before customers are exposed to the changes. I don’t believe we can automate our way out of the need for human interaction for software designed for use by humans, but I’m also coming to appreciate that this is increasingly seen as a contrarian position (and one I’m happy to hold). I’d ask you to consider whether there is a genuine need for automated deployment of every change to production in your organization and whether you’re removing the opportunity to find important problems by removing humans from the process.
Taking a completely different perspective, I’ve been practicing mindfulness meditation for a while now and haven’t missed a daily practice since finishing up full-time employment back in August 2020. One of the most valuable things I’ve learned from this practice is the idea of putting space between stimulus and response – being deliberate in taking pause.
Exploring the work of Gerry Hussey has been very helpful in this regard and he says:
The things and situations that we encounter in our outer world are the stimulus, and the way in which we interpret and respond mentally and emotionally to that stimulus is our response.
Consciousness enables us to create a gap between stimulus and response, and when we expand that gap, we are no longer operating as conditioned reflexes. By creating a gap between stimulus and response, we create an opportunity to choose our response. It is in this gap between stimulus and response that our ability to grow and develop exists. The more we expand this gap, the less we are conditioned by reflexes and the more we grow our ability to be defined not by what happens to us but by how we choose to respond.
Awaken Your Power Within: Let Go of Fear. Discover Your Infinite Potential. Become Your True Self (Gerry Hussey)
I’ve found this idea really helpful in both my professional and personal lives. It’s helped with my listening, focusing on understanding rather than on an eagerness to simply respond. The power of the pause in this sense has been especially helpful in my consulting work, with the great side effect of lowering the chances of jumping into solution mode before fully understanding the problem at hand. Accepting that things will happen outside my control in my day-to-day life, but that I have a choice in how to respond to whatever happens, has been transformational.
Inevitably, there are still times where my response to stimuli is quick, conditioned and primitive (with system 1 thinking doing its job) – and sometimes not kind. But I now at least recognize when this has happened and bring myself back to what I’ve learned from regular practice so as to continue improving.
So, whether it’s thinking specifically about software delivery pipelines or my interactions with the world around me, I’m seeing great power in the pause – and maybe you can too.
It was thanks to a recommendation from Michael Bolton that I came across the book Calling Bullsh*t by Carl T. Bergstrom and Jevin D. West. While it’s not a book specifically about software testing, there are some excellent takeaways for testers, as I’ll point out in the following review. This book is a must-read for software testers in my opinion.
The authors’ definition of bullshit (BS) is important to note before digging into the content (appearing on page 40):
Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade or impress an audience by distracting, overwhelming, or intimidating them with a blatant disregard for truth, logical coherence, or what information is actually being conveyed.
I was amazed to read that the authors already run a course at a US university on the same topic as this book:
We have devoted our careers to teaching students how to think logically and quantitatively about data. This book emerged from a course we teach at the University of Washington, also titled “Calling Bullshit”. We hope it will show you that you do not need to be a professional statistician or econometrician or data scientist to think critically about quantitative arguments, nor do you need extensive data sets and weeks of effort to see through bullshit. It is often sufficient to apply basic logical reasoning to a problem and, where needed, augment that with information readily discovered via search engine.
The rise of the internet and particularly social media are noted as ways that BS has proliferated in more recent times, spreading both misinformation (claims that are false but not deliberately designed to deceive) and disinformation (deliberate falsehoods).
…the algorithms driving social media content are bullshitters. They don’t care about the messages they carry. They just want our attention and will tell us whatever works to capture it.
Bullshit spreads more easily in a massively networked, click-driven social media world than in any previous social environment. We have to be alert for bullshit in everything we read.
As testers, we tend to have a critical thinking mindset and are hopefully alert to stuff that just doesn’t seem right, whether that’s the way a feature works in a product or a claim made about some software. It seems to me that testers should naturally be good spotters of BS more generally and this book provides a lot of great tips both for spotting BS and learning how to credibly refute it.
Looking at black boxes (e.g. statistical procedures or data science algorithms), the authors make the crucial point that understanding the inner workings of the black box is not required in order to spot problems:
The central theme of this book is that you usually don’t have to open the analytic black box in order to call bullshit on the claims that come out of it. Any black box used to generate bullshit has to take in data and spit results out.
Most often, bullshit arises either because there are biases in the data that get fed into the black box, or because there are obvious problems with the results that come out. Occasionally the technical details of the black box matter, but in our experience such cases are uncommon. This is fortunate, because you don’t need a lot of technical expertise to spot problems with the data or results. You just need to think clearly and practice spotting the sort of thing that can go wrong.
The first big topic of consideration looks at associations, correlations and causes and spotting claims that confuse one for the other. The authors provide excellent examples in this chapter of the book and a common instance of this confusion in the testing arena is covered by Theresa Neate‘s blog post, Testing and Quality: Correlation does not equal Causation. (I’ve also noted the confusion between correlation and causality very frequently when looking at big ag-funded “studies” used as ammunition against veganism.)
The chapter titled “Numbers and Nonsense” covers the various ways in which numbers are used in misleading and confusing ways. The authors make the valid point that:
…although numbers may seem to be pure facts that exist independently from any human judgment, they are heavily laden with context and shaped by decisions – from how they are calculated to the units in which they are expressed.
It is all too common in the testing industry for people to hang numbers on things that make little or no sense to look at quantitatively; counting “test cases” comes to mind. The book covers various ways in which numbers turn into nonsense, including summary statistics, percentages and percentage points. Goodhart’s Law is mentioned (in its rephrased form by Marilyn Strathern):
When a measure becomes a target, it ceases to be a good measure
I’m sure many of us are familiar with this law in action when we’re forced into “metrics programmes” around testing, for which gaming becomes the focus rather than the improvement our organizations were looking for. The authors introduce the idea of mathiness here: “mathiness refers to formulas and expressions that may look and feel like math – even as they disregard the logical coherence and formal rigour of actual mathematics”. Testing is not immune from mathiness either, e.g. “Tested = Checked + Explored” is commonly quoted from Elisabeth Hendrickson‘s (excellent) Explore It! book. Another concept that will be very familiar to testers (and others in the IT industry) is zombie statistics, viz.
…numbers that are cited badly out of context, are sorely outdated, or were entirely made up in the first place – but they are quoted so often that they simply won’t die.
There are many examples of such zombie statistics in our industry. Boehm’s so-called cost of change curve (claiming that the cost of changes later in the development cycle is orders of magnitude higher than earlier in the cycle) is a prime example, and it is one of the examples covered beautifully in Laurent Bossavit’s excellent book, The Leprechauns of Software Engineering.
The next statistical concept introduced in the book is selection bias and I was less familiar with this concept (at least under this name):
Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study.
This sort of non-random sampling leads to statistical analyses failing or becoming misleading and there are again some well-considered examples to explain and illustrate this bias. Reading this chapter brought to mind my recent critique of the Capgemini World Quality Report in which I noted that both the size of organizations and the roles of participants in the survey were problematic. (I again note from my vegan research that many big ag-funded studies suffer from this bias too.)
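A toy simulation of my own (not one of the book’s examples) shows how easily selection bias creeps in. Imagine estimating average weekly working hours from a survey that only reaches people who answer their phone during office hours – the people working the longest hours are the least likely to pick up.

```python
# Selection bias in miniature: the individuals who make it into the
# sample differ systematically from the eligible population, so the
# sample mean is biased even though each measurement is accurate.
import random

random.seed(1)
# Hypothetical population: weekly working hours, mean ~40, sd 8.
population = [random.gauss(40, 8) for _ in range(10_000)]

# Non-random sampling: people working 45+ hours rarely answer,
# so they are heavily under-represented in the sample.
sample = [h for h in population if random.random() < (1.0 if h < 45 else 0.2)]

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(round(pop_mean, 1), round(sample_mean, 1))  # sample mean is biased low
```

No amount of extra respondents fixes this: collecting more data from the same biased sampling process just gives you a more precise wrong answer.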
A hefty chapter is devoted to data visualization, with the authors noting the relatively recent proliferation of charts and data graphics in the media due to the technology becoming available to more easily produce them. The treatment of the various ways that charts can be misleading is again excellent with sound examples (including axis scaling, axis starting values, and the “binning” of axis values). I loved the idea of glass slippers here, viz.
Glass slippers take one type of data and shoehorn it into a visual form designed to display another. In doing so, they trade on the authority of good visualizations to appear authoritative themselves. They are to data visualizations what mathiness is to mathematical equations.
The misuse of the periodic table visualization is cited as an example and, of course, the testing industry has its own glass slippers in this area, for example Santhosh Tuppad’s Heuristic Table of Testing! This chapter also discusses visualizations that look like Venn diagrams but aren’t, and highlights the dangers of 3-D bar graphs, line graphs and pie charts. A new concept for me in this chapter was the principle of proportional ink:
Edward Tufte…in his classic book The Visual Display of Quantitative Information…states that “the representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.” The principle of proportional ink applies this rule to how shading is used on graphs.
The illustration of this principle by well-chosen examples is again very effective here.
It’s great to see some sensible commentary on the subject of big data in the next chapter. The authors say “We want to provide an antidote to [the] hype” and they certainly achieve this aim. They discuss AI & ML and the critical topic of how training data influences outcomes. They also note how machine learning algorithms perpetuate human biases.
The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself.
The topics of Big Data, AI and ML are certainly hot in the testing industry at the moment, with tool vendors and big consultancies all extolling the virtues of these technologies to change the world of testing. These claims have been made for quite some time now and, as I noted in my critique of the Capgemini World Quality Report recently, the reality has yet to catch up with the hype. I commend the authors here for their reality check in this over-hyped area.
In the chapter titled “The Susceptibility of Science”, the authors discuss the scientific method and how statistical significance (p-values) is often manipulated to aid with getting research papers published in journals. Their explanation of the base rate fallacy is excellent and a worthy inclusion, as it is such a common mistake. While the publication of dodgy papers and misleading statistics are acknowledged, the authors’ belief is that “science just plain works” – and I agree with them. (From my experience in vegan research, I’ve read so many dubious studies funded by big ag but these don’t undermine my faith in science, rather my faith in human nature sometimes!) In closing:
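The base rate fallacy is worth a worked example. Using illustrative figures of my own (not the book’s): a test that sounds “99% accurate” can still be wrong for the overwhelming majority of people who test positive, simply because the condition it detects is rare.

```python
# The base rate fallacy, worked through with Bayes' theorem.
# Figures are illustrative, chosen to make the effect obvious.
prevalence = 0.001           # 1 in 1,000 people have the condition
sensitivity = 0.99           # P(positive | condition)
false_positive_rate = 0.05   # P(positive | no condition)

# Total probability of testing positive, across both groups:
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Bayes' theorem: P(condition | positive)
p_condition_given_positive = sensitivity * prevalence / p_positive

print(round(p_condition_given_positive, 3))  # 0.019: under 2%, despite "99% accuracy"
```

The intuition: in a population of 100,000, roughly 99 true positives are swamped by about 5,000 false positives from the 99,900 healthy people, so a positive result is still far more likely to be a false alarm.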
Empirically, science is successful. Individual papers may be wrong and individual studies misreported in the popular press, but the institution as a whole is strong. We should keep this in perspective when we compare science to much of the other human knowledge – and human bullshit – that is out there.
In the penultimate chapter, “Spotting Bullshit”, the discussion of the various means by which BS arises (covered throughout the book) is split out into six ways of spotting it, viz.
Question the source of information
Beware of unfair comparisons
If it seems too good or bad to be true…
Think in orders of magnitude
Avoid confirmation bias
Consider multiple hypotheses
These ways of spotting BS act as a handy checklist, I think, and will certainly be helpful to me in refining my skills in this area. While I was still reading this book, I listened to a testing panel session online and one of the panelists was from the testing tool vendor Applitools. He briefly mentioned some claims about their visual AI-powered test automation tool. These claims piqued my interest and I managed to find the same statistics on their website:
I’ll leave it as an exercise for the reader to decide if any of the above falls under the various ways BS manifests itself according to this book!
The final chapter, “Refuting Bullshit”, is really a call to action:
…a solution to the ongoing bullshit epidemic is going to require more than just an ability to see it for what it is. We need to shine a light on bullshit where it occurs, and demand better from those who promulgate it.
The authors provide some methods to refute BS, as they themselves use throughout the book in the many well-chosen examples used to illustrate their points:
Use reductio ad absurdum
Deploy a null model
They also “conclude with a few thoughts about how to [call BS] in an ethical and constructive manner”.
In summary, this book is highly recommended reading for all testers to help them become more skilled spotters of BS; be that from vendors, testing consultants or others presenting information about testing. This skill will also come in very handy in spotting BS in claims made about the products you work on in your own organization!
The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it.
Alberto Brandolini (Italian software engineer, 2014)
After reading this book, you should have the skills to spot BS and I actively encourage you to then find inventive ways to refute it publicly so that others might not get fooled by the same BS.
The second of my recent airport bookshop purchases has made for an excellent read over the last week. In Mindset: The New Psychology of Success, Carol Dweck explores the influence that the way we think about our talents and abilities has on our success.
The key concept here is that of the “growth mindset”, as compared to a “fixed mindset”. A fixed mindset is “believing that your qualities are carved in stone… [and] creates an urgency to prove yourself over and over” while a growth mindset is “based on the belief that your basic qualities are things you can cultivate through your efforts, your strategies, and help from others…. [and] everyone can change and grow through application and experience”. She notes that all of us have elements of both mindsets.
She notes that a “fixed mindset makes people into non-learners” and, while failure still can be painful with a growth mindset, “it doesn’t define you; it’s a problem to be faced, dealt with and learned from”. I liked this question as a way to think about growth vs. fixed mindsets: “Is success about learning – or proving you’re smart?”
The role of effort in the growth mindset is highlighted throughout the book:
The fixed mindset limits achievement. It fills people’s minds with interfering thoughts, it makes effort disagreeable, and it leads to inferior learning strategies. What’s more, it makes other people into judges instead of allies. Whether we’re talking about Darwin or college students, important achievements require clear focus, all-out effort, and a bottomless trunk full of strategies. Plus allies in learning. This is what the growth mindset gives people, and that’s why it helps their abilities grow and bear fruit.
It’s an interesting section of the book when Carol moves away from the individual to the organizational level:
Clearly the leader of an organization can hold a fixed or growth mindset, but can an organization as a whole have a mindset? Can it have a pervasive belief that talent is just fixed or, instead, a pervasive belief that talent can be and should be developed in all employees? And, if so, what impact will this have on the organization and its employees?
She goes on to cite some research in this area:
People who work in growth-mindset organizations have far more trust in their company and a much greater sense of empowerment, ownership, and commitment… Those who worked in fixed-mindset companies, however, expressed greater interest in leaving their company for another… employees in the growth-mindset companies say that their organization supports (reasonable) risk-taking, innovation, and creativity… Employees in the fixed-mindset companies not only say that their companies are less likely to support them in risk-taking and innovation, they are also far more likely to agree that their organizations are rife with cutthroat or unethical behavior.
A reference to an excellent diagram by Nigel Holmes is a handy summary of the messages in this book:
This is a book with great messages and is of broad interest. Carol cites lots of research to back her claims and the book is made very readable thanks to the excellent examples from the business world, school, and sport.
Thinking about the book’s key message around the growth mindset in the context of software testing, it strikes me that much of the testing industry is actually stuck in a fixed mindset and the benefits of continuous learning and growth are not as valued as they could be. The idea of certifications in testing doesn’t help with this (although you could argue there is learning involved in attaining them), especially when you can take an exam to become an “Expert” level tester.
It’s personally very rewarding to be active in a part of the testing community that does genuinely value learning and growth (that’s the context-driven testing community) and where having “a bottomless trunk full of strategies [and] allies in learning” is the norm.
This lovely little piece of mis-translation came through my Twitter feed over the weekend (originally from Sheenagh Pugh):
I am reliably informed by team mates in our Zhuhai office that a better translation would read “Keep off the grass”, but the wording as it is makes for a much nicer message I think. (For a much more in-depth look at this translation problem, check out the post on this very topic on the Language Log.)
This got me thinking about the way we express ourselves as testers. We’re often the bringers of bad news and choosing how to express that information to our stakeholders can make a huge difference to how both the individual tester and the profession of testing is perceived.
I’ve been reading recently about interactional expertise (Collins and Evans) and emotional intelligence, and I think these are subjects that testers need to be familiar with to help them interact with their varied stakeholders in more effective ways. While writing good defect reports is still an essential (and overlooked) skill, the ability to communicate with stakeholders more generally is becoming more and more important, especially in agile teams. I’m sure that developing these skills will elevate testers within their teams and help to make them the valued team members they really should be.
(And, while I’m here, I strongly recommend that you grab yourself a copy of the latest Status Quo album, “Aquostic”. This is the band’s first all-acoustic effort and has just charted in the UK at number 5 – not bad for a band in its sixth decade!)
The Stanford “marshmallow experiment” was a series of studies on delayed gratification led by psychologist Walter Mischel in the late ’60s and early ’70s. A recent revisiting of this experiment can be seen in this short video and, whether you believe the claims of the follow-up studies or not, it’s interesting to watch the different behaviours of the kids (even the difference between a pair of twin boys in this regard).
I was working on a presentation about thinking of testing as being an information service provider when this video came to my attention (thanks to my wife). It got me thinking about the people who make release decisions for the software we work on.
We can provide one marshmallow’s worth of valuable product information right now and you can release based on that information. Or, we can spend some more time doing really good testing and then give you two marshmallows’ worth of information, to make an even more informed release decision!
The problem with this analogy is that neither marshmallow necessarily tastes very good – and the second one is likely to taste worse than the first, right?!
Maybe just give your business folks a marshmallow anyway; it might just make their day.