2021 in review

As another year draws to a close, I’ll take the opportunity to review my 2021.

I published 14 blog posts during the year, meeting my personal target cadence of at least one post a month. I wrapped up my ten-part series answering common search engine questions about testing and covered a range of other topics throughout the year. My blog attracted about 25% more views than in 2020, somewhat surprisingly, and I continue to be really grateful for the amplification of my blog posts via their regular inclusion in lists such as 5Blogs, Testing Curator’s Testing Bits and Software Testing Weekly.

December 2021 has been the biggest month for my blog by far this year with a similar number of views to my all-time high back in November 2020 – interestingly, I published a critique of an industry report in December and published similar critiques in November 2020, so clearly these types of posts are popular (even if they can be somewhat demoralizing to write)!

I closed out the year with about 1,200 followers on Twitter, again up around 10% over the year.

Conferences and meetups

2021 was my quietest year for conferences and meetups in perhaps fifteen years, mainly due to the ongoing impacts of the COVID-19 pandemic around the world.

I was pleased to announce mid-2021 that I would be speaking at the in-person Testing Talks 2021 (The Reunion) conference in Melbourne in October. Sadly, the continuing harsh response to the pandemic in this part of the world made an in-person event too difficult to hold, but hopefully I can keep that commitment for its rescheduled date in 2022.

I didn’t participate in any virtual or remote events during the entire year.

Consulting

After launching my testing consultancy, Dr Lee Consulting, towards the end of 2020, I noted in last year’s review post that “I’m confident that my approach, skills and experience will find a home with the right organisations in the months and years ahead.” This confidence turned out to be well founded and I’ve enjoyed working with my first clients during 2021.

Consulting is a very different gig to full-time permanent employment but it’s been great so far, offering me the opportunity to work in different domains with different types of organizations while also allowing me the freedom to enjoy a more relaxed lifestyle. I’m grateful to those who have put their faith (and dollars!) in me during 2021 as I begin my consulting journey and I’m looking forward to helping more organizations to improve their testing and quality practices during 2022.

Testing books

After publishing my first testing book in October 2020, in the shape of An Exploration of Testers, it’s been pleasing to see a steady stream of sales through 2021. I made my first donation of proceeds to the Association for Software Testing (AST) from sales of the book and another donation will follow early in 2022. I also formalized an arrangement with the AST so that all future proceeds will be donated to them and all new & existing members will receive a free copy of the book. (I’m open to additional contributions to this book, so please contact me if you’re interested in telling your story via the answers to the questions posed in the book!)

I started work on another book project in 2021, also through the AST. Navigating the World as a Context-Driven Tester provides responses to common questions and statements about testing from a context-driven perspective, with its content being crowdsourced from the membership of the AST and the broader testing community. There are responses to six questions in the book so far and I’m adding another response every month (or so). The book is available for free from the AST’s GitHub.

Podcasting

It was fun to kick off a new podcasting venture with two good mates from the local testing industry, Paul Seaman and Toby Thompson. We’ve produced three episodes of The 3 Amigos of Testing podcast so far and aim to get back on the podcasting horse early in 2022 to continue our discussions around automation started back in August. The process of planning content for the podcast, discussing and dry-running it, and finally recording is an interesting one and kudos to Paul for driving the project and doing the heavy lifting around editing and publishing each episode.

Volunteering for the UK Vegan Society

I’ve continued to volunteer with the UK’s Vegan Society and, while I’ve worked on proofreading tasks again through the year, I’ve also started contributing to their web research efforts over the last six months or so.

It was exciting to be part of one of the Society’s most significant outputs of 2021, viz. the Planting Value in the Food System report. This 40,000-word report was a mammoth research project and my work in proofing it was also a big job! The resulting report and the website are high quality and show the credibility of The Vegan Society in producing well-researched reference materials in the vegan space.

Joining the web research volunteer group immediately gave me the opportunity to learn, being tasked with leading the research efforts around green websites and accessibility testing.

I found the green website research particularly engaging, as it was not an area I’d even considered before and the carbon footprint of websites – and how it can easily be reduced – doesn’t seem to (yet) be on the radar of most companies. The lengthy recommendations resulting from my research in this area will inform changes to the Vegan Society website over time and this work has inspired me to look into offering advice in this area to companies who may have overlooked this potentially significant contributor to their carbon footprint.

I also spent considerable time investigating website accessibility and tooling to help with development & testing in this area. While accessibility testing is something I was tangentially aware of in my testing career, the opportunity to deep dive into it was great and, again, my recommendations will be implemented over time to improve the accessibility of the society’s own website.

I continue to enjoy working with The Vegan Society, increasing my contribution to and engagement with the vegan community worldwide. The passion and commitment of the many volunteers I interact with is invigorating. I see it as my form of vegan activism and a way to utilize my existing skills in research and the IT industry as well as gaining valuable new skills and knowledge along the way.

Status Quo projects

I was honoured to be asked to write a lengthy article for the Status Quo official fan club magazine, FTMO, following the sad passing of the band’s original bass player, Alan Lancaster, in September. Alan spent much of his life here in Australia, having migrated to Sydney in 1978, and he was very active in the music industry in this country following his departure from Quo in the mid-1980s. It was a labour of love putting together a 5000-word article and selecting interesting photos to accompany it from my large collection of Quo scrapbooks.

I spent time during 2021 on a new Quo project too, also based around my scrapbook collection. This project should go live in 2022 and has been an interesting learning exercise, not just in terms of website development but also photography. Returning to coding after a 20+ year hiatus has been a challenge but I’m reasonably happy with the simple website I’ve put together using HTML, CSS, JavaScript, PHP and a MySQL database. Gathering the equipment and skills to take great photos of scrapbook clippings has also been fun and it’s nice to get back into photography, a keen hobby of mine especially in my university days back in the UK.

In closing

As always, I’m grateful for the attention of my readers here and also followers on other platforms. I wish you all a Happy New Year and I hope you enjoy my posts and other contributions to the testing community to come through 2022!

A tester’s critique of “The 2021 State of Software Quality: The View from Enterprise Leaders & Followers”

I really should know better, but I decided to watch a webinar titled The 2021 State of Software Quality: The View from Enterprise Leaders & Followers from MicroFocus and Enterprise Management Associates, Inc. The promo spiel for the webinar read as follows:

The rapid rise of the digital economy became twice as important after layering on a worldwide pandemic. With every company having to become a software company, enterprise application development speed, volume, cost, quality, and risk are key determinants that define who survives and who does not. The pressure on application development teams to build more software faster and cheaper often runs counter to the objectives of software quality and managing risk.

Join Steve Hendrick, Research Director at Enterprise Management Associates, to hear key findings from a recent worldwide survey about software quality. This webinar will look at the characteristics and differences between software quality leaders and followers. Key to this discussion of software quality is the impact that people, process, and products are having on enterprise software quality. Completing this view into software quality will be a discussion of best and worst practices and their differences across three levels of software quality leadership.

While the opening gambit of this promo literally makes no sense – “The rapid rise of the digital economy became twice as important after layering on a worldwide pandemic” – the webinar sounded like it at least held some promise in terms of identifying differences between those “leading” in software quality and those “following”.

The survey data presented in this webinar was formed from 316 responses by Directors, VPs and C-level executives of larger enterprises (2000 employees or more). The presenter noted specifically that the mean enterprise size in the survey was over 11,000 employees and that this was a good thing, since larger enterprises have a “more complex take on DevOps”. This focus on garnering responses from people far away from the actual work of developing software in very large enterprises immediately makes me suspicious of the value of the responses for practitioners.

Unusually for surveys and reports of this type, though, the webinar started in earnest with a slide titled “What is Software Quality”:

While the three broad software quality attributes seem to me to represent some dimensions of quality, they don’t answer the question of what the survey means when it refers to “software quality”. If this was the definition given in the survey to guide participants, then it feels like their responses are likely skewed to thinking solely about these three dimensions and not the many more that are familiar to those of us with a broader perspective aligned with, for example, Jerry Weinberg’s definition of quality as “Value to some person”.

The next slide was particularly important as it introduced the segmentation of respondents into Outliers, Laggards, Mainstreamers and Leaders based on their self-assessment of the quality of their products:

This “leadership segmentation” is the foundation for the analysis throughout the rest of the webinar, yet it is completely based on self-assessment! Note that over half (55%) self-assess their quality as 8/10, 9/10 or even 10/10, while only 11% rate themselves as 5/10 or below. This looks like a classic example of cognitive bias and illusory superiority. This poor basis for the segmentation used so heavily in the analysis which follows is troubling.

Moving on, imagine being faced with answering this question: “How does your enterprise balance the contribution to software quality that is made by people, policy, processes, and products (development and DevOps tools)?” You might need to read that again. The survey responses came back as follows:

Call me cynical, but this almost impossible-to-answer question looks like it resulted in most people just giving equal weight to all of the five choices, so ending up with about 20% in each category.

It was soon time to look to “agile methodologies” for clues as to how “adopting” agile relates to quality leadership segmentation:

It was noted here that the “leaders” (again, remember this is respondents self-assessing themselves as quality leaders) were most likely to represent enterprises in which “Nearly all teams are using agile methods”. A reminder that correlation does not imply causation feels in order at this point.

The revelations kept coming, let’s look at the “phases” in which enterprises are “measuring quality”:

The presenter made a big deal here about the “leaders” showing much higher scores for measuring quality in the requirements and testing management “phases” than the “mainstreamers” and “laggards”. Of course, this provided the perfect opportunity to propagate the “cost of change” curve nonsense, with the presenter claiming it is “many times more expensive to resolve defects found in production than found during development”. He also sagely suggested that the leaders’ focus on requirements management and testing was part of their “secret sauce”.

When the surveyed enterprises were asked about their “software quality journey over the last two years”, the results looked like this:

The conclusion here was that “leaders” are establishing centres of excellence for software quality. There was a question about this during the short Q&A at the end of the deck presentation, asking what such a function actually does, to which the presenter said a CoE is “A good way to jumpstart an enterprise thinking about quality, it elevates the importance of quality in the enterprise” and “raises visibility of the fact that software quality is important”. An interesting but overlooked part of the data on this slide in my opinion is that about 20% of enterprises (even the “leaders”) said that their “focus on agile and DevOps has not had any impact on software quality”. I assume this data didn’t fit the narrative for this highly DevOps-focused webinar.

Attention then turned to tooling, firstly looking at development tools:

I find it interesting that all of these different types of development tooling are considered "DevOps tools". It’s surprising that only around half of the "laggards" even claim to use source code management tools (it’s not clear why "mainstreamers" were left off this slide) and that only just over half of the "leaders" are using continuous integration tools. These statistics seem contrary to the idea that even the leaders are really mature in their use of tooling around DevOps. (It’s also worth noting that there is considerable wiggle room in the wording of this question, "regularly used or will be used".) Deployment, rather than development, tooling was also analyzed but I didn’t spot anything interesting there, apart from the very fine-grained breakdown of tooling types (resulting in an incredible 19 different categories).

The presenter then examined why software quality was improving:

Notice that the slide is titled “Why has your software quality been improving since 2019?” while the actual survey question was “Why has your approach to software quality improved since the beginning of 2019?” Improvements in approach may or may not result in quality improvements. Some of the choices for response to this question don’t really answer the question, but clearly the idea was to suggest that adding more DevOps process and tooling leads to quality improvements while the data suggests otherwise (more around business drivers).

Moving from the “why” to the “how” came next (again with the same subtle difference between the slide title and the survey question):

There are again business/customer drivers behind most of these responses, but increased automation and use of tooling also show up highly. A standout is the “leaders” highlighting that “our multifunctional teams have learned how to work more effectively together” was a way to improve quality.

Some realizations/revelations about quality followed:

There were at least signs here of enterprises accepting that improving quality takes significant effort, not just from additional testing and tooling, but also from management and the business. The presenter focused on the idea of “shifting left” and there was a question on this during the Q&A too, asking “how important is shift left?” to which the presenter said it was “very important to leaders, it’s a best practice and it makes intuitive sense”. But he also noted that there was an additional finding in the deeper data around this that enterprises found it to be a “challenge in piling more responsibility on developers, made it harder for developers to get their job done, it alienates them and gets them bogged down with activities that are not coding” and that enterprises were sensitive to these concerns. From that response, it doesn’t sound to me like the “leaders” have really grasped the concept of “shift left” as I understand it and are still not viewing some types of testing as being part of developers’ responsibilities. The final entry on this slide also stood out to me (but was not highlighted by the presenter), with 17% of the “leaders” saying that “software quality is a problem if it is too high”, interesting!

Presentations like this usually end up talking about best practices and this webinar was no different:

The presenter focused on the high rating given by the "leaders" to "adoption of quality standards (such as ISO)" but overlooked what I took as one of the few positives from any of the data in the webinar, namely that adopting "a more comprehensive approach to software testing" was a practice generally seen as something worth continuing to do.

The deck wrapped up with a summary of the “Best Practices of Software Quality Leaders”:

These don’t strike me as actually being best practices, but rather as statements and dubious conclusions drawn from the survey data. Point 4 on this slide – "Embracing agile and improving your DevOps practice will improve your software quality" – was highlighted (of course) but is seriously problematic. Remember that the self-assessed "leaders" claimed their software quality was increasing due to expanding their "DevOps processes and toolchain", but correlation does not imply causation as this point on the final slide suggests. The presenter reinforced this apparent causality during the Q&A too, when asked "what is one thing we can do to improve quality?". He said his preference is to understand the impact that software quality has on the business, but his pragmatic answer is to "take stock of your DevOps practice and look for ways to improve it, since maturing your DevOps practice improves quality."

There were so many issues for me with the methodology behind the data presented in this webinar. The self-assessment of software quality produced by these enterprises makes the foundation for all of the conclusions drawn from the survey data very shaky in my opinion. The same enterprises who probably over-rated themselves on quality are also likely to have over-rated themselves in other areas (which appears to be the case throughout). There is also evidence of mistakenly taking correlation to imply causation, e.g. suggesting that adding more DevOps process and tooling improves quality. (Even claiming correlation is dubious given the self-assessment problem underneath all the data.)

There’s really not much to take away from the results of this survey for me in helping to understand what differences in approach, process, practice, tooling, etc. might lead to higher quality outcomes. I’m not at all surprised or disappointed in feeling this way, as my expectations of such fluffy marketing-led surveys are very low (based on my experience of critiquing a number of them over the last few years). What does disappoint me is not the "state of software quality" supposedly evidenced by such surveys, but rather the state of the quality of dialogue and critical thinking around testing and quality in our industry.

The webinar can be viewed from https://content.microfocus.com/optimize-devops-tb/2021-software-quality (note that registration is required).

The power of the pause

While writing my last blog post, a review of Cal Newport’s “Deep Work” book, I reminded myself of a topic I’ve been meaning to blog about for a while, viz. the power of the pause.

Coming at this from a software development perspective, I mentioned in the last blog post that:

“There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed.”

I often see the goal of deploying every change directly (and automatically) to production adopted without any compelling reasons for doing so – apart from maybe "it’s what <insert big name tech company here> does", even though you’re likely nothing like those companies in most other important ways. What’s the rush? While there are some cases where a very quick deployment to production is of course important, the idea that every change needs to be deployed in the same way is questionable for most organizations I’ve worked with.

Automated deployment pipelines can be great mechanisms for de-risking the process of getting updated software into production, removing opportunities for human error and making such deployments less of a drama when they’re required. But, just because you have this mechanism at your disposal, it doesn’t mean you need to use it for each and every change made to the software.

I’ve seen a lot of power in pausing along the deployment pipeline to give humans the opportunity to interact with the software before customers are exposed to the changes. I don’t believe we can automate our way out of the need for human interaction for software designed for use by humans, but I’m also coming to appreciate that this is increasingly seen as a contrarian position (and one I’m happy to hold). I’d ask you to consider whether there is a genuine need for automated deployment of every change to production in your organization and whether you’re removing the opportunity to find important problems by removing humans from the process.
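
To make the idea of a deliberate pause more concrete, here is a minimal sketch in Python (the deploy() helper and stage names are hypothetical placeholders I’ve made up for illustration) of a promotion flow where the push to staging is fully automated but the push to production waits for an explicit human sign-off after people have had a chance to interact with the change:

    # A minimal sketch of a "pause for humans" gate in a deployment flow.
    # deploy() and the stage names are hypothetical placeholders.

    def deploy(build_id: str, environment: str) -> None:
        # Stand-in for whatever mechanism actually ships the build.
        print(f"Deploying build {build_id} to {environment}...")

    def promote(build_id: str) -> None:
        deploy(build_id, "staging")  # automated, low-drama deployment

        # The deliberate pause: give humans time to explore the change
        # before customers are exposed to it.
        answer = input(
            f"Build {build_id} is on staging. "
            "Has exploratory testing finished without finding blocking problems? (yes/no) "
        )
        if answer.strip().lower() != "yes":
            print("Holding the release - the pipeline stays paused.")
            return

        deploy(build_id, "production")

    if __name__ == "__main__":
        promote("2021.12.1")

In practice, most CI/CD tools offer an equivalent built-in manual-approval or protected-environment step, so this kind of pause is usually a pipeline configuration choice rather than custom code.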

Taking a completely different perspective, I’ve been practicing mindfulness meditation for a while now and haven’t missed a daily practice since finishing up full-time employment back in August 2020. One of the most valuable things I’ve learned from this practice is the idea of putting space between stimulus and response – being deliberate in taking pause.

Exploring the work of Gerry Hussey has been very helpful in this regard and he says:

The things and situations that we encounter in our outer world are the stimulus, and the way in which we interpret and respond mentally and emotionally to that stimulus is our response.

Consciousness enables us to create a gap between stimulus and response, and when we expand that gap, we are no longer operating as conditioned reflexes. By creating a gap between stimulus and response, we create an opportunity to choose our response. It is in this gap between stimulus and response that our ability to grow and develop exists. The more we expand this gap, the less we are conditioned by reflexes and the more we grow our ability to be defined not by what happens to us but by how we choose to respond.

Awaken Your Power Within: Let Go of Fear. Discover Your Infinite Potential. Become Your True Self (Gerry Hussey)

I’ve found this idea really helpful in both my professional and personal lives. It’s helped with listening, focusing on understanding rather than being eager to simply respond. The power of the pause in this sense has been especially helpful in my consulting work, as it has the great side effect of lowering the chances of jumping into solution mode before fully understanding the problem at hand. Accepting that things will happen outside my control in my day-to-day life, but that I have the choice in how to respond to whatever happens, has been transformational.

Inevitably, there are still times where my response to stimuli is quick, conditioned and primitive (with system 1 thinking doing its job) – and sometimes not kind. But I now at least recognize when this has happened and bring myself back to what I’ve learned from regular practice so as to continue improving.

So, whether it’s thinking specifically about software delivery pipelines or my interactions with the world around me, I’m seeing great power in the pause – and maybe you can too.

Deep testing and “Deep Work” (Cal Newport)

I’ve just finished reading Deep Work by Cal Newport and I found it engaging, interesting and applicable. While reading it, there were many reminders for me of the work of Michael Bolton and James Bach around “deep testing”.

Cal defines “Deep Work” as:

Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate.

while “Shallow Work” is:

Non-cognitively demanding, logistical-style tasks, often performed while distracted. These efforts tend to not create much new value in the world and are easy to replicate.

He argues that:

In an age of network tools… knowledge workers increasingly replace deep work with the shallow alternative – constantly sending and receiving email messages like human network routers, with frequent breaks for quick hits of distraction. Larger efforts that would be well served by deep thinking…get fragmented into distracted dashes that produce muted quality.

I’m sure that anyone who has worked in an office environment in the IT industry over the last decade will agree that their time has been impacted by distractions and a larger proportion of the working day has become occupied by shallow work. As if open plan offices weren’t bad enough on their own, the constant stream of pulls on your attention from email, Slack and social media notifications has resulted in a very distracted state becoming the norm.

One of the key messages Cal delivers in the book is that deep work is rare, valuable and meaningful:

The Deep Work Hypothesis: The ability to perform deep work is becoming increasingly rare at exactly the same time it is becoming increasingly valuable in our economy. As a consequence, the few who cultivate this skill, and then make it the core of their working life, will thrive.

He makes the observation that, even in knowledge work, there is still a tendency to focus on “busyness”:

Busyness as a Proxy for Productivity: In the absence of clear indicators of what it means to be productive and valuable in their jobs, many knowledge workers turn back toward an industrial indicator of productivity; doing lots of stuff in a visible manner.

I’ve seen this as a real problem for testers in many organizations. When there is poor understanding of what good testing looks like (the norm, unfortunately), it’s all too common for testers to be tracked and measured by test case counts, bug counts, etc. These proxies for productivity really are measures of busyness and not reflections of true value being added by the tester. There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed. (I may blog separately on the “power of the pause” soon.)

On the subject of how much more meaningful deep work is, Cal refers to Dreyfus & Kelly’s All Things Shining book and its focus on craftsmanship:

A … potential for craftsmanship can be found in most skilled jobs in the information economy. Whether you’re a writer, marketer, consultant, or lawyer: Your work is craft, and if you hone your ability and apply it with respect and care, then like the skilled wheelwright [as example from the Dreyfus & Kelly book] you can generate meaning in the daily efforts of your professional life.

Cultivating craftsmanship is necessarily a deep task and therefore requires a commitment to deep work.

I have referred to software testing as a craft since I first heard it described as such by Michael Bolton during the RST course I attended back in 2007. Talking about testing in this way is important to me and, as Cal mentions, treating it as a craft that you can become skilled in and take pride in all helps to make life as a tester much more meaningful.

The second part of Cal’s book focuses on four rules to help achieve deep work in practice, viz. work deeply, embrace boredom, quit social media, and drain the shallows. I won’t go into detail on the rules here (in the interests of brevity and to encourage you to read the book for yourself to learn these practical tips), but this quote from the “drain the shallows” rule resonated strongly and feels like something we should all be trying to bring to the attention of the organizations we work with:

The shallow work that increasingly dominates the time and attention of knowledge workers is less vital than it often seems in the moment. For most businesses, if you eliminated significant amounts of this shallowness, their bottom line would likely remain unaffected. As Jason Fried [co-founder of software company 37signals] discovered, if you not only eliminate shallow work, but also replace this recovered time with more of the deep alternative, not only will the business continue to function; it can become more successful.

Coming back to Cal’s definition of “deep work”:

Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate.

When I read this definition, it immediately brought to mind session-based test management (SBTM) in which timeboxed periods of uninterrupted testing are the unit of work. I’ve seen the huge difference that adoption of SBTM can make in terms of encouraging deeper testing and improving testing skills. Thinking about “deep testing”, Michael Bolton and James Bach have described it as follows:

Testing is deep to the degree that it has a probability of finding rare, subtle, or hidden problems that matter.

Deep testing requires substantial skill, effort, preparation, time, or tooling, and reliably and comprehensively fulfills its mission.

By contrast, shallow testing does not require much skill, effort, preparation, time, or tooling, and cannot reliably and comprehensively fulfill its mission.

Blog post https://www.developsense.com/blog/2017/03/deeper-testing-1-verify-and-challenge/ (Michael Bolton)

The parallels between Cal’s idea of “deep work” and Michael & James’s “deep testing” are clear. Being mindful of the difference between such deep testing and the more common shallow testing I see in many teams is important, as is the ability to clearly communicate this difference to stakeholders (especially when testing is squeezed under time pressures or being seen as optional in the frantic pace of continuous delivery environments).

I think “Deep Work” is a book worth reading for testers, not just for the parallels with deep testing I’ve tried to outline above but also for the useful tips around reducing distractions and freeing up your capacity for deeper work.

A year has gone…

Almost unbelievably, it’s now been a year since I left my long stint at Quest Software. It’s been a very different year for me than any of the previous 25-or-so spent in full-time employment in the IT industry. The continuing impact of COVID-19 on day-to-day life in my part of the world has also made for an unusual 12 months in many ways.

While I haven’t missed working at Quest as much as I expected, I’ve missed the people I had the chance to work with for so long in Melbourne and I’ve also missed my opportunities to spend time with the teams in China that I’d built up such a strong relationship with over the last few years (and who, sadly, have all since departed Quest as its operations there were closed down this year).

I’ve deliberately stayed fairly engaged with the testing community during this time, including giving a talk at a meetup, publishing my first testing book, launching my own testing consultancy business, and blogging regularly (including a ten-part blog series answering the most common search engine questions around testing).

Starting to work with my first clients in a consulting capacity has been an interesting experience with a lot of learning opportunities. I plan to blog on some of my lessons learned from these early engagements later in the year.

Another fun and testing-related project kicked off in May, working with my good friends from the industry, Paul Seaman and Toby Thompson, to start The 3 Amigos of Testing podcast. We’ve always caught up regularly to chat about testing and life in general over a cold one or two, and this new podcast has given us plenty of opportunities to talk testing again, albeit virtually. A new episode of this podcast should drop very soon after this blog post.

On more personal notes, I’ve certainly been finding more time for myself since ending full-time employment. There are some non-negotiables, such as daily one-hour (or more) walks and meditation practice, and I’ve also been prioritizing bike riding and yoga practice. I’ve been reading a lot too – more than a book a week – on a wide variety of different topics. These valuable times away from technology are foundational in helping me to live with much more ease than in the past.

I’ve continued to do volunteer work with The Vegan Society (UK). I started off performing proofreading tasks and have also now joined their web volunteers’ team, where I’ve been leading research projects on how to reduce the carbon footprint of the Society’s website and how to improve its accessibility. These web research projects have given me the welcome opportunity to learn about areas I was not very familiar with before; the "green website" work has been particularly interesting and has inspired me to pursue other opportunities in this area (watch this space!). A massive proofreading task led to the recent publication of the awesome Planting Value in the Food System reports, with some deep research and great ideas for transitioning UK farming away from animal-based agriculture.

Looking to the rest of 2021, the only firm commitment I have in the testing space – outside of consulting work – is an in-person conference talk at Testing Talks 2021 in Melbourne. I’ll be continuing with my considerable volunteering commitment with the Vegan Society and I have a big Status Quo project in the works too! With little to no prospect of long-distance travel in Australia or overseas in this timeframe, we will enjoy short breaks locally between lockdowns and also press on with various renovation projects on our little beach house.

(Given the title of this blog, I can’t waste this opportunity to include a link to one of my favourite Status Quo songs, “A Year” – this powerful ballad morphs into a heavier piece towards the end, providing some light amongst the heaviness of its parent album, “Piledriver”. Enjoy!)

Speaking at the Testing Talks 2021 (The Reunion) conference (28 October, Melbourne)

After almost two decades of very regularly attending testing conferences, the combined impacts of COVID-19 and finishing up my career at Quest have curtailed these experiences in more recent times. I’ve missed the in-person interaction with the testing community facilitated by such events, as I know many others have also.

The latter stages of 2020 saw me give three talks; firstly for the DDD Melbourne By Night meetup, then a two-minute talk for the “Community Strikes The Soapbox” part of EuroSTAR 2020 Online, and finally a contribution to the inaugural TestFlix conference. All of these were virtual events and at least gave me some presentation practice.

The opportunity to be part of an in-person conference in Melbourne was very appealing and, after chatting with Cameron Bradley, I committed to building a new talk in readiness for his Testing Talks 2021 Conference.

With the chance to develop a completely new talk, I riffed on a few ideas before settling on what seemed like a timely story for me to tell, namely what I’ve learned from twenty-odd years in the testing industry. I’ve titled the talk “Lessons Learned in Software Testing”, in a deliberate nod to the awesome book of the same name.

I’ve stuck with my usual routine in putting this new talk together, using a mindmap to help me come up with the structure and key messages before starting to cut a slide deck. It remains a challenge for me to focus more on the talk content than refining the slides at this stage, but I’m making a conscious effort to get the messaging down on rough slides before putting finishing touches to them later on.

It’s been interesting to look back over such a long career in the one industry, thinking about the trends that have come and gone, and realizing how much remains the same in terms of being a good tester adding value to projects. I’m looking forward to sharing some of the lessons I’ve learned along the way – some specifically around testing and some more general – in this new talk later in the year.

Fingers crossed (and COVID-permitting!), I’ll be taking the stage at the Melbourne Convention & Exhibition Centre on 28th October to deliver my talk to what I hope will be a packed house. Maybe you can join me? More details and tickets are available from the Testing Talks 2021 Conference website.

Is talking about “scaling” human testing missing the point?

I recently came across an article from Adam Piskorek about the way Google tests its software.

While I was already familiar with the book How Google Tests Software (by James Whittaker, Jason Arbon et al, 2012), Adam’s article introduced another newer book about how Google approaches software engineering more generally, Software Engineering at Google: Lessons Learned from Programming Over Time (by Titus Winters, Tom Manshreck & Hyrum Wright, 2020).

The following quote in Adam’s article is lifted from this newer book and made me want to dive deeper into the book’s broader content around testing*:

Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation.

Chapter 11 (Testing Overview), p210 (Adam Bender)

I was stunned by this quote from the book. It felt like they were saying that development simply goes too quickly for adequate testing to be performed and also that automation is seen as the silver bullet to moving as fast as they desire while maintaining quality, without those pesky slow humans interacting with the software they’re pushing out.

But, in the interests of fairness, I decided to study the four main chapters of the book devoted to testing to more fully understand how they arrived at the conclusion in this quote – chapter 11, which offers an overview of the testing approach at Google, chapter 12 on unit testing, chapter 13 on test doubles and chapter 14 on "Larger Testing". The book is, perhaps unsurprisingly, available to read freely on Google Books.

I didn’t find anything too controversial in chapter 12, rather mostly sensible advice around unit testing. The following quote from this chapter is worth noting, though, as it highlights that “testing” generally means automated checks in their world view:

After preventing bugs, the most important purpose of a test is to improve engineers’ productivity. Compared to broader-scoped tests, unit tests have many properties that make them an excellent way to optimize productivity.

Chapter 13 on test doubles was similarly straightforward, covering the challenges of mocking and giving decent advice around when to opt for faking, stubbing and interaction testing as approaches in this area. Chapter 14 dealt with the challenges of authoring tests of greater scope and I again wasn’t too surprised by what I read there.

It is chapter 11 of this book, Testing Overview (written by Adam Bender), that contains the most interesting content in my opinion and the remainder of this blog post looks in detail at this chapter.

The author says:

since the early 2000s, the software industry’s approach to testing has evolved dramatically to cope with the size and complexity of modern software systems. Central to that evolution has been the practice of developer-driven, automated testing.

I agree that the general industry approach to testing has changed a great deal in the last twenty years. These changes have been driven in part by changes in technology and the ways in which software is delivered to users. They’ve also been driven to some extent by the desire to cut cost and it seems to me that focusing more on automation has been seen (misguidedly) as a way to reduce the overall cost of delivering software solutions. This focus has led to a reduction in the investment in humans to assess what we’re building and I think we all too often experience the results of that reduced level of investment.

Automated testing can prevent bugs from escaping into the wild and affecting your users. The later in the development cycle a bug is caught, the more expensive it is; exponentially so in many cases.

Given the perception of Google as a leader in IT, I was very surprised to see this nonsense about the cost of defects being regurgitated here. This idea is “almost entirely anecdotal” according to Laurent Bossavit in his excellent The Leprechauns of Software Engineering book and he has an entire chapter devoted to this particular mythology. I would imagine that fixing bugs in production for Google is actually inexpensive given the ease with which they can go from code change to delivery into the customer’s hands.

Much ink has been spilled about the subject of testing software, and for good reason: for such an important practice, doing it well still seems to be a mysterious craft to many.

I find the choice of words here particularly interesting, describing testing as “a mysterious craft”. While I think of software testing as a craft, I don’t think it’s mysterious although my experience suggests that it’s very difficult to perform well. I’m not sure whether the wording is a subtle dig at parts of the testing industry in which testing is discussed in terms of it being a craft (e.g. the context-driven testing community) or whether they are genuinely trying to clear up some of the perceived mystery by explaining in some detail how Google approaches testing in this book.

The ability for humans to manually validate every behavior in a system has been unable to keep pace with the explosion of features and platforms in most software. Imagine what it would take to manually test all of the functionality of Google Search, like finding flights, movie times, relevant images, and of course web search results… Even if you can determine how to solve that problem, you then need to multiply that workload by every language, country, and device Google Search must support, and don’t forget to check for things like accessibility and security. Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation

(note: bold emphasis is mine)

We then come to the source of the quote that first piqued my interest. I find it interesting that they seem to be suggesting the need to “test everything” and using that as a justification for saying that using humans to interact with “everything” isn’t scalable. I’d have liked to see some acknowledgement here that the intent is not to attempt to test everything, but rather to make skilled, risk-based judgements about what’s important to test in a particular context for a particular mission (i.e. what are we trying to find out about the system?). The subset of the entire problem space that’s important to us is something we can potentially still ask humans to interact with in valuable ways. The “one clear answer” for testing being “automation” makes little sense to me, given the well-documented shortcomings of automated checks (some of which are acknowledged in this same book) and the different information we should be looking to gather from human interactions with the software compared to that from algorithmic automated checks.

Unlike the QA processes of yore, in which rooms of dedicated software testers pored over new versions of a system, exercising every possible behavior, the engineers who build systems today play an active and integral role in writing and running automated tests for their own code. Even in companies where QA is a prominent organization, developer-written tests are commonplace. At the speed and scale that today’s systems are being developed, the only way to keep up is by sharing the development of tests around the entire engineering staff.

Of course, writing tests is different from writing good tests. It can be quite difficult to train tens of thousands of engineers to write good tests. We will discuss what we have learned about writing good tests in the chapters that follow.

I think it’s great that developers are more involved in testing than they were in the days of yore. Well-written automated checks provide some safety around changing product code and help to prevent a skilled tester from wasting their time on known “broken” builds. But, again, the only discussion that follows in this particular book (as promised in the last sentence above) is about automation and not skilled human testing.

Fast, high-quality releases
With a healthy automated test suite, teams can release new versions of their application with confidence. Many projects at Google release a new version to production every day—even large projects with hundreds of engineers and thousands of code changes submitted every day. This would not be possible without automated testing.

The ability to get code changes to production safely and quickly is appealing and having good automated checks in place can certainly help to increase the safety of doing so. “Confidence” is an interesting choice of word to use around this (and is used frequently in this book), though – the Oxford dictionary definition of “confidence” is “a feeling or belief that one can have faith in or rely on someone or something”, so the “healthy automated test suite” referred to here appears to be one that these engineers feel comfortable to rely on enough to say whether new code should go to production or not.

The other interesting point here is about the need to release new versions so frequently. While it makes sense to have deployment pipelines and systems in place that enable releasing to production to be smooth and uneventful, the desire to push out changes to customers very frequently seems to have become an end in itself these days. For most testers in most organizations, there is probably no need or desire for such frequent production changes, so basing testing strategy on the perceived need for them could lead to goal displacement – and potentially take an important aspect of assessing those changes (viz. human testers) out of the picture altogether.

If test flakiness continues to grow, you will experience something much worse than lost productivity: a loss of confidence in the tests. It doesn’t take needing to investigate many flakes before a team loses trust in the test suite. After that happens, engineers will stop reacting to test failures, eliminating any value the test suite provided. Our experience suggests that as you approach 1% flakiness, the tests begin to lose value. At Google, our flaky rate hovers around 0.15%, which implies thousands of flakes every day. We fight hard to keep flakes in check, including actively investing engineering hours to fix them.

It’s good to see this acknowledgement of the issues around automated check stability and the propensity for unstable checks to lead to a collapse in trust in the entire suite. I’m interested to know how they go about categorizing failing checks as "flaky" to be included in their overall 0.15% "flaky rate"; no doubt there’s some additional human effort involved there too.
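
As a rough sanity check on the scale implied by that quote (the daily execution count below is my own assumption for illustration, not a figure from the book), the arithmetic looks something like this:

    # Back-of-the-envelope check: how many daily test executions would a
    # 0.15% flaky rate need in order to imply "thousands of flakes every day"?
    # The 2,000,000 figure is an assumed illustration, not a number from the book.

    flaky_rate = 0.0015            # 0.15%
    daily_test_executions = 2_000_000

    flakes_per_day = flaky_rate * daily_test_executions
    print(f"~{flakes_per_day:,.0f} flaky test results per day")  # ~3,000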

Just as we encourage tests of smaller size, at Google, we also encourage engineers to write tests of narrower scope. As a very rough guideline, we tend to aim to have a mix of around 80% of our tests being narrow-scoped unit tests that validate the majority of our business logic; 15% medium-scoped integration tests that validate the interactions between two or more components; and 5% end-to-end tests that validate the entire system. Figure 11-3 depicts how we can visualize this as a pyramid.

It was inevitable during coverage of automation that some kind of “test pyramid” would make an appearance! In this case, they use the classic Mike Cohn automated test pyramid but I was shocked to see them labelling the three different layers with percentages based on test case count. By their own reasoning, the tests in the different layers are of different scope (that’s why they’re in different layers, right?!) so counting them against each other really makes no sense at all.

Our recommended mix of tests is determined by our two primary goals: engineering productivity and product confidence. Favoring unit tests gives us high confidence quickly, and early in the development process. Larger tests act as sanity checks as the product develops; they should not be viewed as a primary method for catching bugs.

The concept of “confidence” being afforded by particular kinds of checks arises again and it’s also clear that automated checks are viewed as enablers of productivity.

Trying to answer the question “do we have enough tests?” with a single number ignores a lot of context and is unlikely to be useful. Code coverage can provide some insight into untested code, but it is not a substitute for thinking critically about how well your system is tested.

It’s good to see context being mentioned and also the shortcomings of focusing on coverage numbers alone. What I didn’t really find anywhere in what I read in this book was the critical thinking that would lead to an understanding that humans interacting with what’s been built is also a necessary part of assessing whether we’ve got what we wanted. The closest they get to talking about humans experiencing the software in earnest comes from their thoughts around “exploratory testing”:

Exploratory Testing is a fundamentally creative endeavor in which someone treats the application under test as a puzzle to be broken, maybe by executing an unexpected set of steps or by inserting unexpected data. When conducting an exploratory test, the specific problems to be found are unknown at the start. They are gradually uncovered by probing commonly overlooked code paths or unusual responses from the application. As with the detection of security vulnerabilities, as soon as an exploratory test discovers an issue, an automated test should be added to prevent future regressions.

Using automated testing to cover well-understood behaviors enables the expensive and qualitative efforts of human testers to focus on the parts of your products for which they can provide the most value – and avoid boring them to tears in the process.

This description of what exploratory testing is and what it’s best suited to are completely unfamiliar to me, as a practitioner of exploratory testing for fifteen years or so. I don’t treat the software “as a puzzle to be broken” and I’m not even sure what it would mean to do so. It also doesn’t make sense to me to say “the specific problems to be found are unknown at the start”, surely this applies to any type of testing? If we already know what the problems are, we wouldn’t need to test to discover them. My exploratory testing efforts are not focused on “commonly overlooked code paths” either, in fact I’m rarely interested in the code but rather the behaviour of the software experienced by the end user. Given that “exploratory testing” as an approach has been formally defined for such a long time (and refined over that time), it concerns me to see such a different notion being labelled as “exploratory testing” in this book.

TL;DRs
Automated testing is foundational to enabling software to change.
For tests to scale, they must be automated.
A balanced test suite is necessary for maintaining healthy test coverage.
“If you liked it, you should have put a test on it.”
Changing the testing culture in organizations takes time.

In wrapping up chapter 11 of the book, the focus is again on automated checks with essentially no mention of human testing. The scaling issue is highlighted here also, but thinking solely in terms of scale is missing the point, I think.

The chapters of this book devoted to "testing" in some way cover a lot of ground, but the vast majority of it is given over to automated checks of various kinds. Given Google’s reputation and perceived leadership status in IT, I was really surprised to see mention of the "cost of change curve" and the test automation pyramid, but not surprised by the lack of focus on human exploratory testing.

Circling back to that triggering quote I saw in Adam’s blog (“Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale”), I didn’t find an explanation of how they do in fact assess product quality – at least in the chapters I read. I was encouraged that they used the term “assess” rather than “measure” when talking about quality (on which James Bach wrote the excellent blog post, Assess Quality, Don’t Measure It), but I only read about their various approaches to using automated checks to build “confidence”, etc. rather than how they actually assess the quality of what they’re building.

I think it’s also important to consider your own context before taking Google’s ideas as a model for your own organization. The vast majority of testers don’t operate in organizations of Google’s scale and so don’t need to copy their solutions to these scaling problems. It seems we’re very fond of taking models, processes, methodologies, etc. from one organization and trying to copy the practices in an entirely different one (the widespread adoption of the so-called “Spotify model” is a perfect example of this problem).

Context is incredibly important and, in this particular case, I’d encourage anyone reading about Google’s approach to testing to be mindful of how different their scale is and not use the argument from the original quote that inspired this post to argue against the need for humans to assess the quality of the software we build.

* It would be remiss of me not to mention a brilliant response to this same quote from Michael Bolton – in the form of his 47-part Twitter thread (yes, 47!).

Lessons learned from writing a ten-part blog series

After leaving Quest back in August 2020, I spent some time working on ideas for a new venture. During this time, I learned some useful lessons from courses by Pat Flynn and got some excellent ideas from Teachable‘s Share What You Know Summit. When I launched my new software testing consultancy, Dr Lee Consulting, I decided to try out one of the ideas I’d heard for generating content around my new brand and so started a blog series, inspired most notably by Terry Rice.

After committing to a ten-part series of posts, I decided to announce my intention publicly (on Twitter and LinkedIn) to keep myself honest, but chose not to commit to a cadence for publishing the parts. I felt that publishing a new blog post once a week was about right and made an internal note to aim for this cadence. Some posts took longer to write than others and the review cycle was more involved for some posts. The series also spread over the Christmas/New Year period, but the entire series took me just on three months to complete so my cadence ended up being close to what I initially thought it would be.

My blogging over the last several years has usually been inspired by something I’ve read or observed, or an event I’ve attended such as a conference or meetup. These somewhat more spontaneous and sporadic content ideas mean that my posts have been inconsistent in both topic and cadence, not that I see any of this as being an issue.

Committing to a series of posts for which the subject matter was determined for me (in this case by search engine data) meant that I didn’t need to be creative in coming up with ideas for posts, but instead could focus on trying to add something new to the conversation in terms of answering these common questions. I found it difficult to add much nuance in answering some of the questions, but others afforded more lengthy and perhaps controversial responses. Hopefully the series in its entirety is of some value anyway.

My thanks again to Paul Seaman and Ky for reviewing every part of this blog series, as well as to all those who’ve amplified the posts in this series via their blogs, newsletters, lists and social media posts.

The ten parts of my first blog series can be accessed using the links below:

  1. Why is software testing important?
  2. How does software testing impact software quality?
  3. When should software testing activities start?
  4. How is software testing done?
  5. Can you automate software testing?
  6. Is software testing easy?
  7. Is software testing a good career?
  8. Can I learn software testing on my own?
  9. Which software testing certification is the best?
  10. What will software testing look like in 2021?

(Feel free to send me ideas for any topics you’d like to see covered in a multi-part blog series in the future.)

Donation of proceeds from sales of “An Exploration of Testers” book

In October 2020, I published my first software testing book, “An Exploration of Testers”. As I mentioned then, one of my intentions with this project was to generate some funds to give back to the testing community (with 100% of all proceeds I receive from book sales being returned to the community).

I’m delighted to announce that I’ve now made my first donation as a result of sales so far, based on royalties for the book in LeanPub to date:

LeanPub royalties

(Note that there is up to a 45-day lag between book sales and my receipt of those funds, so some recent sales are not included in this first donation amount.)

I’ve personally rounded up the royalties paid so far (US$230.93) to form a donation of US$250 (and covered their processing fees) to the Association for Software Testing for use in their excellent Grant Program. I’m sure these funds will help meetup and peer conference organizers greatly in the future.

I will make further donations of royalties received from book sales not covered by this first donation.

“An Exploration of Testers” is available for purchase via LeanPub and a second edition featuring more contributions from great testers around the world should be coming soon. My thanks to all of the contributors so far for making the book a reality and also to those who’ve purchased a copy, without whom this valuable donation to the AST wouldn’t have been possible.

Common search engine questions about testing #10: “What will software testing look like in 2021?”

This is the final part of a ten-part blog series in which I’ve answered some of the most common questions asked about software testing, according to search engine autocomplete results (thanks to Answer The Public).

In this last post, I ponder the open question of “What will software testing look like in 2021?” (note: updated the year from 2020 in my original dataset from Answer The Public to 2021).

The reality for most people involved in the software testing business is that testing will look pretty much the same in 2021 as it did in 2020 – and probably as it did for many of the years before that too. Incremental improvements take time in organisations and the scope & impact of such changes will vary wildly between different organisations and even within different parts of the same organisation.

I fully expect 2021 to yield a number of reports about trends in software testing and quality, akin to Capgemini’s annual World Quality Report (which I critiqued again last year). There will probably be a lot of noise around the application of AI and machine learning to testing, especially from tool vendors and the big consultancies.

I feel certain that automation (especially of the “codeless” variety) will continue to be one of the main threads around testing with companies continuing to recruit on the basis of “automated testing” prowess over exploratory testing skills.

I think a small but dedicated community of people genuinely interested in advancing the craft of software testing will continue to publish their ideas and look to inject some reality into the various places that testing gets discussed online.

My daily meditation practice has applications here too. In the same way that the practice helps me to recognise when thoughts are happening without getting caught up in their storyline, I think you should make an effort to observe the inevitable commentary on trends in the testing industry through 2021 without going out of your way to follow those trends. They are likely to change again next year and the effort of trying to keep "on trend" is likely better spent elsewhere. Instead, I would recommend focusing on the fundamentals of good software testing, while continuing to demonstrate the value of good testing and advancing the practice as best you can in the context of your organisation.

I would also encourage you to make 2021 the year that you tell your testing stories for the benefit of the wider community – your stories are unique, valuable and a great way for others to learn what’s really going on in our industry. There are many avenues to share your first-person experiences – blog about them, share them as LinkedIn articles, talk about them at meetups or present them at a conference (many of which seem destined to remain as virtual events through 2021, which I see as a positive in terms of widening the opportunity for more diverse stories to be heard).

For some alternative opinions on what 2021 might look like, check out the responses to the recent question “What trends do you think will emerge for testing in 2021?” posed by Ministry of Testing on LinkedIn.

You can find the previous nine parts of this blog series at:

I’ve provided the content in this blog series as part of the “not just for profit” approach of my consultancy business, Dr Lee Consulting. If the way I’m writing about testing resonates with you and you’re looking for help with the testing & quality practices in your organisation, please get in touch and we can discuss whether I’m the right fit for you.

I’m grateful to Paul Seaman and Ky who acted as reviewers for every part of this blog series; I couldn’t have completed the series without their help, guidance and encouragement along the way, thank you!

Thanks also to all those who’ve amplified the posts in this series via their blogs, lists and social media posts – it’s been much appreciated. And, last but not least, thanks to Terry Rice for the underlying idea for the content of this series.