
The almost unreproducible bug...

In a previous post (The Defect Dance) I mentioned that the first question to ask when you find a bug is "Can it be reproduced?". If the answer is no, I suggested making a note of it in case it happens again...

However, this is a bit of a simplification. There's a lot more to it than a Yes/No answer to "Can a bug be reproduced?"...

Bugs, for whatever reason, can be intermittent or transient. Some might only appear under very specific scenarios, under heavy load, etc.

We've all been there as QA professionals: we experience some quirky behaviour that at first glance appears to be a bug, but when we try to reproduce it, we can't. And we all know that a developer will only be interested in a bug that is reproducible! So what's the next step?

For the sake of this blog post we'll use an error on the ASOS website that appears when you try to add an item to the bag...




So, don't immediately try to recreate the issue. I once read an interesting analogy about a mongoose and an antelope, and how they react to certain situations. You want to try not to be a mongoose, whose initial reaction when backed against a wall and staring death in the eye is to attack blindly and often irrationally. What you want to be is an antelope, which will freeze and think about its next move...

In this instance, make a note of the browser, the browser version, the url, the product id... 

So after thinking about it, you try to replicate the problem by performing the same steps...



Lo and behold! It works!!! Do not accept this as "there was never a bug"....

This could be for any number of reasons. The first thing to do is note what you did differently the second time compared with the first (this is why it's important to record any information you have when the bug first appears). There may be an obvious difference. For example, if the only thing that changed was the product, it's likely a data issue with that product, and we can drill down and investigate why. However, let's pretend for this blog post that there are no apparent differences. What do you, as a QA, do next?
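If you keep your notes in a structured form, spotting the differences between the failing attempt and the working one becomes almost mechanical. Here is a minimal sketch of that idea in Python. The field names, URLs and values are all made up for illustration, and `diff_attempts` is a hypothetical helper, not part of any tool.

```python
def diff_attempts(first: dict, second: dict) -> dict:
    """Return {field: (first_value, second_value)} for every field that differs."""
    fields = sorted(set(first) | set(second))
    return {
        field: (first.get(field), second.get(field))
        for field in fields
        if first.get(field) != second.get(field)
    }


# Hypothetical notes taken when the bug appeared...
failed_attempt = {
    "browser": "Chrome",
    "browser_version": "120",
    "url": "https://example.com/product/123",
    "product_id": "123",
    "logged_in": True,
}

# ...and when the same steps worked (only the product changed).
working_attempt = {
    **failed_attempt,
    "url": "https://example.com/product/456",
    "product_id": "456",
}

for field, (before, after) in diff_attempts(failed_attempt, working_attempt).items():
    print(f"{field}: {before!r} -> {after!r}")
```

In this made-up case only the product fields differ, which is exactly the kind of hint that points towards a data issue. An empty result would mean there are no apparent differences in what you recorded.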

Again, we take the mindset of an antelope... and think about the next move.

One thing I find helpful is to look in the error logs, be that log4net output or even the Event Viewer on the box that the service/app is hosted on. You can find information about the issue there (if you have logging set up of course :) ), often including which service threw the error, so you can drill down a bit further. If you don't feel comfortable doing this, you can ask a developer to do it for you, but in my opinion there's nothing to be scared of: viewing the logs should be a QA's bread and butter. There may or may not be something in the logs, depending on whether the error is server side or client side. It is, however, a good starting point and can help you in your quest to discover the issue.
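Since you noted when the bug appeared, you can narrow a large log file down to the errors logged around that time. The sketch below is one rough way to do it in Python. It assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp followed by the level, which is a common layout but depends entirely on how your logging is configured. The sample lines and service names are invented.

```python
from datetime import datetime, timedelta


def errors_near(lines, seen_at, window_seconds=60):
    """Return ERROR/FATAL lines logged within window_seconds of seen_at."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        try:
            # Assumed layout: the first 19 characters are the timestamp.
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip continuation lines such as stack traces
        is_error = " ERROR " in line or " FATAL " in line
        if is_error and abs(stamp - seen_at) <= window:
            hits.append(line.rstrip())
    return hits


# Invented sample log content, for illustration only.
sample_log = [
    "2024-01-15 10:31:58,102 INFO  BasketService - Add to bag requested",
    "2024-01-15 10:32:01,417 ERROR BasketService - Timeout calling stock service",
    "   at Basket.Add(Item item)",
    "2024-01-15 11:45:00,000 ERROR SearchService - Unrelated failure",
]

seen_at = datetime(2024, 1, 15, 10, 32, 0)  # when you noticed the bug
for hit in errors_near(sample_log, seen_at):
    print(hit)
```

With a real file you would pass the open file object in place of `sample_log`. The one-minute window is an arbitrary choice, so widen it if the clocks on your machine and the server might not agree.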

If there is nothing in the server-side error logs, another possible cause is a JavaScript error on the page itself. These can be viewed in the browser's error console; in Chrome, this is easily accessed by hitting Ctrl, Shift and J (for other browsers view the page here). The console shows any warnings or errors relating to CSS or JavaScript, and there may be a JavaScript error in there that was preventing the add to basket from working (for instance, had all the JavaScript loaded when the bug occurred?).

Depending on the system architecture, there could be a caching issue, either in the browser or on the server, so it could be worth investigating that. Close the browser, clear the cache (information around how to do this can be found here) and see if the issue reappears.

If there is still nothing, there may be a config issue on one of the boxes. It may be pointing to the wrong service, one that isn't accessible from your environment, so check the config for the services/apps that are behaving erratically and you may find an issue there.

Failing all the above, it may be worth getting in someone who knows the system and talking them through what you were doing. They may spot something that you didn't. For example, it might be that the service is slow to respond to the first call and fine after that, which would point to a performance issue.
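The "slow on the first call, fine afterwards" pattern is easy to check for yourself by timing the same action several times in a row. The sketch below shows the idea in Python. `FakeService` is a made-up stand-in that simulates a slow first call so the example runs on its own. In practice you would pass in whatever function makes the real request.

```python
import time


def time_calls(action, repeats=5):
    """Call action() repeatedly and return how long each call took, in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


class FakeService:
    """Hypothetical stand-in for a real service: slow on the first call only."""

    def __init__(self):
        self.warmed_up = False

    def call(self):
        if not self.warmed_up:
            time.sleep(0.2)  # simulate a cold start
            self.warmed_up = True


durations = time_calls(FakeService().call)
first, rest = durations[0], durations[1:]
print(f"first call: {first:.3f}s, slowest later call: {max(rest):.3f}s")
```

If the first timing is consistently far larger than the rest, that is useful evidence to attach to the bug report, even when the failure itself won't reproduce.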

Also, you could try searching the bug database. There may be a similar issue logged already, with more information on the circumstances that led to the bug, which could help in recreating it.
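Most bug trackers have their own search, so use that first. If all you have is an export of titles, though, even a crude keyword match can surface likely duplicates. The following Python sketch does that. The bug IDs and titles are invented, and `search_bugs` is a hypothetical helper rather than any tracker's API.

```python
def search_bugs(query, bugs, min_shared=2):
    """Return (bug_id, title) pairs whose title shares at least min_shared words with query."""
    query_words = set(query.lower().split())
    scored = []
    for bug_id, title in bugs.items():
        shared = query_words & set(title.lower().split())
        if len(shared) >= min_shared:
            scored.append((len(shared), bug_id, title))
    # Best matches first; ties are broken by bug id so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(bug_id, title) for _, bug_id, title in scored]


# Invented bug database export, for illustration only.
bugs = {
    "BUG-101": "Add to bag fails intermittently for some products",
    "BUG-102": "Search results page shows wrong currency",
    "BUG-103": "Error adding item to bag after session timeout",
}

for bug_id, title in search_bugs("add to bag error", bugs):
    print(bug_id, title)
```

Word overlap is a blunt instrument (it treats "to" as just as meaningful as "bag"), so treat the results as candidates to read through, not as confirmed duplicates.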

If all this fails, then I would let the team know, so that they too are aware that there was a problem. If it arises again on any of their machines, they can tell you what they were doing at the time, and you can perform the Defect Dance again.

How long should you spend investigating/trying to reproduce a bug? 

A question I do get asked is how to determine how long to spend trying to reproduce a bug. The obvious answer is that it depends on the severity/priority of the bug: the more critical the bug, the longer you should spend looking for it. If it's a UI issue, for instance, and you can't recreate it, then there's no need to investigate in such detail, but some research obviously would not go amiss.




So as you can see, it isn't as simple as "is the bug reproducible?". You can't give an immediate answer without investigating further. Unfortunately, I couldn't fit all the above into the Defect Dance diagram, and thought it deserved a blog post in its own right! Feel free to add your own comments on unreproducible bugs, or your own stories....









Comments

  1. I had a similar experience with a issue, this was related to user permissions. There was a particular issue with functionality not loading when you are a lower privileged user, but works completely fine when you have higher permission settings like administrator.

    I remember logging this issue and reopening this for more than 3 times, because we initially thought this is a data issue before both the dev and I could realize this is related to permission setting!

    1. Glad it was all sorted at least. It can be most frustrating sometimes!!

