For any team developing and evolving a product, defects are inevitable, whether they are discovered internally or reported by external customers.
Back in the days of the waterfall methodology, teams would meet at a regular cadence to review bugs and assign each one a priority and severity. These bug triage meetings were typically driven by the QA team.
As companies continue their agile transformations, one question agile teams sometimes ask is how Scrum proposes that bugs be handled. If the bug is a P0 issue (i.e., it causes system downtime or breaks core product functionality), it can be pulled into the sprint immediately for root cause analysis and resolution. Otherwise, a bug is treated no differently from a story in the backlog. That is, it is reviewed during weekly refinement and estimated according to the complexity of investigating the issue. For issues that are well understood, or that are similar to bugs the team has addressed in the past, the team can estimate complexity more precisely. Where the bug or its root cause is not well understood, the estimate should be increased to account for the unknowns.
From time to time, a team may discover while investigating a bug that major refactoring is needed to fix the issue. In such cases, there are two approaches to consider:
- A: Identify a solution that mitigates the impact of the issue and can be implemented in the short term.
- B: Identify a longer-term solution, which may involve more work but will prevent further occurrences of the issue while also offering benefits such as extensibility and scalability.
The right approach depends on the constraints the team faces at the time. If the team is working toward a date-driven deliverable, the short-term mitigation (A) can be implemented immediately, with the longer-term solution (B) planned for after the release. Otherwise, B should be the goal. Depending on how much work B involves, it may make sense to create an epic for it and break the work down into smaller stories, keeping story sizes (and thus the complexity of each story) manageable. Personally, I like to see work broken down so that each story's estimated complexity is less than 21 points. This not only keeps the work manageable for the team but also reduces the risk associated with larger chunks of work.
Regardless of the approach, it is advisable to reserve some capacity each sprint for addressing bugs. This reduces the impact on planned work when an unplanned outage or P0 issue arises mid-sprint and has to be pulled into the sprint.