A "never-ending stream of bugs" is a surprisingly common problem in many software development shops.
It might seem like an intractable problem, but it’s actually not.
It’s solvable, but you have to tackle it on several different fronts:
Application Specs
Code Architecture
QA Automation
Development Environment
Most bugs happen because:
Specs are not well understood
Architecture is confusing
Most bugs can't get fixed because:
Manual testing is near impossible
(1) Application Specs
If no one knows what the program is supposed to do, forget it. Bugs will never get fixed, and QA will never get automated. This is what you should focus on first.
Specs should be clear but not verbose. No one wants to read half a page when a sentence would suffice.
Do not let business people write them! The chief architect or tech lead should be writing them.
Specification documents are an artifact that engineering is responsible for.
The specs will drive both the code architecture and the QA automation.
(2) Code Architecture
The ground truth about computers is that they can only do two things: (a) process data and (b) move it around (I/O).
Code Architecture should reflect this. No crazy abstractions. No service providers. No business domain.
Do not model the problem. Model the solution.
Programmers need to understand how data is flowing through the system. It should be crystal clear. No obfuscation.
Every little thing you add to obfuscate data flow will be a time bomb waiting to explode into an intractable bug.
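Here is a minimal sketch of what "process data, move it around" can look like in code. Everything in it is made up for illustration (the order records, the function names, the JSON-lines input format); the point is only that I/O sits at the edges, processing is a plain function over plain data, and the whole flow can be read top to bottom in one place.

```python
import io
import json
from typing import Iterable, TextIO


def read_orders(source: TextIO) -> list[dict]:
    """I/O: move data in. Expects one JSON object per line."""
    return [json.loads(line) for line in source if line.strip()]


def total_per_customer(orders: Iterable[dict]) -> dict[str, float]:
    """Processing: plain data in, plain data out. No I/O in here."""
    totals: dict[str, float] = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0.0) + order["amount"]
    return totals


def write_totals(totals: dict[str, float], sink: TextIO) -> None:
    """I/O: move data out, one tab-separated line per customer."""
    for customer, total in sorted(totals.items()):
        sink.write(f"{customer}\t{total:.2f}\n")


def main(source: TextIO, sink: TextIO) -> None:
    # The entire data flow, visible in three lines: read, process, write.
    orders = read_orders(source)
    totals = total_per_customer(orders)
    write_totals(totals, sink)


# Try it with in-memory streams; real files or stdin/stdout work the same way.
demo_in = io.StringIO(
    '{"customer": "ann", "amount": 2.5}\n'
    '{"customer": "bob", "amount": 1}\n'
    '{"customer": "ann", "amount": 1.5}\n'
)
demo_out = io.StringIO()
main(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Nothing here hides where the data comes from or where it goes, so when a total is wrong there are only three places to look.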
(3) QA Automation
Forget about unit tests. They are mostly a waste of time.
Allocate time and resources for an automated end-to-end suite.
How do you know whether the tests are good? Here are some heuristics:
They simulate accurately how a user interacts with the program/system
They do not deal with system state. Only inputs that a user can give and outputs that a user can see.
Zero mocks! The test executes all the relevant code paths in the system in exactly the same way that would happen in production.
Never once do you need to change the test as a result of refactoring the system internals. Tests only change when you make changes to the user interactions (inputs/outputs).
Given the above, it's easy to fix bugs with confidence.
You can check the spec to see what the expected behavior is
You can run the test suite to see if anything broke as a result of your fix
But there’s one last piece to the puzzle!
(4) Development Environment
All of the above is almost useless unless anyone on the team can run the QA test suite with one button (or one simple command).
Local development should be seamless. Every single programmer on the team should be able to run the entire system with one simple command. They should have a quick edit/run/debug cycle.
QA should be trivially easy to run on staging too.
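The "one simple command" can be as small as a script that runs each step in order and stops at the first failure. The sketch below is one way to do it, not a prescription: the step names and the placeholder commands in `STEPS` are assumptions, to be replaced with whatever actually starts your system and runs your suite.

```python
import subprocess
import sys

# Hypothetical steps. The commands are placeholders that just print a message;
# swap in the real commands that bring the system up and run the QA suite.
STEPS = [
    ("start services", [sys.executable, "-c", "print('services up')"]),
    ("run end-to-end suite", [sys.executable, "-c", "print('suite passed')"]),
]


def run_steps(steps: list[tuple[str, list[str]]]) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    for name, command in steps:
        print(f"==> {name}", flush=True)
        code = subprocess.run(command).returncode
        if code != 0:
            print(f"step failed: {name} (exit {code})", file=sys.stderr)
            return code
    return 0
```

Save it as something like qa.py, end it with `sys.exit(run_steps(STEPS))`, and the whole suite runs with a single `python qa.py`. The same script can run on staging if the steps take their target from an environment variable or argument.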
When you don't have all these points taken care of, an endless stream of bugs is just par for the course. There's nothing you can do about it unless you fix the root cause.