Here’s a stumper.  The program I’m writing crashes and it’s my job to fix it.  There’s one good thing about this crash:  it’s easy to reproduce.  Four simple steps and it crashes every time.  The really tough bugs are those that only crash when the moon is full and the tide is going out.  But there’s nothing else easy about this bug.

First off, I’ve already spent two full days on it.  Two days of slow and painful debugging earlier this week, and I thought I’d found the problem.  I gleefully told everyone I had the bug licked, but another programmer in the office showed me a case where it still crashed.

Secondly, it only happens with a “release build”.  That needs some explanation for non-programmers.  When I’m programming I can build my code as a “debug build” or a “release build”.  The debug build is really quick to build and has lots of extra goodies in it that make it easy for me to see exactly what’s happening under the hood while it runs.  But it runs really slowly, so we’d never give it to our users.  The release build is optimized, which makes for much smaller and faster .exes.  In theory a release build and a debug build do exactly the same thing, so if one works the other should too.  But once in a while you get bugs like this one, where it works in debug but not in release.

That means I can’t use all my fancy debugging tools to examine the state of memory and see what lines of code are being run.  That’s a big handicap.  But not the end of the world.  I can still hook up my debugger to a release build and get some information about what’s happening.

But this crash only happens when the debugger is not running.  Ouch!  If I run the release build through the debugger, it all runs fine.  If I don’t use the debugger, it crashes.  I feel blind without my debugger.  This is really getting hairy.  I remind myself that life is an adventure.

This doesn’t always work, but on the off-chance that it will, I try running the application with no debugger, doing three of the four steps, then attaching the debugger before doing the fourth.  To my relief, it still crashes, only this time I’m in the debugger so I can see the exact line of code it’s crashing on, and look at the state of memory to hopefully understand why.  Surprisingly, it’s in a completely different part of the code than I expected.  And I realise soon enough why: the memory is all messed up, so the bug must have happened elsewhere and sent the code off on a wrong tangent.

The debugger won’t be much help after all; I’m still no closer to understanding where things are really going wrong, let alone why.

I quickly confirm my theory that this isn’t where the real problem is, because a very minor and innocuous change to the code moves the crash somewhere else.  At least this “somewhere else” is closer to where I’d have expected something to go wrong, so for the moment I’ll assume I’ve found the area, and concentrate on finding the exact line of code.

I don’t have 20 years of programming experience for nothing!  In the good ol’ days we didn’t have debuggers.  I dust off my memories and try some old-school techniques. 

To figure out where in the code the crash is happening, I sprinkle “MessageBox”es liberally throughout the area I suspect.  A MessageBox is just a little dialog box that pops up with a message of my choice and an OK button.  Most of the time I just number them, so the messages are “yo 1”, “yo 2” and so on.  That way I can run the application with the code open beside it and see exactly how far it has got.  When I do the fourth step, the one that causes the crash, I get the “yo 1” message, hit OK, get the “yo 2” message, hit OK, and so on.  After “yo 4” it crashes without showing “yo 5”, so now I know the exact line of code.  The code itself still looks fine though, so now I need to figure out why.
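A MessageBox is a Windows dialog, so for portability this sketch assumes a different output channel: each checkpoint is a line written to standard error and flushed immediately. That keeps the property that matters, namely that the last checkpoint you see is the last one that definitely ran. The `yo` helper and the `fourth_step` function (a stand-in for whatever code the crash lives in) are hypothetical names, and a Python exception stands in for the real crash.

```python
import io
import sys


def yo(n, stream=None):
    """Announce that execution reached numbered checkpoint n.

    A MessageBox blocks until OK is pressed, so the last box on screen marks
    the last checkpoint that definitely ran. flush=True gives a text stream
    the same guarantee: the line is written out before the next statement
    runs, instead of sitting in a buffer that dies with the process.
    """
    print(f"yo {n}", file=sys.stderr if stream is None else stream, flush=True)


def fourth_step(values, stream=None):
    """Hypothetical stand-in for the code path that crashes."""
    yo(1, stream)
    total = sum(values)
    yo(2, stream)
    count = len(values)
    yo(3, stream)
    average = total / count  # raises ZeroDivisionError when values is empty
    yo(4, stream)
    return average


# Capture the checkpoints in memory so we can see which was the last one.
log = io.StringIO()
try:
    fourth_step([], stream=log)
except ZeroDivisionError:
    pass
print(log.getvalue().splitlines()[-1])  # prints: yo 3
```

Reading the output works just like reading the dialogs: “yo 3” appeared and “yo 4” never did, so the failure lies between those two calls.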

And I still have this nagging suspicion that any minute something will show me I’m still in the wrong spot in code, and the only reason it’s crashing here is because something else completely unrelated got corrupted.

I work with three 21″ monitors, and one of them is normally packed with debug information, showing me memory, breakpoints, trace messages, variables, registry values and all kinds of other useful stuff in real time.  I have none of that now, but I can get at it the poor man’s way with a MessageBox.  So instead of the “yo 1” message, I add a few more boxes displaying the values of certain key variables.  I can see that the line of code it’s dying on wasn’t supposed to be run at all, so I look at the values of the variables that were supposed to cause that line to be skipped.
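The same poor man’s approach extends to showing variable values. This is a minimal sketch, again assuming a flushed write to standard error in place of a MessageBox; `show_values` and the `skip_export` and `selected` variables in the demo are hypothetical names for illustration, not anything from the program in the story.

```python
import sys


def show_values(label, stream=None, **values):
    """Poor man's watch window: write named values on one line, then flush.

    Stands in for a MessageBox whose text lists a few key variables.
    repr() is used so an empty string shows up as '' and None is not
    confused with the string 'None'. Returns the message for convenience.
    """
    out = sys.stderr if stream is None else stream
    pairs = ", ".join(f"{name}={value!r}" for name, value in values.items())
    message = f"{label}: {pairs}"
    print(message, file=out, flush=True)
    return message


# Hypothetical guard: the code after this point should only run
# when skip_export is False and selected is non-empty.
skip_export = False
selected = []
show_values("before export", skip_export=skip_export, selected=selected)
# writes to stderr: before export: skip_export=False, selected=[]
```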

And that’s when I find it.  Eureka!  To my chagrin it’s my own bug.  Worse yet, I fixed a similar bug elsewhere just last week.  And this one’s been staring me in the face all day while I painstakingly added MessageBoxes all around it.  But the relief of having found it and being able to move on is stronger than the depression at having caused it.  Isn’t it great when you can make a mistake and still feel good about fixing it?

One more day and one more bug behind me.  That is the sometimes-life of a programmer.