Interrupt? Let the Bastard handle it!

January 20th, 2008

I've promised to consider publishing an objective and comprehensive coverage of the adventures of The Bastard. The public interest generated by this promise, combined with an optimistic estimation of the legal risks involved, led me to the decision to Yes, publish it.

Myself, I always kinda missed this sort of thing. We've got Bastard Operator From Hell, but what about a Bastard Programmer? Sure, we have, but it's primarily about incompetence. What about The Pure Evil? Of course, compared to a programmer, an evil sysadm has much more interesting opportunities to practice his evil craft, for he's got root access, and what have you got, my poor helpless protagonist? You ain't got nothing! With a programmer, you can at least read and change the code. Well, actually you can't, but you can try. The point is that someone unfamiliar with the subject won't see that the protagonist was doomed all along. But this excuse doesn't excuse anything. Isn't the struggle between Good and Evil the whole point of drama, and isn't it much more interesting when Good is given a chance to rise from the ashes and kick Evil's butt?

Enough introductions. Ladies and gentlemen, I'm proud to present you The Adventures Of The Bastard, Part I!

Disclaimer: The story is written in first person; think of The Real Me as a second person. Similarly, the Bastard is a fictional character, though he is based on several different non-fictional, full-fledged real-life bastards. Function names and hexadecimal addresses have been changed to protect the guilty.

We've got this new board, which apparently works (no signs of smoke). Now I'm supposed to run a little app on this board to try it out. Not the full-blown thing, just a small part of it. Um, did I just say "small"? Let me see... 4 megabytes. What the hell is going on? Let's look at the symbol table, shall we? __i5_stdZQostream_put. Mmmm, crunchy. Wait a minute... Pipe through awk, get the symbol names, pipe through sort. Well, what do you know. Apparently one __i5_stdZQostream_put is not enough. No, sir, we need about... I could pipe it through awk again and make a histogram... Oh, screw that. We need about 5, 10, 15... about 50 of those. Groo-vy. C++. Surpassing the safety of C and the memory efficiency of Java. What's the word for "surpass from below"? I don't quite remember. Maybe "suck"? Oh, Useful Application, why are you not written in some other language? I bet you don't know, either. Oh well, enough talking. I should run along now... More accurately, you should run, right here on this board. Where's my JTAG probe?
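The histogram I was too lazy to pipe through awk could look roughly like this. It's a minimal sketch, assuming nm-style lines where the symbol name is the last whitespace-separated field (true for mangled names, not for demangled ones with spaces in them). The sample listing is made up for illustration, except for the names and the g_moronic address, which come from the story.

```python
from collections import Counter

def symbol_histogram(nm_output):
    """Count how many times each symbol name shows up in nm-style output."""
    counts = Counter()
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the symbol name is the last field on the line.
        counts[fields[-1]] += 1
    return counts

# Hypothetical listing; the first three addresses are invented.
SAMPLE = """\
70001000 t __i5_stdZQostream_put
70001400 t __i5_stdZQostream_put
70001800 t __i5_stdZQostream_put
70066584 D g_moronic
"""

if __name__ == "__main__":
    # Most duplicated symbols first.
    for name, count in symbol_histogram(SAMPLE).most_common():
        print(count, name)
```

Feed it the real symbol table instead of SAMPLE and the 50-odd copies float right to the top.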

JTAG probes. Awesome gadgets. You just grab this thing by the pins and shove it right into your board. And then you tell your chip, "HALT!!". And it halts. Well, actually, only the processor halts, and the other pieces of hardware keep happily crunching away. But at least the processor halts. For real. No "background processes", no nothing. Stops right there on the spot. None of this Can-I-pleeease-ptrace-that-process and No-you-have-no-permissions kind of nonsense. When I say HALT, everything HALTS! And then I can put my dirty little hands on each and every frigging bit of this massive, steaming pile of electronic devices. If the bit in question is memory-mapped, of course. Which it isn't, not when you most need it... But at least there's none of the so-called "memory protection" bullshit. Well, actually, we do have some of that. The debuggers are so dumb that if your processor has some kind of a retarded memory management unit and you have the code memory marked as read-only, it can't place breakpoints there. It can't even load code, for that matter. But we can manually overwrite the permission bits, can't we? Yes, we can!

We. Me and my probe. For the life of me, I couldn't do that alone. Oh, I miss my probe so much. Nothing personal, I just need to run the 4 megs on the board, you know. How am I supposed to load it, bit-by-bit with my bare hands, I ask thee? Where's my frigging probe, dammit? Oh, there it is. OK, go ahead and grab a $10K worth of hardware from my desk, right by the pins, if you really have to. But while you're at it, why not leave a note where you say goodbye and wish me the best of luck on behalf of the probe, if you know what I mean? All right. Careful next time.

telnet darth. Shut up. I didn't register these stupid probes in the DNS. The Bastard did. We also have a luke, but luke doesn't work with multiple JTAG targets in the same chip, do you, luke? Luke, when gone am I, the last of the Jedi will you be. No answer. Probably because it's unplugged. Then again, there are other reasons. I need a vacation. What does darth have to tell us? Looks like someone has played with the frequency. Let's raise it back. We have a whole lot of __i5_stdZQostream_put to upload. Must be careful though. What was the magic maximal JTAG frequency, 20% of the target clock speed? Not much, considering the blazing 100% speed of this embedded microcrud and its ilk. And JTAG, my friends, transfers bit-by-bit. OK, time to quit whining. We're up and running, and pumping those bits like crazy. Yawn. This is slow. 87%-93%-99%... Finally.

Run, run, run, CRASH. Awesome. It's a good thing we've got ourselves a Graphical Debugger. It's also a good thing that I can't buy weapons without a license where I live. Or else you'd witness Graphical Violence right now. Because the stack is quite expectedly smashed to little bits, and the global variables, my only hope to make any sense of this, are defined in C++ namespaces. And the namespace support in our Graphical Debugger is really the way to go. If you want to go postal, that is. Basically, it boils down to three options:

  1. view Stupid::g_moronic doesn't work, but double-clicking at the point of definition does.
  2. view Stupid::g_moronic does work, but double-clicking at the point of definition doesn't.
  3. view Stupid::g_moronic doesn't work, and neither does the double-clicking.

If you wonder how the particular sort of wrong behavior is selected by the bleeding-edge tool, the only thing I can tell you is that you've got company. I wonder how it chooses the way to screw me, too. Well, actually, I stopped wondering long ago, and what I do is I try to click, no it's not option 1, I try to view, no it's not option 2, I say "FUCK!!" because it's option 3, the one where nothing works, so I filter nm through grep g_moronic, find the address and do a view *(MoronicClass*)0x70066584, et voila! The pile'o'crap is visualized in all its glory right in front of our aching, grepping eyes. You know what I like about global variables? Even C++ can't completely fuck them up; you still find them. And you know what I don't like? I don't like the value of _area here, 'cause that's an impossible value. Rectangles don't have negative areas, unless they are imaginary rectangles. Why do you imagine rectangles, you fatty 4-meg piece of garbage? Why not run properly instead? Maybe some running could help you get back into shape, lose a couple of those kilobytes or something.

Who crapped all over my _area? Let's browse the lvalue references, which takes a right click and a menu item selection. This debugger isn't at all bad, I'm telling ya. If it wasn't for those cplusplusisms in the code... Here's my lvalue ref list. Click, click, click. All places look sensible, as far as these things go. Unless of course when width is multiplied by height over here, one of them is negative. I could keep backtracking, but I've clicked enough. Let's add some good old-fashioned printf statements.

Compiiiiile. Liiiiink. Ruuuuun. Yawwwwwn. Nothing like the C++, NFS and JTAG trinity to put me into this caaaaalm mood. Wait, what's that?.. SHIT! All that forced meditation for nothing. It crashes differently this time. OK, put yourself together. It's no big deal. Been there many times. Shit happens. You know, uninitialized values which are uninitialized differently every time, race conditions and who knows what else. Let's print some more stuff.

Edit-compile-link-run-yawn-SHIT. Edit-compile-link-run-yawn-SHIT. Edit-compile-link-run-yawn-SHIT. Crash, crash, crash. It prints nonsense alongside fairly reasonable stuff, and crashes. Wait a second. Crashes? At an instruction comparing two registers? You can't crash just by crunching registers. You crash on memory access. And on divide by zero, if your processor can divide, which mine can't. If you want to crash, you better get yourself some bad pointers, or, if you can't find any, you'll need outright illegal instructions. Illegal instructions?! What's that smell in the air? Fire a memory view. View address: $pc. Compare the bits to the hexadecimal from the interlaced assembly listing. Guess what. They're different. And the stuff in the memory view doesn't disassemble very well, either. Somebody crapped over my instructions. Congratulations, somebody! I wonder who you are. Can't wait to find out.

Add a hardware watchpoint and run again. I sure hope it's the CPU and not a DMA controller or something. I sure hope it will be at the same address it was the last time. Bingo! And where is $pc? 0x97000176. No symbols. Of course – it points right into the flash. Bastardland!

What was that address where the Bastard's code defecates? Right, 0x76ffff56. Must be the location of one of his precious variables. You see, the Bastard wrote the boot loader for this board. A boot loader is what you burn to the flash, and what it does is it loads applications burnt somewhere else in the flash, and it handles interrupts. Because when you interrupt this processor, it jumps somewhere near the address where it starts running on power up. So the basic interrupt handling code naturally lands in the boot loader. And the boot loader happened to equally naturally land in the Bastard's lap. It is thus the Bastard's job to set up the stack pointer for the interrupt handling functions. And if there's one thing you can count on the Bastard to do, it's setting things up. He'll set them up, all right.

Keep shoveling through the assembly code. The Bastard clearly likes NOPs. Like, say, 20 in a row. Here and there and over there. Macros, he does not like. Repetition is at the core of learning, among other things. If we need NOPs, well, we'll use NOPs. NOP NOP NOP. And then hex hex hex. Oh, the Bastard loves hexadecimal. 0x77000000. And where does this go? Right here into $sp. The stack pointer for the interrupt mode. The stack then grows downwards, crapping over the instructions at 0x76fff... Bastard.
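For the arithmetically inclined, here's the collision in numbers. A minimal sketch: the two addresses are the ones from the story, while the stack size and the code section bounds are hypothetical, picked only to illustrate a stack that starts right above the code and grows down into it.

```python
STACK_TOP = 0x77000000   # from the story: value loaded into the interrupt-mode $sp
CLOBBERED = 0x76FFFF56   # from the story: address the boot loader code wrote to

def depth_below_top(address, stack_top=STACK_TOP):
    """How many bytes below the initial stack pointer an address lies."""
    return stack_top - address

def stack_overlaps(stack_top, stack_size, region_start, region_end):
    """True if a downward-growing stack occupying [stack_top - stack_size, stack_top)
    intersects the half-open region [region_start, region_end)."""
    stack_bottom = stack_top - stack_size
    return stack_bottom < region_end and region_start < stack_top

# Hypothetical values, not from the story.
ASSUMED_STACK_SIZE = 0x400
CODE_START, CODE_END = 0x76F00000, 0x76FFFFF0

if __name__ == "__main__":
    # The clobbered word sits 170 bytes (0xAA) below the initial $sp.
    print(depth_below_top(CLOBBERED))
    # With the assumed layout, the stack and the code section collide.
    print(stack_overlaps(STACK_TOP, ASSUMED_STACK_SIZE, CODE_START, CODE_END))
```

So a mere 170 bytes of interrupt stack usage is enough to reach the instruction that got trashed.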

When you decide to carve out a chunk of memory to play with, it would at least be nice to put it in the linker script, so that code isn't allocated to the spot where interrupts are serviced. Better yet, why not have the app tell the boot loader where it likes its interrupt stack? It does provide an interrupt handler, by a clever method devised by the very same Bastard. Why is the stack pointer treated differently from the function pointer? Simple: the Bastard actually needed to have the boot loader call his particular function. Convincing the boot loader to jump to wherever the linker had put that function was easier than convincing the linker to put it where the boot loader jumps. The stack location? The Bastard couldn't care less where that was; he just stashed it somewhere past the end of the image of his tiny test program.

See, the Bastard is very goal-oriented and only implements the critical features. Translation: the Bastard only implements what the Bastard needs to report Success to whoever signs his paycheck. The relativity of Success is another matter, and one that the Bastard doesn't give a flying fuck about. The Bastard is like a QA guy, reversed. I once heard a QA guy say to a developer, "Where you people fail, I can make a living". And with the Bastard, it's "Where I can make a living, you people will fail".

And a quiet Bastard he is. Has anybody ever heard of 0x77000000? No, sir. The Bastard doesn't make much fuss about his mission-critical work. Well, better stick to that spirit, and write an e-mail to someone else. "We need the boot loader to..." Yeah, right. Maybe it will happen in a month or six. The thing is already burnt to quite some flash cards. Firmware. Firmware is not software. You don't just fix it. You beg the Bastard to fix it.

And in the meanwhile, let's make a nice big hole in the linker script, right there at the interrupt stack location. This section goes below the hole, this section goes above. Oops, that doesn't link; the variables won't fit between the end of the hole and the end of the physical memory. OK, then this section goes below, and that section goes above. All right! The pseudo-small 4M app runs just like a tiger with his rear end on fire. Until someone adds one function too much. Or one variable too much. Must not break the delicate balance between the above-hole sections and the below-hole sections. Watch out for The Bastard's Black Hole, folks!
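The balancing act can be sketched as a fit check. This is only an illustration: the memory bounds, the hole size and every section size below are hypothetical (the story gives nothing but the 0x77000000 stack top), and alignment padding is ignored.

```python
def fits(mem_start, mem_end, hole_start, hole_end, below_sizes, above_sizes):
    """Check that the sections assigned below and above the hole fit in the
    space on each side of it. Alignment padding is ignored."""
    below_room = hole_start - mem_start
    above_room = mem_end - hole_end
    return sum(below_sizes) <= below_room and sum(above_sizes) <= above_room

# Hypothetical memory map; only the hole's upper end comes from the story.
MEM_START, MEM_END = 0x76C00000, 0x77100000
HOLE_START, HOLE_END = 0x76FFF000, 0x77000000   # assumed 4K interrupt stack

# Hypothetical section sizes.
TEXT, RODATA, DATA = 0x2C0000, 0xC0000, 0x120000

if __name__ == "__main__":
    layout = (MEM_START, MEM_END, HOLE_START, HOLE_END)
    # First attempt: the variables don't fit above the hole.
    print(fits(*layout, below_sizes=[TEXT, RODATA], above_sizes=[DATA]))
    # Second attempt: shuffle the sections and it links.
    print(fits(*layout, below_sizes=[TEXT, DATA], above_sizes=[RODATA]))
    # One function too much (an extra 0x20000 bytes of code) and it breaks again.
    print(fits(*layout, below_sizes=[TEXT + 0x20000, DATA], above_sizes=[RODATA]))
```

With these made-up numbers the three attempts come out as no, yes, no, which is about how the real thing felt.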

1. My reflections on technology » Looking through the eye of the pig, Apr 23, 2008

[...] for a rant, for a reason too. I hate doing impossible work sometimes. Look at Yossi Kreinin's blog entry about a Bastard Programmer From Hell. Now that I think about it, the whole point behind programming [...]

2. Leo, Dec 15, 2012

Why didn't you turn all interrupts off, or why didn't you replace the interrupt handling code with your own? A boot loader is no substitute for an OS or a library with platform-specific code.

P.S. Sorry for reviving such an old post

3. Yossi Kreinin, Dec 15, 2012

Well, the way it worked was, the hardware would jump to the Flash, and then that code did stuff and then it called your code. You couldn't ask the CPU for anything else – more specifically, you had to choose between ROM and Flash.

As to turning off interrupts – the code wouldn't run, not very surprisingly. In particular, there were coprocessors which told you they were done through an interrupt. You could do polling there and in other places but generally if you have a big piece of code which is essentially its own OS it's not easy to make it work with interrupts off unless you design it that way throughout.

4. Leo, Dec 18, 2012

Well then it looks like what you were dealing with wasn't a boot loader, if you were counting on the "boot loader" to be alive after it booted your application.

Anyway, the main problem here (imho) is that such a mandatory thing as a protocol/policy on communication with your "boot loader" wasn't well defined & wasn't enforced...

5. Yossi Kreinin, Dec 18, 2012

Erm, well, yeah; the question is who should do what. I think that if my job is to make a piece of software X, then when I report that I'm done, one of the things that I've done is I defined protocols to talk with everything I have to talk to, and I discussed this with everyone responsible for everything I have to talk to, and we're in agreement and we're aware of what talks to what and how. An alternative way of looking at it is, this is some manager's responsibility, and then when something like this happens, it's the manager's fault (actually it always is anyway, by definition; but the way I look at things is, if the manager delegates this to me and I leave loose ends like this, then the manager's fault is to have trusted me and not much else.)
