The C++ Sucks Series: petrifying functions
Your IT department intends to upgrade your OS and gives a developer a new image to play with. The developer is generally
satisfied, except there's this one program that mysteriously dumps core. Someone thoughtfully blames differences in system
libraries.
Alternative prelude: you have this program and you're working on a new version. Being generally satisfied with the updates,
you send the code overseas. They build it and it mysteriously dumps core. Someone thoughtfully blames differences in the
compiler version.
Whatever the prelude, you open the core dump with `gdb app core` and gdb says:
#0 0x080484c9 in main (argc=Cannot access memory at address 0xbf3e7a8c) at main.cpp:4
4 int main(int argc, char** argv)
(gdb)
Check out the garbage near "argc=" – if it ain't printing garbage, it ain't a C++ debugger. Anyway, it looks like the program
didn't even enter main. An alert C++ hater will immediately suspect that the flying circus happening in C++
before main could be at fault, but in this case it isn't. In fact, a program can be similarly petrified by the prospect of
entering any function, not necessarily main. It's main where it crashes in our example because the example is small; here's the
source code:
#include <stdio.h>
#include "app.h"
int main(int argc, char** argv)
{
    if(argc != 2) {
        printf("please specify a profile\n");
        return 1;
    }
    const char* profile = argv[1];
    Application app(profile);
    app.mainLoop();
}
On your machine, you run the program without any arguments and sure enough, it says "please specify a profile"; on this other
machine, it just dumps core. Hmmm.
Now, I won't argue that C++ isn't a high-level object-oriented programming language since every book on the subject is
careful to point out the opposite. Instead I'll argue that you can't get a first-rate user experience with this
high-level object-oriented programming language if you don't also know assembly. And with the first-rate experience being the
living hell that it is, few would willingly opt for a second-rate option.
For example, nothing at the source code level can explain how a program is so shocked by the necessity of running main that
it dumps a core in its pants. On the other hand, here's what we get at the assembly level:
(gdb) p $pc
$1 = (void (*)(void)) 0x80484c9 <main+20>
(gdb) disass $pc
Dump of assembler code for function main:
0x080484b5 <main+0>: lea 0x4(%esp),%ecx
0x080484b9 <main+4>: and $0xfffffff0,%esp
0x080484bc <main+7>: pushl -0x4(%ecx)
0x080484bf <main+10>: push %ebp
0x080484c0 <main+11>: mov %esp,%ebp
0x080484c2 <main+13>: push %ecx
0x080484c3 <main+14>: sub $0xa00024,%esp
0x080484c9 <main+20>: mov %ecx,-0xa0001c(%ebp)
# we don't care about code past $pc -
# a screenful of assembly elided
What this says is that the offending instruction is at the address main+20. As you'd expect with a Segmentation fault or a
Bus error core dump, this points to an instruction accessing memory, specifically, the stack.
BTW I don't really know x86 assembly, but I can still read it thusly: "mov" can't just mean the tame RISC "move between
registers" thing because then we wouldn't crash, so one operand must spell a memory address. Without remembering the
source/destination order of the GNU assembler (which AFAIK is the opposite of the usual), I can tell that it's the second
operand that is the memory operand because there's an integer constant which must mean an offset or something, and why would you
need a constant to specify a register operand. Furthermore, I happen to remember that %ebp is the frame pointer register, which
means that it points into the stack; I could also have figured that out from the previous instruction at main+11, which moves %esp
[ought to be the stack pointer] to %ebp (or vice versa, as you could think without knowing the
GNU operand ordering – but it would still mean that %ebp points into the stack.)
Which goes to show that you can read assembly while operating from a knowledge base that is not very dense, a way of saying "without really knowing what you're doing" – try that with C++
library code; but I digress. Now, why would we fail to access the stack? Could it have to do with the fact that we apparently
access it with the offset -0xa0001c, which ought to be unusually large? Let's have a look at the local variables, hoping that we
can figure out the size of the stack main needs from their sizes. (Of course if the function used a Matrix class of the sort
where the matrix is kept by value right there in a flat member array, looking at the named local variables mentioned in the
program wouldn't be enough since the temporaries returned by overloaded operators would also have to be taken into account;
luckily this isn't the case.)
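(To make that parenthetical concrete, here's a sketch of the kind of Matrix class being alluded to; it's not in the example program and the sizes are made up. The data sits by value inside the object, so every temporary returned by an overloaded operator is another full matrix on the stack:)
struct Matrix {
    double data[256][256]; // half a megabyte right inside the object
    Matrix operator+(const Matrix& other) const {
        Matrix result; // another half-megabyte temporary on the stack
        for(int i = 0; i < 256; ++i)
            for(int j = 0; j < 256; ++j)
                result.data[i][j] = data[i][j] + other.data[i][j];
        return result;
    }
};
// so something like a + b + c quietly needs a couple of megabytes of stack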
(gdb) info locals
# if it ain't printing garbage, it ain't a C++ debugger:
profile = 0xb7fd9870 "U211?WVS??207"
app = Cannot access memory at address 0xbf3e7a98
We got two local variables; at least one must be huge then. (It can be worse in real life, main functions being perhaps the
worst offenders, as many people are too arrogant to start with an Application class. Instead they have an InputParser and an
OutputProducer and a Processor, which they proudly use in a neat 5-line main function – why wrap that in a class, 2
files in C++-land? Then they add an InputValidator, an OutputFormatConfigurator and a ProfileLoader, then less sophisticated
people gradually add 20 to 100 locals for doing things right there in main, and then nobody wants to refactor the mess because
of all the local variables you'd have to pass around; whereas an Application class with two hundred members, while disgusting,
at least makes helper functions easy. But I digress again.)
(gdb) p sizeof profile
$2 = 4
(gdb) p sizeof app
$3 = 10485768
"10485768". The trouble with C++ debuggers is that they routinely print so much garbage due to memory corruption, debug
information inadequacy and plain stupidity that their users are accustomed to automatically ignore most of their output without
giving it much thought. In particular, large numbers with no apparent regularity in their digits are to a C++ programmer what
"viagra" is to a spam filter: a sure clue that something was overwritten somewhere and the number shouldn't be trusted (I rarely
do pair programming but I do lots of pair debugging and people explicitly shared this spam filtering heuristic with me).
However, in this case overwriting is unlikely since a sizeof is a compile time constant stored in the debug information and
not in the program memory. We can see that the number will "make more sense" in hexadecimal (which is why hex is generally a
good thing to look at before ignoring "garbage"):
(gdb) p /x sizeof app
$4 = 0xa00008
...Which is similar to our offset value, and confirms that we've been debugging a plain and simple stack overflow. Which
would be easy to see in the case of a recursive function, or if the program crashed, say, in an attempt to access a large local
array. However, in C++ it will crash near the beginning of a function long before the offending local variable is even declared,
in an attempt to push the frame pointer or some such; I think I also saw it crash in innocent-looking places further down the
road, but I can't reproduce it.
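For the record, here's a minimal sketch of the effect, assuming the common 8M default stack limit (this isn't the program from the example): the function dies the moment it touches its oversized frame, which at the source level is the very first thing it does.
void f()
{
    char buf[16*1024*1024]; // a 16M local, bigger than an 8M stack limit
    buf[0] = 0; // this store lands megabytes below the mapped stack: SIGSEGV
}

int main()
{
    f();
    return 0;
}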
Now we must find out which member of the Application class is the huge one, which is lots of fun when members are plentiful
and deeply nested, which, with a typical Application class, they are. Some languages have reflection, given which we could
traverse the member tree automatically; incidentally, most of those languages don't dump core though. Anyway, in our case
finding the problem is easy because I've made the example small.
(I also tried to make it ridiculous – do you tend to ridicule pedestrian code, including your own, sometimes as you type? Few
do and the scarcity makes them very dear to me.)
class Application
{
public:
    Application(const char* profile);
    void mainLoop();
private:
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    const char* _profPath;
    char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
    Profile* _profile;
};
This shows that it's _parseBuf that's causing the problem. This also answers the alert C++ apologist's objection that all of
the above isn't special to C++ but is just as relevant to C (when faced with a usability problem, C++ apologists like to
ignore it and instead concentrate on assigning blame; if a problem reproduces in C, it's not C++'s fault according to their
warped value systems.) Well, while one could write equivalent C code causing a similar problem, one is unlikely to do so
because C doesn't have a private keyword which to a first approximation does nothing but is advertised as an "encapsulation mechanism".
In other words, an average C programmer would have a createApplication function which would malloc an Application struct and
all would be well since the huge _parseBuf wouldn't land on the stack. Of course an average C++ programmer, assuming he found
someone to decipher the core dump for him as opposed to giving up on the OS upgrade or the overseas code upgrade, could also
allocate the Application class dynamically, which would force him to change an unknown number of lines in the client code. Or he
could change _parseBuf's type to std::vector, which would force him to change an unknown number of lines in the implementation
code, depending on the nesting of function calls from Application. Alternatively the average C++ programmer could change
_parseBuf to be a reference, new it in the constructor(s) and delete it in the destructor, assuming he can find someone who
explains to him how to declare references to 2D arrays.
However, suppose you don't want to change code but instead would like to make old code run on the new machine – a perfectly
legitimate desire independently of the quality of the code and its source language. The way to do it under Linux/tcsh is:
unlimit stacksize
Once this is done, the program should no longer dump core. `limit stacksize` would show you the original limit, which AFAIK
will differ across Linux installations and sometimes will depend on the user (say, if you ssh to someone's desktop, you can get
a lower default stacksize limit and won't be able to run the wretched program). For example, on my wubi installation (Ubuntu for technophobes like myself who have a Windows machine, want a
Linux, and hate the idea of fiddling with partitions), `limit stacksize` reports the value of 8M.
Which, as we've just seen, is tiny.
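(If you'd rather not depend on the shell, the same knob is reachable programmatically. Here's a sketch of a tiny launcher using getrlimit/setrlimit from sys/resource.h (my example, not part of the original program) that raises the soft stack limit to the hard limit and then execs the real binary, roughly what `unlimit stacksize` does for the shell and its children:)
#include <sys/resource.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    if(argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    struct rlimit rl;
    if(getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = rl.rlim_max; // raise the soft limit as far as we're allowed
    if(setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    execvp(argv[1], argv+1); // run the real program with the new limit
    perror("execvp"); // only reached if exec failed
    return 1;
}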
8M for a stack is not tiny in 32-bit land, in fact it's very large.
This is important in practice if you use many threads, as the preallocated
stacks will use all your address space, if not actual committed memory.
But I assume you're running 64-bit, as 32-bit generally defaults stack
to 1M or so.
One of the gcc 4.5 projects (recently committed) adds perfect
debuginfo for optimized programs with not much more compile time. People
who want backtraces with -fomit-frame-pointer are still stuck,
though.
Doesn't your main function need to return a value at the end?
@astrange: Actually, it's not that hard to get a backtrace under
-fomit-frame-pointer; perhaps I'll blog about it. The Green Hills
debugger does that and I think the compiler stores things in the debug
info for this to work, however heuristic scripting can do quite well
given just a symbol table and a disassembler.
@Barry: I didn't mean to say tiny, I meant to say
<sarcasm>tiny</sarcasm> – but you know how these blogging
engines mess with your markup. Regarding the preallocation of stacks for
threads – so what happens under `unlimit stacksize' then? (I'd assume
that Linux doesn't really preallocate physical memory for stacks but it
does have to preallocate virtual addresses since they ought to be
contiguous, therefore there can be no such thing as unlimited stack
size, even if unlimited means "until we run out of bits to encode
pointers", because if there are many threads, you need to divide the
address space between them, and Linux doesn't even know how many threads
a program will use concurrently; so `unlimit stacksize' would simply
mean to use some large limit built into the system.)
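(For non-main threads the stack size is indeed fixed when the thread is created; here's a minimal sketch, assuming pthreads, of choosing that size explicitly rather than relying on whatever the default limit implies:)
#include <pthread.h>
#include <stdio.h>

static void* worker(void*)
{
    // whatever runs here has to fit in the stack size chosen below
    return 0;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // RLIMIT_STACK governs the main thread; other threads get a stack
    // allocated up front, whose size can be set per thread:
    pthread_attr_setstacksize(&attr, 16*1024*1024); // 16M, for example
    pthread_t tid;
    if(pthread_create(&tid, &attr, worker, 0) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, 0);
    pthread_attr_destroy(&attr);
    return 0;
}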
@sn: it doesn't.
It's pretty dumb to create such big instances in the stack. The stack
is for temporary and little operations, to avoid the overhead of heap memory
allocation, as allocating memory in the stack is just a stack pointer
adjustment (i.e. the fastest possible memory allocation).
C++ is very much open to criticism, but come on, this post is just ignorance.
> It’s pretty dumb to create such big instances in the stack.
No shite, Sherlock!
> It’s pretty dumb to create such big instances in the stack.
Yes, but the functionality is there to do so, though the problem is
that there is no standard or common way between compilers to detect such
events occurring.
I think the issue with allocating on the stack vs. the heap is a red
herring. The problem is that there is no fast, effective way of
detecting such problems unless you're experienced with assembly and the
technical details of the environment the application is running in.
Though sometimes less is more, and in this situation most customers
will have no idea what the problem is and wouldn't really care.
Thank you Yossi Kreinin for another post. Always looking forward to seeing
what new things you run into.
Keep it up.
Chad.
Using the stack to allocate more than 16K is considered dumb in many
places.
So do any lint-ish tools detect the issue? Did you get any warnings
when you compiled your Application class? I'm always leery when I snarf
a big block of space. Where did [1024][10] come from?
er, [1024][1024*10]
What should LINT or the likes of it do – warn about the size of your
objects when it exceeds what threshold? 16K the universal dumbness
constant from the comment above?
No, I didn't get any warnings, in fact this example was inspired by a
program that allocated >300M using a similar 2D array and AFAIK it
compiled under gcc -Wall -Werror.
I am a bit confused. In the actual app where this problem originally
came up, was that the first time someone had added the private attribute
with the array size associated with it, or did this somehow work with an
earlier version of C++?
There was, sadly, more than one actual app where I saw this problem.
The example with the huge member array preventing you from entering
main() is close to a particular case, where the problem didn't come up
until someone ssh'd to someone else's machine and ran the program (which
previously ran fine for what I think was months). Presumably the default
stack size limit varied with the user/machine combination.
This is unrelated to "versions of C++" (be it the language standard
or the compiler version that you refer to), as long as the OS is what sets the stack
size limit. It can depend on compiler version if the stack is
effectively allocated statically by the compiler (actually a linker
script shipped with it), as can be the case in embedded systems;
although there you frequently maintain your own linker script
anyway.
Your criticisms of C++ are unfair due to the fact that they have
nothing to do with C++.
Where else could you encounter similar behavior?
You would have the exact same issue in C and any other language that
allows allocations on the stack with no runtime checking:
#define MAX_BUF_SIZE 1024
#define MAX_PROF 1024*10
struct Application {
    char* _profPath;
    char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
};
C++ has plenty of warts, but this isn't one of them.
RTFA. In C you'd be very likely to malloc the forward-declared struct
– C encapsulation encapsulates much more than C++'s does.
You said:
"Alternatively the average C++ programmer could change _parseBuf to
be a reference, new it in the constructor(s) and delete it in the
destructor, assuming he can find someone who explains to him how to
declare references to 2D arrays."
I'll take that challenge :)
is there anything non-standard with my solution?
are there any problems with my implementation of
class Application that weren't there to begin with?
——————————————————–
class Profile;
class Application
{
public:
    Application(const char* profile);
    ~Application();
    void mainLoop();
private:
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    typedef char arr2D_t[MAX_BUF_SIZE][MAX_PROF];
    const char* _profPath;
    void* rawMem;
    // here's the damn ref to a 2D array.
    arr2D_t& _parseBuf;
    Profile* _profile;
};
Application::Application(const char *profile):
    rawMem(new char[sizeof(arr2D_t)]),
    _parseBuf(*reinterpret_cast<arr2D_t*>(rawMem))
{
}
Application::~Application()
{
    delete [] &_parseBuf;
}
int main()
{
    Application myApp("foo");
    // ran fine on my machine...
    return 0;
}
the reinterpret_cast in my version had the necessary
< arr2D_t * >
the comment system ate it...
I hope it makes more sense now.
how do you make triangular braces on this forum anyway?
<>
OK, I think I figured it out,
it was supposed to look like this:
*reinterpret_cast<>(rawMem)
no I didn't get it... giving-up now
@Yoed: pretty much along the lines of what I thought; actually you
don't need to keep rawMem as you can delete &_parseBuf in the
destructor. A large share of C++ programmers wouldn't know how to
typedef the 2D array type.
I tried it first without the rawMem;
it complained about needing a temporary variable in the initialization list, so I
gave it a permanent variable instead.
If you'll notice, I do use &_parseBuf in the destructor.
BTW, having to still use delete [] (with the square brackets) is a
pitfall that's easy to miss IMO.
I'm Sherlock. Did anyone ask for my assistance?
ps/ Oy! I typed "yes" or "y" in the "Human?" field and got
some machine-like language telling me to do what I just did. Brainless
machines...
@Sherlock: no shit, Sherlock! You'd be better off if you didn't strip
non-alphanumeric characters before parsing natural language.
Note: In Fedora limit doesn't exist, I used this to get the stack
size:
ulimit -s
@Chris:
That's not a "Fedora vs. the world" issue -
the author took care to write it as "Linux/tcsh".
Thus, just trade your bash (which as you said has the ulimit builtin)
for something usable in the context of this article.
@Yoed: I guess you'd want to avoid reinterpret_cast (portability
issues), and use static_cast(static_cast(rawMem)) instead – which is
well defined.
@Yossi: I liked the debug process you did there, but wouldn't a
simple glimpse at the code have given you the answer already?
I am asking because I find people are split into two groups regarding
debugging:
- Those that just go ahead and gdb the hell outta their binary as soon
as a problem is encountered,
- And those (myself included, I must admit) who'd rather re-read the
code a few times, and then use printf debugging, all just to avoid gdb
:)
As with everything, professionals must balance both of these
approaches.
@rmn: well, in my example you could probably see the problem since it
was really small, but in a larger program, when the program crashes at a
place where you don't really expect it to crash no matter what data it
runs on (that is, no pointer dereferencing, no division, etc.), I don't
see how reading code would help. The large object could be (and in
reality, will be) defined deep down in some member of a member of
a...
Of course if you've already seen a similar bug, you'll be looking for
a large object – but if you haven't, reading source doesn't really point
you to the problem very directly since nothing interesting happens at
the source level. A similar and perhaps nastier case is when you get a
misaligned stack pointer and crash at some place where the compiler used
an x86 vector instruction that only works with aligned addresses.
There's absolutely nothing wrong at the source level so you have to
disassemble to make any sense of what you see. (The difference here is
that it's not the fault of standard debugging tools, really, since
standard compilers will set up the stack properly; this is the fault of
whoever glued shared libraries built with different calling conventions
together or similar – although the tools could do at least
something to detect an ABI mismatch and they don't. Whereas in
my example, it's pretty standard and widely encouraged C++ style –
"local variables are the best" – and then you get no help at all from
the tools to figure out what went wrong.)
There's this one thing I don't understand and it itches.
"RTFA. In C you’d be very likely to malloc the forward-declared
struct"
Why wouldn't one be just as likely to "new" the class in C++?
Well, obviously nothing prevents one from new'ing an object rather
than stack-allocating – it's a question of what the preferred style
is.
In C, "encapsulation" generally means ADTs – forward-declared structs
implemented in a .c file, hiding their internals to the point where you
can't allocate an object yourself since you don't know its size – so you
call an allocation function provided in the .h file for the ADT.
In C++, "encapsulation" generally refers to the "private:" keyword –
a class keeps its implementation details in the "private:" section, and
the compiler can thus allocate an object anywhere where the .h is
included, since it knows the object size, the price being that it also
has to read the .h files describing the classes of the members. The
common C++ style encourages allocation on the stack, both because it's more efficient than allocation on the heap and because it's the most
more efficient than allocation on the heap and because it's the most
straightforward way to get RAII-style resource management.
Again, it's not about what's possible but about what's likely; the
commonly advocated and used style affects the likelihood if not the
possibilities.
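(To make the C style concrete, here's a sketch of the ADT pattern with made-up names; it happens to compile as C++ too. The header only forward-declares the struct, so callers can't put an Application on the stack even if they wanted to:)
/* app.h */
struct Application; /* size unknown to callers */
struct Application* app_create(const char* profile);
void app_main_loop(struct Application* app);
void app_destroy(struct Application* app);

/* app.c */
#include <stdlib.h>
#define MAX_BUF_SIZE 1024
#define MAX_PROF (1024*10)
struct Application {
    const char* prof_path;
    char parse_buf[MAX_BUF_SIZE][MAX_PROF]; /* 10M, but always malloc'd */
};
struct Application* app_create(const char* profile)
{
    struct Application* app = (struct Application*)malloc(sizeof *app);
    if(app)
        app->prof_path = profile;
    return app;
}
void app_main_loop(struct Application* app)
{
    (void)app; /* the actual work would go here */
}
void app_destroy(struct Application* app)
{
    free(app);
}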
Instead of this:
char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
you can do the following (and allocate on the heap instead of
stack):
std::vector<char> _parseBufVec(MAX_BUF_SIZE*MAX_PROF);
typedef char _ParseBuf[MAX_BUF_SIZE][MAX_PROF];
_ParseBuf& _parseBuf = *reinterpret_cast<_ParseBuf*>(_parseBufVec.data());
reinterpret_cast should make the two views on the same memory act as
if they were
two entries in a union. I believe that was the original intention of
reinterpret_cast.
Ok, there is no preview button, but the reinterpret cast was supposed
to be like this:
reinterpret_cast<_ParseBuf*>(....
the <,> got parsed away (probably because they look like a
tag).
Yeah, that's one way to do it, although I'd probably typedef the 2D
array type to something and then new/delete it myself instead of using
std::vector and reinterpret_cast, just so that there's a teeny bit less
of tasty C++ goodness involved.
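(One way to read that; this is my sketch, not code from the thread, and it typedefs the row type and keeps a plain pointer member rather than the reference discussed above:)
class Application
{
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    typedef char ParseRow[MAX_PROF]; // one 10K row
public:
    Application(const char* profile)
        : _profPath(profile), _parseBuf(new ParseRow[MAX_BUF_SIZE]) {}
    ~Application() { delete [] _parseBuf; }
    // copying is not handled here; disable or implement it in real code
private:
    const char* _profPath;
    ParseRow* _parseBuf; // 1024 rows on the heap; _parseBuf[i][j] works as before
};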
Well, if you do all the new/delete on your own, you risk leaking if
anything throws and something in a higher stack frame catches it. You
then need to start wrapping your allocation/deallocation in ctor/dtor;
or to use smart pointers. Before long, you'll be writing quite a bit
more code than a simple (ok, that's definitely a euphemism) cast would
have been.
Much of the code I work on is compiled without exception support,
precisely to not care about this sort of thing.
Why wouldn't you just do something like the following instead?
Application* app = new Application(profile);
app->mainLoop();
delete app;
That sounds like a far better fix. And just so you know, you can run
into problems with the stack in pretty much any programming language
that exists. This issue doesn't just exist in C++.
You could do that, as I mentioned. As to other languages – most
allocate less on the stack and report stack overflows more
gracefully.
Hi, I've got an easy solution for you: just move the app declaration out
of main. That way that variable will be statically allocated by the
linker in the data section, and you'll have no problem with the
stack.
I do like C++, although not the builtin metatemplate interpreter, and
having spent the last few days reading your FQA, I'm trying not to sound
like a C++ apologist...
Incidentally, 64-bit Linux also allocates 8M for the stack by
default, so this isn't a 32-bit vs 64-bit issue.
Yep, I ended up sounding dumb instead: profile is a parameter for the
constructor.
My other solution would be like Ray's one, above.
Well, sure, there are plenty of workarounds, once you figure out the
problem – which many, many competent C++ programmers really can't.
For certain values of competent.
I've seen lots of C programs (particularly vendor supplied "examples"
for microcontrollers) that stack overflow spectacularly because they
thought to allocate the entire data buffer on the stack, and have the
declaration buried deep within nested loops and switches.
This is certainly not a C++ problem, more of a brain-dead developer
problem.
Yes, a matrix was too large to statically allocate; what is the limit
on a statically allocated variable?
Well explained. I was getting the same error in legacy C code and I
never considered it to be a stack overflow.