The C++ Sucks Series: petrifying functions
Your IT department intends to upgrade your OS and gives a developer a new image to play with. The developer is generally
satisfied, except there's this one program that mysteriously dumps core. Someone thoughtfully blames differences in system
libraries.
Alternative prelude: you have this program and you're working on a new version. Being generally satisfied with the updates,
you send the code overseas. They build it and it mysteriously dumps core. Someone thoughtfully blames differences in the
compiler version.
Whatever the prelude, you open the core dump with `gdb app core` and gdb says:
#0 0x080484c9 in main (argc=Cannot access memory at address 0xbf3e7a8c) at main.cpp:4
4 int main(int argc, char** argv)
(gdb)
Check out the garbage near "argc=" – if it ain't printing garbage, it ain't a C++ debugger. Anyway, it looks like the program
didn't even enter main. An alert C++ hater will immediately suspect that the flying circus happening in C++
before main could be at fault, but in this case it isn't. In fact, a program can be similarly petrified by the prospect of
entering any function, not necessarily main. It's main where it crashes in our example because the example is small; here's the
source code:
#include <stdio.h>
#include "app.h"
int main(int argc, char** argv)
{
    if(argc != 2) {
        printf("please specify a profile\n");
        return 1;
    }
    const char* profile = argv[1];
    Application app(profile);
    app.mainLoop();
}
On your machine, you run the program without any arguments and sure enough, it says "please specify a profile"; on this other
machine, it just dumps core. Hmmm.
Now, I won't argue that C++ isn't a high-level object-oriented programming language since every book on the subject is
careful to point out the opposite. Instead I'll argue that you can't get a first-rate user experience with this
high-level object-oriented programming language if you don't also know assembly. And with the first-rate experience being the
living hell that it is, few would willingly opt for a second-rate option.
For example, nothing at the source code level can explain how a program is so shocked by the necessity of running main that
it dumps a core in its pants. On the other hand, here's what we get at the assembly level:
(gdb) p $pc
$1 = (void (*)(void)) 0x80484c9 <main+20>
(gdb) disass $pc
Dump of assembler code for function main:
0x080484b5 <main+0>: lea 0x4(%esp),%ecx
0x080484b9 <main+4>: and $0xfffffff0,%esp
0x080484bc <main+7>: pushl -0x4(%ecx)
0x080484bf <main+10>: push %ebp
0x080484c0 <main+11>: mov %esp,%ebp
0x080484c2 <main+13>: push %ecx
0x080484c3 <main+14>: sub $0xa00024,%esp
0x080484c9 <main+20>: mov %ecx,-0xa0001c(%ebp)
# we don't care about code past $pc -
# a screenful of assembly elided
What this says is that the offending instruction is at the address main+20. As you'd expect with a Segmentation fault or a
Bus error core dump, this points to an instruction accessing memory, specifically, the stack.
BTW I don't really know x86 assembly, but I can still read it thusly: "mov" can't just mean the tame RISC "move between
registers" thing because then we wouldn't crash, so one operand must spell a memory address. Without remembering the
source/destination order of the GNU assembler (which AFAIK is the opposite of the usual), I can tell that it's the second
operand that is the memory operand because there's an integer constant which must mean an offset or something, and why would you
need a constant to specify a register operand. Furthermore, I happen to remember that %ebp is the frame pointer register, which
means that it points into the stack; I could also have figured that out from the previous instruction at main+11, which moves %esp
[ought to be the stack pointer] to %ebp (or vice versa, as you could think without knowing the
GNU operand ordering – but it would still mean that %ebp points into the stack.)
Which goes to show that you can read assembly while operating from a knowledge base that is not very dense, a way of saying "without really knowing what you're doing" – try that with C++
library code; but I digress. Now, why would we fail to access the stack? Could it have to do with the fact that we apparently
access it with the offset -0xa0001c, which ought to be unusually large? Let's have a look at the local variables, hoping that we
can figure out the size of the stack main needs from their sizes. (Of course if the function used a Matrix class of the sort
where the matrix is kept by value right there in a flat member array, looking at the named local variables mentioned in the
program wouldn't be enough since the temporaries returned by overloaded operators would also have to be taken into account;
luckily this isn't the case.)
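(To make that parenthetical concrete, here's a sketch of the kind of Matrix class being alluded to; it's not in the example program and the sizes are made up. The data sits by value inside the object, so every temporary returned by an overloaded operator is another full matrix on the stack:)
struct Matrix {
    double data[256][256]; // half a megabyte right inside the object
    Matrix operator+(const Matrix& other) const {
        Matrix result; // another half-megabyte temporary on the stack
        for(int i = 0; i < 256; ++i)
            for(int j = 0; j < 256; ++j)
                result.data[i][j] = data[i][j] + other.data[i][j];
        return result;
    }
};
// so something like a + b + c quietly needs a couple of megabytes of stack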
(gdb) info locals
# if it ain't printing garbage, it ain't a C++ debugger:
profile = 0xb7fd9870 "U211?WVS??207"
app = Cannot access memory at address 0xbf3e7a98
We got two local variables; at least one must be huge then. (It can be worse in real life, main functions being perhaps the
worst offenders, as many people are too arrogant to start with an Application class. Instead they have an InputParser and an
OutputProducer and a Processor, which they proudly use in a neat 5-line main function – why wrap that in a class, 2
files in C++-land? Then they add an InputValidator, an OutputFormatConfigurator and a ProfileLoader, then less sophisticated
people gradually add 20 to 100 locals for doing things right there in main, and then nobody wants to refactor the mess because
of all the local variables you'd have to pass around; whereas an Application class with two hundred members, while disgusting,
at least makes helper functions easy. But I digress again.)
(gdb) p sizeof profile
$2 = 4
(gdb) p sizeof app
$3 = 10485768
"10485768". The trouble with C++ debuggers is that they routinely print so much garbage due to memory corruption, debug
information inadequacy and plain stupidity that their users are accustomed to automatically ignore most of their output without
giving it much thought. In particular, large numbers with no apparent regularity in their digits are to a C++ programmer what
"viagra" is to a spam filter: a sure clue that something was overwritten somewhere and the number shouldn't be trusted (I rarely
do pair programming but I do lots of pair debugging and people explicitly shared this spam filtering heuristic with me).
However, in this case overwriting is unlikely since a sizeof is a compile time constant stored in the debug information and
not in the program memory. We can see that the number will "make more sense" in hexadecimal (which is why hex is generally a
good thing to look at before ignoring "garbage"):
(gdb) p /x sizeof app
$4 = 0xa00008
...Which is similar to our offset value, and confirms that we've been debugging a plain and simple stack overflow. Which
would be easy to see in the case of a recursive function, or if the program crashed, say, in an attempt to access a large local
array. However, in C++ it will crash near the beginning of a function long before the offending local variable is even declared,
in an attempt to push the frame pointer or some such; I think I also saw it crash in innocent-looking places further down the
road, but I can't reproduce it.
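For the record, here's a minimal sketch of the effect, assuming the common 8M default stack limit (this isn't the program from the example): the function dies the moment it touches its oversized frame, which at the source level is the very first thing it does.
void f()
{
    char buf[16*1024*1024]; // a 16M local, bigger than an 8M stack limit
    buf[0] = 0; // this store lands megabytes below the mapped stack: SIGSEGV
}

int main()
{
    f();
    return 0;
}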
Now we must find out which member of the Application class is the huge one, which is lots of fun when members are plentiful
and deeply nested, which, with a typical Application class, they are. Some languages have reflection, given which we could
traverse the member tree automatically; incidentally, most of those languages don't dump core though. Anyway, in our case
finding the problem is easy because I've made the example small.
(I also tried to make it ridiculous – do you tend to ridicule pedestrian code, including your own, sometimes as you type? Few
do and the scarcity makes them very dear to me.)
class Application
{
public:
    Application(const char* profile);
    void mainLoop();
private:
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    const char* _profPath;
    char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
    Profile* _profile;
};
This shows that it's _parseBuf that's causing the problem. This also answers the alert C++ apologist's objection that all of
the above isn't special to C++ but is just as relevant to C (when faced with a usability problem, C++ apologists like to
ignore it and instead concentrate on assigning blame; if a problem reproduces in C, it's not C++'s fault according to their
warped value systems.) Well, while one could write equivalent C code causing a similar problem, one is unlikely to do so
because C doesn't have a private keyword which to a first approximation does nothing but is advertised as an "encapsulation mechanism".
In other words, an average C programmer would have a createApplication function which would malloc an Application struct and
all would be well since the huge _parseBuf wouldn't land on the stack. Of course an average C++ programmer, assuming he found
someone to decipher the core dump for him as opposed to giving up on the OS upgrade or the overseas code upgrade, could also
allocate the Application class dynamically, which would force him to change an unknown number of lines in the client code. Or he
could change _parseBuf's type to std::vector, which would force him to change an unknown number of lines in the implementation
code, depending on the nesting of function calls from Application. Alternatively the average C++ programmer could change
_parseBuf to be a reference, new it in the constructor(s) and delete it in the destructor, assuming he can find someone who
explains to him how to declare references to 2D arrays.
However, suppose you don't want to change code but instead would like to make old code run on the new machine – a perfectly
legitimate desire independently of the quality of the code and its source language. The way to do it under Linux/tcsh is:
unlimit stacksize
Once this is done, the program should no longer dump core. `limit stacksize` would show you the original limit, which AFAIK
will differ across Linux installations and sometimes will depend on the user (say, if you ssh to someone's desktop, you can get
a lower default stacksize limit and won't be able to run the wretched program). For example, on my wubi installation (Ubuntu for technophobes like myself who have a Windows machine, want a
Linux, and hate the idea of fiddling with partitions), `limit stacksize` reports the value of 8M.
Which, as we've just seen, is tiny.
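(If you'd rather not depend on the shell, the same knob is reachable programmatically. Here's a sketch of a tiny launcher using getrlimit/setrlimit from sys/resource.h (my example, not part of the original program) that raises the soft stack limit to the hard limit and then execs the real binary, roughly what `unlimit stacksize` does for the shell and its children:)
#include <sys/resource.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    if(argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    struct rlimit rl;
    if(getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = rl.rlim_max; // raise the soft limit as far as we're allowed
    if(setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    execvp(argv[1], argv+1); // run the real program with the new limit
    perror("execvp"); // only reached if exec failed
    return 1;
}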
8M for a stack is not tiny in 32-bit land, in fact it's very large.
This is important in practice if you use many threads, as the preallocated
stacks will use all your address space, if not actual committed memory.
But I assume you're running 64-bit, as 32-bit generally defaults stack
to 1M or so.
One of the gcc 4.5 projects (recently committed) adds perfect
debuginfo for optimized programs with not much more compile time. People
who want backtraces with -fomit-frame-pointer are still stuck,
though.
Doesn't your main function need to return a value at the end?
@astrange: Actually, it's not that hard to get a backtrace under
-fomit-frame-pointer; perhaps I'll blog about it. The Green Hills
debugger does that and I think the compiler stores things in the debug
info for this to work, however heuristic scripting can do quite well
given just a symbol table and a disassembler.
@Barry: I didn't mean to say tiny, I meant to say
<sarcasm>tiny</sarcasm> – but you know how these blogging
engines mess with your markup. Regarding the preallocation of stacks for
threads – so what happens under `unlimit stacksize' then? (I'd assume
that Linux doesn't really preallocate physical memory for stacks but it
does have to preallocate virtual addresses since they ought to be
contiguous, therefore there can be no such thing as unlimited stack
size, even if unlimited means "until we run out of bits to encode
pointers", because if there are many threads, you need to divide the
address space between them, and Linux doesn't even know how many threads
a program will use concurrently; so `unlimit stacksize' would simply
mean to use some large limit built into the system.)
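(For non-main threads the stack size is indeed fixed when the thread is created; here's a minimal sketch, assuming pthreads, of choosing that size explicitly rather than relying on whatever the default limit implies:)
#include <pthread.h>
#include <stdio.h>

static void* worker(void*)
{
    // whatever runs here has to fit in the stack size chosen below
    return 0;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // RLIMIT_STACK governs the main thread; other threads get a stack
    // allocated up front, whose size can be set per thread:
    pthread_attr_setstacksize(&attr, 16*1024*1024); // 16M, for example
    pthread_t tid;
    if(pthread_create(&tid, &attr, worker, 0) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, 0);
    pthread_attr_destroy(&attr);
    return 0;
}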
@sn: it doesn't.
It's pretty dumb to create such big instances in the stack. The stack
is for temporary and little operations, to avoid the overhead of heap memory
allocation, as allocating memory in the stack is just a stack pointer
adjustment (i.e. the fastest possible memory allocation).
C++ is very much open to criticism, but come on, this post is just ignorance.
> It’s pretty dumb to create such big instances in the stack.
No shite, Sherlock!
> It’s pretty dumb to create such big instances in the stack.
Yes, but the functionality is there to do so, though the problem is
that there is no standard or common way between compilers to detect such
events occurring.
I think the issue with allocating on the stack vs. the heap is a red
herring. The problem is that there is no fast, effective way of
detecting such problems unless you're experienced with assembly and the
technical details of the environment the application is running in.
Though sometimes less is more, and in this situation most customers
will have no idea what the problem is and wouldn't really care.
Thank you Yossi Kreinin for another post. Always looking forward to seeing
what new things you run into.
Keep it up.
Chad.
Using the stack to allocate more than 16K is considered dumb in many
places.
So do any lint-ish tools detect the issue? Did you get any warnings
when you compiled your Application class? I'm always leery when I snarf
a big block of space. Where did [1024][10] come from?
er, [1024][1024*10]
What should LINT or the likes of it do – warn about the size of your
objects when it exceeds what threshold? 16K the universal dumbness
constant from the comment above?
No, I didn't get any warnings, in fact this example was inspired by a
program that allocated >300M using a similar 2D array and AFAIK it
compiled under gcc -Wall -Werror.
I am a bit confused. In the actual app where this problem originally
came up, was that the first time someone had added the private attribute
with the array size associated with it, or did this somehow work with an
earlier version of C++?
There was, sadly, more than one actual app where I saw this problem.
The example with the huge member array preventing you from entering
main() is close to a particular case, where the problem didn't come up
until someone ssh'd to someone else's machine and ran the program (which
previously ran fine for what I think was months). Presumably the default
stack size limit varied with the user/machine combination.
This is unrelated to "versions of C++" (be it the language standard
or the compiler version that you refer to), as long as the OS is what sets the stack
size limit. It can depend on compiler version if the stack is
effectively allocated statically by the compiler (actually a linker
script shipped with it), as can be the case in embedded systems;
although there you frequently maintain your own linker script
anyway.
Your criticisms of C++ are unfair due to the fact that they have
nothing to do with C++.
Where else could you encounter similar behavior?
You would have the exact same issue in C and any other language that
allows allocations on the stack with no runtime checking:
#define MAX_BUF_SIZE 1024
#define MAX_PROF 1024*10
struct Application {
    char* _profPath;
    char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
};
C++ has plenty of warts, but this isn't one of them.
RTFA. In C you'd be very likely to malloc the forward-declared struct
– C encapsulation encapsulates much more than C++'s does.
You said:
"Alternatively the average C++ programmer could change _parseBuf to
be a reference, new it in the constructor(s) and delete it in the
destructor, assuming he can find someone who explains to him how to
declare references to 2D arrays."
I'll take that challenge :)
is there anything non-standard with my solution?
are there any problems with my implementation of
class Application that weren't there to begin with?
——————————————————–
class Profile;
class Application
{
public:
    Application(const char* profile);
    ~Application();
    void mainLoop();
private:
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    typedef char arr2D_t[MAX_BUF_SIZE][MAX_PROF];
    const char* _profPath;
    void* rawMem;
    // here's the damn ref to a 2D array.
    arr2D_t& _parseBuf;
    Profile* _profile;
};
Application::Application(const char *profile):
    rawMem(new char[sizeof(arr2D_t)]),
    _parseBuf(*reinterpret_cast<arr2D_t*>(rawMem))
{
}
Application::~Application()
{
    delete [] &_parseBuf;
}
int main()
{
    Application myApp("foo");
    // ran fine on my machine...
    return 0;
}
the reinterpret_cast in my version had the necessary
< arr2D_t * >
the comment system ate it...
I hope it makes more sense now.
how do you make triangular braces on this forum anyway?
<>
OK, I think I figured it out,
it was supposed to look like this:
*reinterpret_cast<>(rawMem)
no I didn't get it... giving-up now
@Yoed: pretty much along the lines of what I thought; actually you
don't need to keep rawMem as you can delete &_parseBuf in the
destructor. A large share of C++ programmers wouldn't know how to
typedef the 2D array type.
I tried it first without the rawMem;
it complained about needing a temporary variable in the initialization list, so I
gave it a permanent variable instead.
If you'll notice, I do use &_parseBuf in the destructor.
BTW, having to still use delete [] (with the square brackets) is a
pitfall that's easy to miss IMO.
I'm Sherlock. Did anyone ask for my assistance?
ps/ Oy! I typed "yes" or "y" in the "Human?" field and got
some machine-like language telling me to do what I just did. Brainless
machines...
@Sherlock: no shit, Sherlock! You'd be better off if you didn't strip
non-alphanumeric characters before parsing natural language.
Note: In Fedora limit doesn't exist, I used this to get the stack
size:
ulimit -s
@Chris:
That's not a "Fedora vs. the world" issue -
the author took care to write it as "Linux/tcsh".
Thus, just trade your bash (which as you said has the ulimit builtin)
for something usable in the context of this article.
@Yoed: I guess you'd want to avoid reinterpret_cast (portability
issues), and use static_cast(static_cast(rawMem)) instead – which is
well defined.
@Yossi: I liked the debug process you did there, but wouldn't a
simple glimpse at the code have given you the answer already?
I am asking because I find people are split into two groups regarding
debugging:
- Those that just go ahead and gdb the hell outta their binary as soon
as a problem is encountered,
- And those (myself included, I must admit) who'd rather re-read the
code a few times, and then use printf debugging, all just to avoid gdb
:)
As with everything, professionals must balance both of these
approaches.
@rmn: well, in my example you could probably see the problem since it
was really small, but in a larger program, when the program crashes at a
place where you don't really expect it to crash no matter what data it
runs on (that is, no pointer dereferencing, no division, etc.), I don't
see how reading code would help. The large object could be (and in
reality, will be) defined deep down in some member of a member of
a...
Of course if you've already seen a similar bug, you'll be looking for
a large object – but if you haven't, reading source doesn't really point
you to the problem very directly since nothing interesting happens at
the source level. A similar and perhaps nastier case is when you get a
misaligned stack pointer and crash at some place where the compiler used
an x86 vector instruction that only works with aligned addresses.
There's absolutely nothing wrong at the source level so you have to
disassemble to make any sense of what you see. (The difference here is
that it's not the fault of standard debugging tools, really, since
standard compilers will set up the stack properly; this is the fault of
whoever glued shared libraries built with different calling conventions
together or similar – although the tools could do at least
something to detect an ABI mismatch and they don't. Whereas in
my example, it's pretty standard and widely encouraged C++ style –
"local variables are the best" – and then you get no help at all from
the tools to figure out what went wrong.)
There's this one thing I don't understand and it itches.
"RTFA. In C you’d be very likely to malloc the forward-declared
struct"
Why wouldn't one be just as likely to "new" the class in C++?
Well, obviously nothing prevents one from new'ing an object rather
than stack-allocating – it's a question of what the preferred style
is.
In C, "encapsulation" generally means ADTs – forward-declared structs
implemented in a .c file, hiding their internals to the point where you
can't allocate an object yourself since you don't know its size – so you
call an allocation function provided in the .h file for the ADT.
In C++, "encapsulation" generally refers to the "private:" keyword –
a class keeps its implementation details in the "private:" section, and
the compiler can thus allocate an object anywhere where the .h is
included, since it knows the object size, the price being that it also
has to read the .h files describing the classes of the members. The
common C++ style encourages allocation on the stack, both because it's more efficient than allocation on the heap and because it's the most
more efficient than allocation on the heap and because it's the most
straightforward way to get RAII-style resource management.
Again, it's not about what's possible but about what's likely; the
commonly advocated and used style affects the likelihood if not the
possibilities.
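(To make the C style concrete, here's a sketch of the ADT pattern with made-up names; it happens to compile as C++ too. The header only forward-declares the struct, so callers can't put an Application on the stack even if they wanted to:)
/* app.h */
struct Application; /* size unknown to callers */
struct Application* app_create(const char* profile);
void app_main_loop(struct Application* app);
void app_destroy(struct Application* app);

/* app.c */
#include <stdlib.h>
#define MAX_BUF_SIZE 1024
#define MAX_PROF (1024*10)
struct Application {
    const char* prof_path;
    char parse_buf[MAX_BUF_SIZE][MAX_PROF]; /* 10M, but always malloc'd */
};
struct Application* app_create(const char* profile)
{
    struct Application* app = (struct Application*)malloc(sizeof *app);
    if(app)
        app->prof_path = profile;
    return app;
}
void app_main_loop(struct Application* app)
{
    (void)app; /* the actual work would go here */
}
void app_destroy(struct Application* app)
{
    free(app);
}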
Instead of this:
char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
you can do the following (and allocate on the heap instead of
stack):
std::vector<char> _parseBufVec(MAX_BUF_SIZE*MAX_PROF);
typedef char _ParseBuf[MAX_BUF_SIZE][MAX_PROF];
_ParseBuf& _parseBuf = *reinterpret_cast<_ParseBuf*>(_parseBufVec.data());
reinterpret_cast should make the two views on the same memory act as
if they were
two entries in a union. I believe that was the original intention of
reinterpret_cast.
Ok, there is no preview button, but the reinterpret cast was supposed
to be like this:
reinterpret_cast<_ParseBuf*>(....
the <,> got parsed away (probably because they look like a
tag).
Yeah, that's one way to do it, although I'd probably typedef the 2D
array type to something and then new/delete it myself instead of using
std::vector and reinterpret_cast, just so that there's a teeny bit less
of tasty C++ goodness involved.
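(One way to read that; this is my sketch, not code from the thread, and it typedefs the row type and keeps a plain pointer member rather than the reference discussed above:)
class Application
{
    static const int MAX_BUF_SIZE = 1024;
    static const int MAX_PROF = 1024*10;
    typedef char ParseRow[MAX_PROF]; // one 10K row
public:
    Application(const char* profile)
        : _profPath(profile), _parseBuf(new ParseRow[MAX_BUF_SIZE]) {}
    ~Application() { delete [] _parseBuf; }
    // copying is not handled here; disable or implement it in real code
private:
    const char* _profPath;
    ParseRow* _parseBuf; // 1024 rows on the heap; _parseBuf[i][j] works as before
};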
Well, if you do all the new/delete on your own, you risk leaking if
anything throws and something in a higher stack frame catches it. You
then need to start wrapping your allocation/deallocation in ctor/dtor;
or to use smart pointers. Before long, you'll be writing quite a bit
more code than a simple (ok, that's definitely a euphemism) cast would
have been.
Much of the code I work on is compiled without exception support,
precisely to not care about this sort of thing.
Why wouldn't you just do something like the following instead?
Application* app = new Application(profile);
app->mainLoop();
delete app;
That sounds like a far better fix. And just so you know, you can run
into problems with the stack in pretty much any programming language
that exists. This issue doesn't just exist in C++.
You could do that, as I mentioned. As to other languages – most
allocate less on the stack and report stack overflows more
gracefully.
Hi, I've got an easy solution for you: just move the app declaration out
of main. That way that variable will be statically allocated by the
linker in the data section, and you'll have no problem with the
stack.
I do like C++, although not the builtin metatemplate interpreter, and
having spent the last few days reading your FQA, I'm trying not to sound
like a C++ apologist...
Incidentally, 64-bit Linux also allocates 8M for the stack by
default, so this isn't a 32-bit vs 64-bit issue.
Yep, I ended up sounding dumb instead: profile is a parameter for the
constructor.
My other solution would be like Ray's one, above.
Well, sure, there are plenty of workarounds, once you figure out the
problem – which many, many competent C++ programmers really can't.
For certain values of competent.
I've seen lots of C programs (particularly vendor supplied "examples"
for microcontrollers) that stack overflow spectacularly because they
thought to allocate the entire data buffer on the stack, and have the
declaration buried deep within nested loops and switches.
This is certainly not a C++ problem, more of a brain-dead developer
problem.
Yes, a matrix was too large to statically allocate; what is the limit
on a statically allocated variable?
Well explained. I was getting the same error in legacy C code and I
never considered it to be a stack overflow.