How profilers lie: the cases of gprof and KCachegrind

We'll see how gprof and KCachegrind lie to us and why they do so, discuss the limits within which we can trust them nonetheless, and attempt to draw more general conclusions about profilers and profile visualization tools.

(In case your programming vocabulary is different from mine – I use "lying" in its dispassionate meaning of "communicating falsehoods" and not to convey negative judgement; on the contrary, I'm indebted to the authors of profiling tools both as a user and a developer of such tools.)

So, consider a program with two parts – an easy part and a hard part. Both parts do similar work but one part does much more work than the other:

void work(int n) {
  volatile int i=0; //don't optimize away
  while(i++ < n);
}
void easy() { work(1000); }
void hard() { work(1000*1000*1000); }
int main() { easy(); hard(); }

Here, work() is a do-nothing loop. easy() executes a thousand iterations of that loop and hard() executes a billion iterations. We therefore expect main() to spend most of its time in hard() and only a tiny fraction in easy().

Now let's profile the program with gprof:

gcc -o try try.c -pg
./try # saves stats to gmon.out
gprof try gmon.out

On my machine, this prints the following info:

self            self    total
seconds  calls  s/call  s/call  name
   3.84      2    1.92    1.92  work
   0.00      1    0.00    1.92  easy
   0.00      1    0.00    1.92  hard

gprof's lie is in the "total s/call" column: it says easy() and hard() took the same amount of time, when in fact hard() took a million times longer to run.

What happened? Can we trust anything that gprof says? Which parts of its output are entirely wrong like this "easy() is the same as hard()" business and which parts are roughly correct, give or take a measurement error? To answer this, we need to briefly discuss how gprof works.

Roughly, gprof's two sources of information are profil() and mcount():

  • profil() – a cousin of creat() in that it could have been spelled with an "e" as well – updates an instruction address histogram every 10 milliseconds. That is, 100 times a second the OS checks which instruction the program is executing, and increments a counter corresponding to that instruction. So the share of increments corresponding to a function's body is proportional to the share of time the program spent in that function.
  • mcount() is a function called by assembly code generated by gcc -pg. Specifically, when a function is entered, it calls mcount() to record a call to itself from the caller (the caller is generally easy to identify because it necessarily passes a return address to your function and that address points right into the caller's body.) So if f() calls g() 152 times, mcount(f,g) will be called 152 times.

With this in mind, we can roughly tell what gprof knows. Specifically, it knows that:

  • easy() and hard() were both called once; work(), called from each, ran twice. This info is from mcount() and it's 100% reliable.
  • The program spent almost no time in the code of easy() and hard(), and most of its time in the code of work(). This info is from profil() and it's rather reliable – because the program ran for >3 seconds, which means we had >300 increments in our instruction histogram. If almost all of these increments are in work(), that's significant enough.

What about the share of time easy() spent in its call to work(), and the share of time hard() spent in work()? By now we know that gprof knows absolutely nothing about this. So it guesses, by taking 3.84 – seconds spent in work(), a reliable number – and divides it between easy() and hard() equally because each called work() once (based on mcount(), a reliable source) – and we get 1.92. This shows how bad results can be produced from perfectly good measurements, if passed to the wrong algorithm.

More generally, gprof's output falls into the following categories, listed in decreasing order of reliability:

  • Number of calls: 100% reliable in all parts of the report. (I think; please correct me if I'm wrong.)
  • Self seconds in the "Flat profile" (time spent in a given function not including children): reliable to the extent that 100 samples/second is statistically significant given the number of hot spots and the total runtime.
  • Seconds attributed to call graph edges (contribution of children to parents, total runtime spent in self+children, etc.): possibly dead wrong. Only trust it if there's zero code reuse in a given case (that is, f() is only called by g()), or if the function in question is known to take about the same time regardless of the call site (for example, rand()).

BTW, the fact that gprof lies doesn't mean that its documentation does; on the contrary, `man gprof` says, in the BUGS section:

The granularity of the sampling is shown, but remains statistical at best. [this refers to the limited reliability of profil's histogram.] We assume that the time for each execution of a function can be expressed by the total time for the function divided by the number of times the function is called. Thus the time propagated along the call graph arcs to the function's parents is directly proportional to the number of times that arc is traversed. [this refers to the absolutely unreliable way in which profil's "self time" data is combined with mcount's call graph data.]

Unfortunately, users tend to read tools' output without reading documentation. (The ability of users who aren't into profiling tools to understand the implications of this passage is a separate question.)

The man page also refers to papers from 1982 and 1983. An age of over three decades is a good reason to cut a program some slack. In a way, gprof's age is not only a source of its limitations, such as only 100 samples per second, but also a source of its strengths, such as fast execution of profiled code and wide availability.

Now let's look at a more modern profiler called callgrind – a valgrind plugin. Being more modern, callgrind has a few advantages over gprof – such as not lying in its call graph (though some would debate that as we'll see), and coming with a GUI program to visualize its output called KCachegrind.

KCachegrind the viewer (as opposed to callgrind the measurements collector) does lie in its call tree – as distinct from its call graph – as we'll shortly observe. But first let's have a look at its truthful reporting of the situation with easy() being easier than hard():

[Image: KCachegrind's call graph – true]

As you can see, easy() isn't even shown in the graph (KCachegrind hides things with insignificant cost); you can, however, see the cost of main's calls to easy() and hard() in the source view – indeed easy() is ~1000x faster than hard().

Why 1000x and not 1000000x? Because I changed hard() to run a million iterations instead of a billion, bringing the difference down to 1000x. Why did I do that? Because callgrind is slow – it's based on Valgrind which is essentially a processor simulator. This means that you don't measure time - you measure things like instructions fetched and cache misses (which are interesting in their own right), and you get an estimation of the time the program should take given these numbers and your processor model.

It also means callgrind is slow. Is it slower than gprof? Not necessarily. With gprof, code runs at near-native speed, but you only get 100 data points per second. With callgrind you get many more data points per second. So for a hairy program, callgrind gets you statistically significant data more quickly – effectively, callgrind is faster.

But for a simple program with just a couple of hot spots, callgrind is slower because if the program has a costly part 1 and then a costly part 2, it'll take callgrind more time to even get to part 2, whereas gprof, with its near-native speed, will give you good enough data from its fast run.

So much for speed; now let's look at a case where KCachegrind lies to us, and then we'll discuss why it happens.

To expose the lie, we'll need a more complicated program. We'll achieve the required complexity by having two worker functions instead of one, and then adding a manager – a function that does nothing except calling the two workers with the number of iterations to run.

How does the manager decide the number of iterations each worker should run? Based on the project requirements, of course. Our "projects" will be two more functions, each calling the manager with its own pair of iteration numbers for the two workers.

void worker1(int n) {
  volatile int i=0;
  while(i++<n);
}
void worker2(int n) {
  volatile int i=0;
  while(i++<n);
}
void manager(int n1, int n2) {
  worker1(n1);
  worker2(n2);
}
void project1() {
  manager(1000, 1000000);
}
void project2() {
  manager(1000000, 1000);
}
int main() {
  project1();
  project2();
}

As you can see, both workers work on both projects, but each project is mostly done by one of the workers, the other contributing 1000x less work. Now let's see what KCachegrind says about this; we need to run callgrind, which can be done without special compilation flags:

gcc -o try2 try2.c
valgrind --tool=callgrind ./try2
kcachegrind `ls -tr callgrind.out.* | tail -1`

Here's what we'll see:
[Image: KCachegrind's call tree – false]

The bottom part of the screen shows us truths, but the top part shows falsehoods.

The truth is that each project called the manager once; the manager called each worker twice; and each worker did half the work in the program – all shown in the call graph at the bottom. At the top, however, each of the project functions occupies half the window, showing that worker1() and worker2() each did half of the work in each project – which couldn't be further from the truth.

This falsehood is reported by the call tree (or "callee map", as KCachegrind calls it) – a view which is supposed to show, for each function, the share of work done in each of its callees relative to the work done by all those callees together (as opposed to the call graph, which only links to the called functions, telling how many times they were called by that caller – and their share of work in the entire program).

Why does the call tree tell a falsehood? Again, because it doesn't know the truth. KCachegrind visualizes callgrind's measurements, which include the number of times f() called g() and the time spent in calls from f() to g().

This is more than gprof's information – way more. gprof only knows how much time was spent in f() and g(), and how many times f() called g(). Callgrind also measures how much time was spent in g() specifically when it was called from f(). In particular, this means that KCachegrind's source view gives a perfectly reliable measurement of the time spent in f plus all its callees – something which users take for granted and something which gprof only guesses, often wildly wrongly.

However, this information is not enough to know what the call tree needs to know to show the truth. Specifically, you only know that manager() spent the same time in calls to worker1() and worker2() overall; you have no idea how much time it spent in each worker when called from project1() versus project2(). So you can't reliably plot the share of time worker1() and worker2() spent inside project1() or project2(); you can only guess, often wildly wrongly.

(In fact, you can't tell if manager() even called worker1() when called from project1(); perhaps it didn't – all you know is that manager called worker1() in some context. Some people conclude that the call graph is "incorrect"; in fact it is correct, the question is if you understand what you see the way you're supposed to – you aren't supposed to think that every path through the graph actually happened, only every edge. Another question is how upset you are when you find out (someone with a lot of "manager" functions doing nothing but dispatching might be very upset.) This example certainly broadens the gray area between "truth" and "lies" in profilers' output.)

Callgrind has something which appears like a possible workaround: --separate-callers=N. What this does is measure things per call stack of size N instead of per call "arc". That is, instead of one measurement for the arc from manager() to worker1() and another for manager()->worker2(), it measures separately for project1()->manager()->worker1(), project1()->manager()->worker2(), project2()->manager()->worker1(), etc.

Unfortunately, KCachegrind didn't manage to open the resulting output file on my machine, nor did it help when I replaced the ticks (') separating the function names (which get concatenated together) with underscores (_). However, the textual data produced by callgrind indeed shows the truth:

fn=(726) worker2'manager'project1
5 3
+1 1
+1 7000008
+1 2

fn=(736) worker2'manager'project2
5 3
+1 1
+1 7008
+1 2

This shows that worker2() did 1000x more work when called (through manager()) from project1() than it did when called from project2() – as we know it did.

Having looked into the details of two particular examples, we'll proceed to a more general discussion of profilers and profile visualization tools.

Using no profiler at all is better than hoping it'll save the day

I know a few people who like to optimize code and think optimization is important, and who mostly ignore profilers. I know a few other people who claim that a good profiler is the necessary and sufficient prerequisite for optimization. More often than not, the latter are not particularly fond of optimization, and their code will be slower than the code of the above-mentioned profiler-bashers, before as well as after optimization.

The examples above show part of the reason why "a good profiler" is not at all trivial to use.

Among the many opinions, there are two extreme ones that sound along these lines:

  • You don't know where your bottlenecks are going to be, therefore don't bother to optimize before measuring.
  • You don't know where your bottlenecks are going to be – nor will you be able to measure them because adequate measurement and analysis tools plus test scenarios covering the relevant possibilities are hard to come by. Therefore, conserve resources if there's even a slight chance that the code will be a bottleneck.

In my experience, the first viewpoint results in slower code and is less consistent internally. That is, for someone who's not into optimization, a profiler is just not the force multiplier that he thinks it is. That's because he won't put in the mental effort required to make effective use of the profiler, for the same reasons that make him write slow code in the first place: he just doesn't like all this performance stuff.

The trouble with this is that profilers are not things that magically tell you what to do without concentration on your behalf.

You need to understand how a profiler works to use it well

When I first realized how gprof lies in its call graph, I was so offended that I stopped using it for a long while. The problem with profilers is that not all the data is gross lies, but some is, and without knowing which data is likely to be wrong, you might trust things that you shouldn't, scratch your head to the point of hair loss, and then abandon the tool altogether once you realize how it routinely betrays your trust.

With KCachegrind, I came up with the example where it lies based on my knowledge of the callgrind output format – something I know because we (specifically, GK) added support for that format to our in-house profiling tools at work. I wouldn't guess that the Callee Map view is unreliable otherwise. Like most users, I rarely read the docs unless something clearly refuses to work altogether. The stats in the call graph as well as the source view are perfectly reliable. How would I know some other stats aren't?

The extent to which you warn the user about possible implications of assumptions in your software is a tough question for all programmers. Should KCachegrind have a big red warning "I might be misleading you" at its "map" views? A "true power user" doesn't need the warning because they know how the tool works anyway. A "truly careless user" wouldn't read the explanation linked to from the red warning and would just be scared away. Only a "middling user" irresponsible enough to not read the docs but responsible enough to read text linked to from big red warnings might benefit from such design. But how many such users are there compared to the rest?

My own experience here is depressing – nobody, not even the smartest folks, is willing to read anything unless they came here to read. As in, when you're a tutorial that people intend to read, you can tell things to people and they listen. When you're an error message, people read you to the extent necessary to make you go away. And when you're a warning, people ignore you. It sucks to be a warning.

So I'm pessimistic about big red warnings. As to tutorials – people don't expect profilers to be complicated enough to warrant a tutorial, so they probably won't allocate time specifically to read one. They're wrong, but they won't.

Choosing a profiler is hard

There's no perfect profiler – it's a tradeoff, or rather a bunch of tradeoffs. Let's see how complicated those tradeoffs are by comparing gprof, callgrind and Google's CPU profiler:

  • gprof is fast, requires special compilation, gives you "self time" based on 100 instruction address samples per second, precise call counts, and often bogus "children time" information.
  • callgrind is slow, requires no special compilation, gives time estimations based on event counting, several orders of magnitude more data points per second (making it "effectively faster" for some use cases but not others), precise call counts as well as precise "events counted during a call to each child" information, and comes with a viewer giving a correct though possibly misleading call graph, and often bogus "map" views.
  • Google's CPU profiler is fast, requires no special compilation unless you're on a 64-bit platform without a working unwind library, uses a configurable number of samples per second (default: the measly 100, I wonder why), lacks precise call count information, and logs full call stacks – unlike gprof and callgrind without --separate-callers – but then converts the data to many viewing formats where the extra info is lost (such as the callgrind format). You do get a more informative view of the profiling data in some cases.

So basically each of these profilers is better than the rest in some way and worse in some other way (for instance, gprof, at first glance the awful, ancient profiler, is actually your shortest path to precise call counts, which may well be the thing you need the most). And that's before discussing things like the ease of getting a working version for your platform.

As it often is with complicated things, the choice is made harder by the fact that you don't realize many of the implications until after you've gained some experience with the tool. I don't think I know what questions to ask about a new profiler, even though I'm relatively savvy. I do realize that I want to understand how it works in a lot of detail.

Not all errors are "noise"

If you're listening to an analogue recording and there's a "shhhh" sound, it's "noise". If, however, someone is yelling near you, then this louder noise isn't "noise" in the mathematical sense – it's another signal. And if someone has overwritten your original recording, then there's only another signal left and none of yours. There's also noise on top of that other signal, but that noise is the least of your troubles.

Not everything standing between you and your signal is "noise"; sometimes it's an error making you look at the wrong signal.

With profilers, people intuitively expect "measurement noise" – a profiler measures time (or some other cost) and no measurement device is perfect. People are willing to accept that – but they do expect fidelity in the assignment of costs to context. "Context" is source lines, functions, and call sequences; people (correctly) think of context as something logical where the concept of "measurement errors" doesn't apply; it's either correct or not.

Unfortunately, people are right, but the conclusion is the opposite of the natural expectation: "context" is indeed a logical concept and not a continuous quantity – and therefore, when the tools give you wrong information, it's really wrong, as in completely detached from reality, and not something "roughly right" give or take a measurement error.

(Our examples focusing on call sequences are rather benign, in a way; for real fun, we could discuss the fidelity of assigning costs to source code lines in optimized programs. Even with a nice assembly listing linking to the various source files from which code was inlined, like the listing provided by KCachegrind, it's… let's say less than easy.)

The upshot is that many errors in profilers' output are not something that can be fixed by running the program more times to get more statistically significant results. Rather, you have to know where the results are only distorted by noise and where there can also be logical errors rendering the numbers meaningless.

Presentation errors can exist without measurement errors

…As evidenced by KCachegrind's false presentation of callgrind's or Google profiler's correct results, or by gprof's false conclusions based on reasonable numbers reported by profil() and perfect numbers from mcount().

To add to the previous point, measurement errors are more likely to be "noise" than presentation errors, which are more likely to tell something unrelated to the truth.

Conclusion

Profiling is trickier than we tend to assume, and as someone developing profilers, I understand programmers who're good at optimization and who mostly ignore profilers. A profiler could help users get way more mileage if it found a way to convince them to actually read a thorough, memorable explanation of its strengths and weaknesses.

150 comments

#1 Jerry on 02.02.13 at 3:25 pm

Thanks for your post. I find it very timely.

Let me ask you a newbie, but somewhat related question.

I rarely use profilers. Earlier this week, I wrote a program twice. Once using memory mapped IO. Once using streaming IO from the C standard runtime library.

The program itself was pretty simple. It scanned about 10,000 lines of about 150 chars, parsed some data from each line, built a tree out of that, searched the tree, reported on its findings.

To my surprise, the memory mapped IO version was about 10 times slower than the streaming IO version – as in, the memory mapped IO version took about 5 seconds to complete versus the almost instantaneous response of the streaming IO version.

So I used gprof to inspect the differences.

First thing I noted, if I recall correctly, was that gprof was of little value in determining where the problem was, because the hotspot was in the main program, not in any function.

Eventually, because I suspected what might be problematic and put it into a function that was called, I traced the hotspot to:

> while((point < fend) && (*point++ != '\n'));

Now what this line should be implementing is a while loop that scans from the current point in a file to just past the next newline in the file, all the way making sure that point doesn't overrun the end of the file (fend = filebeginning + filesize).

This was a very slow statement to execute.

Now first, this seemed difficult to pinpoint using gprof alone until I placed that line into a function skip, at which point gprof verified that skip was the hotspot.

Second, reading the note provides me some insight that the reason this statement is slow (assuming I measured it correctly) is due to the kernel having to do a context switch to move each character into userspace, when it is conceivable the C standard runtime version with its buffered IO needs a context switch only for buffer fills/refills.

So my next question is, what profilers or other tools are out there that would help me understand

1) which statement in a large sequence of statements is slow
2) and if such a statement is slow due to poor interactions with the kernel.

Thanks!

#2 Eldc on 02.02.13 at 5:52 pm

What I take away from this interesting post is this: part of the output of gprof and KCachegrind is not reliable because they try to display data that they did not collect.
But what of the profilers that log the full call stack? They have all the data they need, so they shouldn't be misleading – even to someone who isn't an expert, right?
Unless I am mistaken, the only thing one would need to know before trusting a profiler's output, is whether it logs the full call stack.

#3 ale on 02.02.13 at 7:03 pm

From someone who has used gprof many times without having a clue of how it works, thank you! I have probably been bitten by this without even knowing.

#4 Yossi Kreinin on 02.02.13 at 10:34 pm

@Jerry: I think gprof has an option to produce line-level profiles, or at least gcc -pg has a flag to instrument every source line (I'm more sure of the latter than the former, and then the question is what your viewer is going to be), but I'm not sure; I don't really use gprof all that often… Callgrind can do a line-level profile, for sure, but as to kernel interactions – I wouldn't know, since most of my development targets an OS-free in-house chip…

@Eldc: there are generally plenty of sources for confusion apart from the ones discussed in the article, but even for these, it's not "100% enough" to know that the full call stack is logged – the question also is what's displayed to you and what it means; as in, Google's CPU profiler that logs the full call stack loses that info when outputting in callgrind format, and KCachegrind showing you the call graph does not mislead you if and only if you understand that not every path actually happened at runtime (I think Google's profiler also displays you such "bogus" paths in some of its views but I'm not sure.)

#5 Yossi Kreinin on 02.02.13 at 11:08 pm

@ale: it's not that bad if gprof ultimately points you to a function that can and should be optimized. What really got my goat when I first realized how gprof works was that I had, roughly, a program with a hard part which only ran in offline debugging scenarios, and an easy part which ran in real online runs, and both used the same function, but the easy part used it on much smaller inputs. So what ends up happening is that gprof thinks the easy part takes much more time than it does, because it doesn't have callee cost measurement like callgrind, but instead guesses based just on the number of calls. So when I found out that random debugging code affects the estimation of the runtime of the code you're really trying to profile, and that gprof points you to a function that, if optimized, would speed up your debugging code more than your "actual" code, I was really disappointed…

#6 Oleg on 02.03.13 at 1:14 am

Yossi, but how did you end up even using gprof in the first place? Does your target platform not support oprofile or perf?

#7 Jerry on 02.03.13 at 1:56 am

fwiw, on my system it's –tool not -tool

#8 Yossi Kreinin on 02.03.13 at 2:06 am

@Oleg: erm… because I haven't known about these? which I don't know if I regret :) – because sudo apt-get install oprofile caused a log-out (I'm not awfully happy with that log-out… Linux is impossible to crash, my ass), after which I failed to launch "operf" which is "not found" – I dunno, perhaps I'll try it on a newer Ubuntu at home… and as to perf, I haven't figured out how to use it, either. So if you can spare some info I'd be grateful; particularly with respect to inner workings/limitations (I imagine they're better than gprof in some ways, but I still wonder about their inner workings for reasons described above…)

@Jerry: it's double-minus-tool everywhere – WordPress screws up double minus characters, apparently… Sigh.

#9 Billg on 02.03.13 at 4:39 am

For those of us on Windows, the Intel profiler does the right thing (group by function+callstack)
http://software.intel.com/en-us/intel-vtune-amplifier-xe
Does anybody know an open-source profiler for windows that does this too?

Meta comment regarding gprof vs oprofile: it seems to be a recurring problem with Linux: it serves both as a modern operating system and as a museum of historical artifacts. Some tools you're only supposed to watch, not touch. Not all tools are labeled :)

#10 Romain on 02.03.13 at 7:40 am

@Billg, do you know CodeXL from AMD? It's not open source, but it's a free tool, and it can do CPU profiling (plus a kind of GPU profiling):
http://developer.amd.com/tools/heterogeneous-computing/codexl/

#11 Michael Moser on 02.04.13 at 1:05 am

This profiler requires you to recompile the code;

http://www.logix.cz/michal/devel/CygProfiler/

the required gcc option -finstrument-functions adds a call in the function prologue and a call in the function epilogue; the tool hooks these calls to do its profiling.

Of course the profiler has its own overhead that skews the picture …

#12 Marius Gedminas on 02.04.13 at 2:45 am

@Jerry, when you use mmap, you get at most a single page fault per page (4KB), but IIRC the kernel reads ahead and faults in a number of pages (something on the order of 128K?).

You could inspect the number of minor and major page faults with /usr/bin/time.

My guess would be that the 5x speed difference is caused by the read-based code reusing the same working set of memory pages, while mmap-based code uses new pages all the time, flushing old ones out of the CPU cache.

#13 Yossi Kreinin on 02.04.13 at 4:04 am

@Michael: the trouble with -finstrument-functions the way I understood it is that it instruments even those functions which were inlined; this means a ton of overhead for C++-y code with tiny accessors calling tiny accessors.

#14 Michael Moser on 02.04.13 at 6:05 am

On the other hand you have the flexibility to instrument only those source files that are performance critical; Gprof can give you an idea where 80 percent of the time is spent; which is supposedly a minority of the code; then you can selectively instrument the modules that are critical.

#15 Michael Moser on 02.04.13 at 6:06 am

I agree that lack of inline is a big skew that alters the picture in big ways. Yes.

#16 Bruce Dawson on 02.04.13 at 9:04 am

For those of us on Windows I would recommend Microsoft's xperf, also known as Windows Performance Analyzer. It includes a sampling profiler that goes up to 8 KHz (default is 1 KHz) and the collation of samples according to call stacks (and optionally thread ID) is correct. It also allows separate analysis of why a thread is idle. It is extremely powerful and doesn't lie. However it is complex. I've blogged about how to use it at http://randomascii.wordpress.com/category/xperf/

It is unfortunate that the profiler defects which you point out on Linux have not been fixed. It sounds like many of them are eminently fixable.

#17 Jason T on 02.04.13 at 10:36 am

No mention of perf? @Jerry perf is the profiling tool you want. It will show you why your mmap solution is slower. perf is a whole system profiler, so you'll see what the kernel is doing when your program runs.

#18 Nimrod on 02.10.13 at 1:28 am

What about using Linux 'perf' command?

Supposedly, it does sample the call stack along with the current address. Unfortunately the ancient RHEL5 I'm forced to use at work does not support the perf command.

#19 Mike Dunlavey on 04.22.13 at 9:42 am

Jerry,
As someone gripped by this, I'm a pest on this subject. My only defense is that I can demonstrate what I'm talking about. Look here: http://stackoverflow.com/a/1779343/23771
When the objective is speed, in non-toy programs, of course you're right that guessing doesn't go anywhere. The trouble is, where profilers are concerned, where's the beef? I've never seen anyone using a profiler claim more than maybe 40% speedup. That's peanuts. The thing is, there are opportunities for speedup in the code, but if the profiler doesn't expose them, those programmers stop looking. Where are the results? This subject is ripe for a bake-off.

#20 RickBee on 04.23.13 at 1:19 pm

I tried your sample programs with quantify (part of the purifyplus suite, from ibm/rational). (I've been using quantify instead of gprof for well over 10 years).

Quantify doesn't lie in any of your test cases – then again, it doesn't show the kinds of pretty pictures that the other tools do, and it isn't free. Like callgrind, programs run (much) slower when quantified, but even in non-toy programs they're fast enough to get the answers you need.

Sure, it's not free, but I think it's worth it.

#21 Yossi Kreinin on 04.23.13 at 9:56 pm

Indeed it might be better than the example profilers I used; I'd need to have some experience to know what it does and how trustworthy it is.

Not being free has other issues apart from having to pay and ration licenses. In my case, I want to feed profiling data obtained using a hardware probe into a visualization tool; non-free tools often don't document their data formats. Can still be worth it if you don't roll your own platform like we do, or if you pay money for the porting of the tools (which I found to be a PITA, again not just because of the money, but because of the more complicated process.)

#22 Largo Lasskhyvf on 06.18.13 at 12:26 am

Are you aware of PTLsim at http://www.ptlsim.org/ ?

#23 Yossi Kreinin on 06.18.13 at 10:14 pm

I haven't been aware of it; if I ever need to optimize on x86, I'll definitely look into it.

#24 Mike Dunlavey on 04.13.14 at 9:57 am

You've put a lot of good effort into this, so thanks for that, but there's something that totally mystifies me, and if you don't know the answer that's OK:

Why is it not generally known that a good way to find performance problems is to simply pause the program at random, a few times, and examine the call stack?
As Agner Fog says in http://www.agner.org/optimize/optimizing_cpp.pdf : "A simple alternative is to run the program in a debugger and press break while the program is running. If there is a hot spot that uses 90% of the CPU time then there is a 90% chance that the break will occur in this hot spot. Repeating the break a few times may be enough to identify a hot spot. Use the call stack in the debugger to identify the circumstances around the hot spot."

I talk about this on stackoverflow quite a bit. It is very easy to assume you get what you pay for – it's so simple it must be very poor compared to a nice modern spiffy profiler. In my experience, exactly the opposite is true. You've done a nice job of pointing out how some profilers can be very misleading. The bigger problem is that, even if they are really accurate, there are problems, big ones, that they simply miss altogether. What's more, the accuracy of measurement that they strive for is actually of little value in pinpointing those problems.

I made an 8-minute video demonstrating this: https://www.youtube.com/watch?v=xPg3sRpdW1U

Here's a stackexchange answer explaining the statistical reasons why it is effective: http://scicomp.stackexchange.com/a/2719/1262

#25 Yossi Kreinin on 04.13.14 at 10:09 am

Well, in a well-optimized production program the heaviest single function takes 5% and it falls off from there. You still need to optimize further. What do you do?

Also – trace-based or simulation-based profilers are extremely accurate in terms of measurements and they give you source-line-level and assembly-level stats. And they count things like cache misses and so on so you know why things are slow.

Your method is fine and I used it myself but a replacement for a profiler it is not.

#26 Mike Dunlavey on 04.13.14 at 4:58 pm

Well, maybe you're right that it's not a replacement for a profiler, but if a profiler were a replacement for it, you could take the program here: http://sourceforge.net/projects/randompausedemo/
and speed it up by over 700x (without looking to see how I did it, obviously). The best I've seen anybody do so far, without random pausing, is about 70x.
If you have a profiler that can show you what things to fix, let me know, because that's news.

#27 Yossi Kreinin on 04.13.14 at 7:33 pm

Different people work on different things. Profilers have their place, and there are situations where random pausing is hopelessly inadequate. Saying that profilers giving enough insight to "get things fixed" are news is kinda extreme.

#28 Bruce Dawson on 06.11.14 at 8:51 am

I've been using perf on Linux, configured to record 1,000 samples per second. Like ETW/xperf on Windows it grabs a call stack with each sample. These call stacks allow the samples to be properly overlaid in order to avoid lying. I use Brendan Gregg's flamegraph script to graph the data, which I have also used on Windows:

http://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/

So there is hope. My one complaint is that I'd like better ways to browse the perf data. Any suggestions on other tools for browsing the sampled call stacks from perf?

#29 Mike Dunlavey on 11.01.14 at 6:29 pm

What I'm interested in is showing people how to get maximum performance, not how to be proficient in tool use.
Call it extreme if you like, but the proof's in the pudding. There are big speedups that random pausing can see that even the best profiler cannot.

This shows the scientific reasons why it works:
http://scicomp.stackexchange.com/a/2719/1262

This shows how problems can hide from profilers:
http://stackoverflow.com/a/25870103/23771

IMHO, a good profiler is one that gets stack samples, including precise call sites (so it gets line-level cost), on wall-clock time (so it can detect needless blocking), and reports line-level percent stack occupancy. (Recursion is a non-problem.) Zoom does this, there may be others.
But nothing compares in efficacy to human examination of the samples, because no profiler's summarizing back-end is as smart as a human, and the objection that a human cannot examine enough samples is a red herring, because any speedup big enough to be worth fixing will appear in 10-20 samples.

Here's the example of getting a speedup of almost three orders of magnitude by finding six successive speedups:
http://scicomp.stackexchange.com/a/1870
but skipping any one of those speedups would have resulted in far less.

#30 Yossi Kreinin on 11.01.14 at 6:44 pm

When you have 20 people each optimizing their own part of a huge program, and with each person responsible for 5% of the runtime of which they try to cut 40% (so 2% of the current total runtime) over the course of several weeks (so each improvement might be fairly small), how would you use random pausing?

From another angle – how do you define "a speedup big enough to be worth fixing"?

#31 Mike Dunlavey on 11.01.14 at 7:09 pm

Check the links – it's all there.

#32 Mike Dunlavey on 11.03.14 at 4:43 pm

Were the links helpful? To respond to your question, it presumes that performance issues divide themselves up by different peoples' modules. Rather, the division should be by execution phases as perceived by the user, like "this particular page takes a long time to appear". Stack samples during that time will show why, and the fix is likely to require changes to more than one module. The real difficulty is – how do you get disparate people to make those necessary changes? If they refuse, or merely temporize, what do you do? Here's one of my experiences with this:
http://programmers.stackexchange.com/a/101869/2429

#33 Chris E on 12.09.14 at 12:30 am

I realize it's a long time since this post was made, but I suspect that if you stepped through your code in gdb, or even checked the assembly output, you might find there's no lying of the kind you're accusing gprof of.

The problem is this code:
while(i++ < n);
You might have marked i as volatile, but it's a stack variable local to the function, and thus can be safely tossed into a register.

the x86 code to implement this is essentially

get value from stack
compare value to argument
add 1 immediate to register
store value back on stack
compare initially retrieved value to limit
test equality
conditionally jump to start.

Now, since these are some of the most basic instructions and the data will stay in the processor cache, we can assume each executes in 1 processor cycle. The loop is 7 instructions long (actually 8, with a nop, according to objdump -d).

We can round up to 10.

Now, my processor is 1.6 GHz. That means 1,600,000,000 instructions per second.

gprof claims granularity down to 0.01 seconds, so 1,600,000,000 * 0.01 gives us 16,000,000 instructions per recordable time slice. Our loop is 10 instructions, we claim, so we can have up to 1,600,000 loop iterations per recognizable time slice. Which means that to get useful information from gprof, you should consider (for my slow processor) having easy() execute work() a minimum of 1.6M times, and hard() a multiple higher than that.

Adjust the numbers for your processor speed and give it a try.

If you are interested, I'll break down the problem with the callgrind example and its overly simple workload for you, too.

#34 Yossi Kreinin on 12.09.14 at 8:54 am

Well, did you give it a try? I say, run it for a year and it'll still split the work evenly between easy() and hard(), because the problem has nothing to do with measurement granularity, and everything to do with how gprof analyzes its measurements when it produces the report. It simply doesn't take call stacks into account; instead, having measured the time spent in each function, it assigns an equal share of that time to each caller. Meaning that if you inlined work(), you'd see the difference between easy() and hard(); otherwise you won't.

#35 Yossi Kreinin on 12.09.14 at 8:56 am

And here's where gprof docs explain it (quoted in the article):

"We assume that the time for each execution of a function can be expressed by the total time for the function divided by the number of times the function is called."

Since work() was called twice, once from easy() and once from hard(), gprof assumes we spent half the total time in each call.

#36 Chris E on 12.09.14 at 3:26 pm

Yeah, I totally see what I did now. I love problems but tend to get too focused on them; I totally went down the rabbit hole till I lost track of how I got there. In short, to try and make up for the lack of basic-block profiling in current versions of gcc, I added another function in the while loop, and that got me more detailed counts from callgrind and gprof.

#37 lol on 12.20.14 at 10:58 pm

If you want to analyze deeply, down to the real count of operations, then you should also check the timing at the architecture level. Even if you place restrictions on the compiler, you can't restrict the behavior of the architecture. Learn about vector processing in your architecture before you use a loop. How can you just assume that your architecture will run your code sequentially, without reordering and optimizing within a loop?

#38 Yossi Kreinin on 12.21.14 at 8:33 am

Well, strictly speaking, since this is a normal memory variable, and I used no barrier instructions, a processor might in fact legitimately optimize the loop away; however, no commercially significant CPU out there actually does this.

Also… "vector processing" in a loop doing nothing but modifying a volatile variable? "lol" indeed.

#39 JosefW on 06.16.15 at 5:26 pm

Hi, kcachegrind/callgrind author here.

I just found this article now, and it makes a good point about the issues with KCachegrind's heuristic visualizations. Let me explain a bit.

The question of how much data to collect, and how to visualize it in a meaningful way, actually involves a lot of tradeoffs, as was mentioned. Typically, added overhead is the argument: as the profiler runs on the same hardware it is expected to measure, any error can be as high as the overhead (error-compensation techniques have their limits). Even without gprof's heuristic, which often can go very wrong, the code added just for the call-counter increment can easily result in more than 100% overhead with short functions, and thus in useless data.
Sampling instead is quite nice, as it allows tuning the overhead. Still, one has to remember that the data is statistical, and that full unwinding at sample points can be a source of significant overhead.

Callgrind actually does not have this problem of overhead or statistical measurement, and could provide exact data with exact call-chain contexts. Unfortunately, with complex code, the number of call-chain contexts easily explodes, and you have to limit it unless you want callgrind to run out of memory.

But there is another reason that the default limit for call-chain length in callgrind is only 1: I never got around to implementing the needed additions in KCachegrind. At the time, I was more interested in complete call-graph visualizations, playing with meaningful heuristics. While I am happy about the widespread use of KCachegrind, it is unfortunate that these issues still exist; they really should be fixed. I'd like to make a 1.0 release at some point :-)

A bit about the heuristics: with call-chain length 1, you actually only have full data about the direct callers/callees of a function. Presenting this information is usually called a "butterfly", and that is what "callgrind_annotate --tree=both …" shows. KCachegrind uses this butterfly information to extend both call trees (the tree map) and the call graph; BTW, even the latter can go wrong.

Also, from my experience, users do not read documentation. A solution for now would be to reduce all call-graph visualizations in KCachegrind to butterflies. But there actually is value in heuristics, as even with limited information one can come up with insightful presentations. Even if I have call-chain contexts of length 5, I want to show call chains of greater length in the call graph.

What do you think about adding a "Be honest!" knob in the GUI which restricts visualization to the actually available information, without heuristics, with that being the default?

#40 Yossi Kreinin on 06.16.15 at 6:10 pm

Hi there :-)

First of all I think I lied in the above, actually… I recall that I realized that I got some of it backwards but I never got to fixing the text… I hope to fix it one day…

Regarding a "be honest" knob – I think something along those lines could be nice… I wrote internal docs about KCachegrind which show the non-heuristical parts in green and the rest in red… Needless to say, few people read said docs.

One thing that I did in a profiler was collect full, precise call trees instead of graphs (so you know the call stack of every call, and you render a tree), and actually it did not explode; the only trouble was that I couldn't handle recursion, which didn't bother me but could be a problem in a general-purpose tool.

Thanks a lot for KCachegrind, BTW! We retired our internal text/HTML profiler front-ends in favor of KCachegrind, for many reasons, including the ability to display many different costs, render graphs dynamically, search things nicely, handle inlining gracefully, etc.

Two things that'd be really nice to have are a source view where you can copy text and the file path and/or open the file in an editor easily, and self-contained profiling info files, with the source code quoted and all. (People very often change or delete the source files and then the profiling data becomes useless; often a profiling takes time to obtain and one would really like to look at the data despite the source files at the original location no longer being there. Or you don't know that they changed the files and you wonder why things look strange.) (I guess the honorable thing would be to submit a patch instead of writing the above :-) Anyway, thanks again for the great tool…)

#41 JosefW on 06.16.15 at 8:13 pm

Ok, my comment about call-context explosion was perhaps a bit exaggerated, but this depends very much on the application. Profiling a browser's start-up / rendering of an HTML page will definitely give a quite large calling-context tree, and I expect a tool to handle such cases without problems.

Thanks for the suggestions!

I already thought about extending the format to allow self-contained profiles, but more for machine code for dynamically JIT-generated code. But for source, it definitely also makes sense, and should be quite easy.

Another option would be KCachegrind calling external, custom shell scripts, e.g. to get at source. Such a custom script could fetch from a repository or a remote machine.

And of course I accept patches!
