Optimal processor size

I'm going to argue that high-performance chip designs ought to use a (relatively) modest number of (relatively) strong cores. This might seem obvious. However, enough money is spent on developing the other kinds of chips to make the topic interesting, at least to me.

I should say that I understand everyone who throws millions of dollars at hardware that isn't your classic multi-core design. I have an intimate relationship with multi-core chips, and we definitely hate each other. I think that multi-core chips are inherently the least productive programming environment available. Here's why.

Our contestants are:

  • single-box, single-core
  • single-box, multi-core
  • multi-box

With just one core, you aren't going to parallelize the execution of anything in your app just to gain performance, because you won't gain any. You'll only run things in parallel if there's a functional need to do so. So you won't have that much parallelism. Which is good, because you won't have synchronization problems.

If you're performance-hungry enough to need many boxes, you're of course going to parallelize everything, but you'll have to solve your synchronization problems explicitly and gracefully, because you don't have shared memory. There's no way to have a bug where an object happens to be accessed from two places in parallel without synchronization. You can only play with data that you've computed yourself, or that someone else decided to send you.
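Here's a minimal sketch of that style in Python, with multiprocessing standing in for separate boxes (the queues stand in for network links): a worker can only act on data it computed itself or received as a message, so there's nothing to race on.

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # No shared state: the worker acts only on messages it receives,
    # and communicates results only by sending messages back.
    while True:
        msg = inbox.get()
        if msg is None:          # shutdown sentinel
            break
        outbox.put(msg * msg)

def run_job(xs):
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for x in xs:
        inbox.put(x)
    results = [outbox.get() for _ in range(len(xs))]
    inbox.put(None)              # tell the worker to exit
    p.join()
    return results

if __name__ == "__main__":
    print(run_job([1, 2, 3]))    # [1, 4, 9]
```

The synchronization is explicit and graceful by construction: ownership of each value transfers when the message is sent.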

If you need performance, but for some reason can't afford multiple boxes (you run on someone's desktop or an embedded device), you'll have to settle for multiple cores. Quite likely, you're going to try to squeeze every cycle out of the dreaded device you have to live with just because you couldn't afford more processing power. This means that you can't afford message passing or a side-effect-free environment, and you'll have to use shared memory.

I'm not sure there's an inherent performance impact to message passing or to having no side effects. If I try to imagine a large system with massive data structures implemented without side effects, it looks like you have to create copies of objects at the logical level. Of course, these copies can then be optimized out by the implementation; I just think that some of the copies will in fact be implemented straightforwardly in practice.

I could be wrong, and would be truly happy if someone explained to me why. I mean, having no side effects helps analyze the data flow, but the language is still Turing-complete, so you don't always know when an object is no longer needed, right? So sometimes you have to make a new object and keep the old copy around, just in case someone needs it, right? What's wrong with this reasoning? Anyway, today I'll assume that you're forced to use mutable shared memory in multi-core systems for performance reasons, and leave this no-side-effects business for now.
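A toy Python illustration of the copying concern: a side-effect-free update of an immutable structure builds a whole new copy (so anyone still holding the old version is safe), while the mutating version touches one slot in place (cheap, but visible to every other holder of the reference).

```python
def pure_update(state, i, value):
    # Side-effect-free update: build a fresh copy of the whole
    # structure; the old version stays valid for whoever holds it.
    return state[:i] + (value,) + state[i+1:]

def mutating_update(state, i, value):
    # In-place update: no copy, but anyone else holding a reference
    # to `state` now sees the change.
    state[i] = value
    return state

old = (0, 0, 0, 0)
new = pure_update(old, 2, 7)
print(old, new)   # (0, 0, 0, 0) (0, 0, 7, 0)

buf = [0, 0, 0, 0]
mutating_update(buf, 2, 7)
print(buf)        # [0, 0, 7, 0]
```

Whether a compiler can turn the first form into the second, once the old version is provably dead, is exactly the question above.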

Summary: multiple cores are for performance-hungry people without the budget to buy more computational power. They end up with awful synchronization problems due to shared-memory mismanagement, which is even uglier than ordinary memory mismanagement like leaks or dangling references.

Memory mismanagement kills productivity. Maybe you disagree; I won't try to convince you now, because, as you might have noticed, I'm desperately trying to stay on topic here. And the topic was that multi-core is an awful environment, so it's natural for people to try to develop a better alternative.

Since multi-core chips are built for anal-retentive performance weenies without a budget, the alternative should also be a high-performance, cheap system. Since the clock frequency doesn't double as happily as it used to these days, the performance must come from parallelism of some sort. However, we want to remove the part where independent threads access shared memory. We can do one of two things:

  • Use one huge processor.
  • Use many tiny processors.

What does processor "size" have to do with anything? There are basically two ways of avoiding synchronization problems. The problems come from many processors accessing shared memory. The huge processor option doesn't have many processors; the tiny processors option doesn't have shared memory.

The huge processor would run one thread of instructions. To compensate for having just one processor, each instruction would process a huge amount of data, providing the parallelism. Basically we'd have a SIMD VLIW array, except it would be much much wider/deeper than stuff like AltiVec, SSE or C6000.
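Here's a toy software model of that idea (plain Python; nothing here is a real instruction set): a single "instruction" operates on every lane of a fixed-width vector at once, so one thread of such instructions gets WIDTH-way parallelism per step.

```python
WIDTH = 16  # lanes per "instruction"; a real machine fixes this in hardware

def simd_mul(a, b):
    # One logical instruction: WIDTH multiplications in a single step.
    # The width is baked in; you can't issue a narrower operation,
    # only waste the unused lanes.
    assert len(a) == len(b) == WIDTH
    return [x * y for x, y in zip(a, b)]

a = list(range(WIDTH))
b = [2] * WIDTH
print(simd_mul(a, b))  # [0, 2, 4, ..., 30]
```

The "huge processor" option is this, scaled up until WIDTH dwarfs AltiVec's or SSE's.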

The tiny processors would talk to their neighbor tiny processors using tiny FIFOs or some other kind of message passing. We use FIFOs to eliminate shared memory. We make the processors tiny because large processors are worthless if they can't process large amounts of data, and large amounts of data mean lots of bandwidth, and lots of bandwidth means memory, and we don't want memory. The advantage over the SIMD VLIW monster is that you run many different threads, which gives more flexibility.
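The tiny-processor option can be sketched the same way. Here's a hypothetical three-stage pipeline in Python, with threads standing in for processors and bounded queues standing in for the hardware FIFOs; no stage ever touches another stage's memory.

```python
import queue
import threading

def stage(func, inbox, outbox):
    # A "tiny processor": read from the input FIFO, compute,
    # write to the output FIFO. No memory is shared between stages.
    while True:
        item = inbox.get()
        if item is None:        # propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(func(item))

# Wire three tiny processors into a pipeline with FIFOs between them.
q0, q1, q2, q3 = (queue.Queue(maxsize=4) for _ in range(4))
stages = [(lambda x: x + 1, q0, q1),
          (lambda x: x * 2, q1, q2),
          (lambda x: x - 3, q2, q3)]
threads = [threading.Thread(target=stage, args=s) for s in stages]
for t in threads:
    t.start()
for x in [1, 2, 3]:
    q0.put(x)
q0.put(None)
out = []
while (item := q3.get()) is not None:
    out.append(item)
for t in threads:
    t.join()
print(out)  # [1, 3, 5]
```

The bounded queues matter: a tiny FIFO gives you back-pressure instead of a big memory for buffering, which is the whole point of the design.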

So it's either huge or tiny processors. I'm not going to name any particular architecture, but there were and still are companies working on such things, both start-ups and seasoned vendors. What I'm claiming is that these options provide less performance per square millimeter compared to a multi-core chip. So they can't beat multi-core in the anal-retentive performance-hungry market. Multiple cores and the related memory mismanagement problems are here to stay.

What I'm basically saying is, for every kind of workload, there exists an optimal processor size. (Which finally gets me to the point of this whole thing.) If you put too much stuff into one processor, you won't really be able to use that stuff. If you don't put enough into it, you don't justify the overhead of creating a processor in the first place.

When I think about it, there seems to be no way to define a "processor" in a "universal" way; a processor could be anything, really. Being the die-hard von Neumann machine devotee that I am, I define a processor as follows:

  • It reads, decodes and executes an instruction stream (a "thread")
  • It reads and writes memory (internal and possibly external)

This definition ignores at least two interesting things: that the human brain doesn't work that way, and that you can have hyper-threaded processors. I'll ignore both for now, although I might come back to the second thing some time.

Now, you can play with the "size" of the processor – its instructions can process tiny or huge amounts of data; the local memory/cache size can also vary. However, having an instruction processing kilobytes of data only pays off if you can normally give the processor that much data to multiply. Otherwise, it's just wasted hardware.
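Here's a back-of-the-envelope utilization model of that waste (the numbers are made up): an operation that only fills n of the machine's W lanes still burns whole instruction slots, so the wider the machine relative to the typical operation, the more lanes sit idle.

```python
def utilization(work_sizes, width):
    # Each operation needing n lanes occupies ceil(n / width)
    # instruction slots; only n of the slots' width*count lanes
    # do useful work.
    slots = sum(-(-n // width) for n in work_sizes)  # ceil division
    useful = sum(work_sizes)
    return useful / (slots * width)

jobs = [5, 3, 7, 2]              # lanes actually needed per operation
print(utilization(jobs, 4))      # 17/24 ~= 0.71 on a modest 4-wide machine
print(utilization(jobs, 64))     # 17/256 ~= 0.066 on a huge 64-wide machine
```

Same work, same hardware model; the only thing that changed is the width, and the huge machine spends over 90% of its lanes doing nothing.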

In a typical, actually interesting app, there aren't that many places where you need to multiply a zillion adjacent numbers in the same cycle. Sure, your app does need to multiply a zillion numbers per second. But you can rarely arrange the computations in a way that meets the time and location constraints imposed by having just one thread.

I'd guess that people who care about, say, running a website back-end efficiently know exactly what I mean; their data is all intertwined and messy, so SIMD never works for them. However, people working on number crunching generally tend to underestimate the problem. The abstract mental model of their program is usually much more regular and simple in terms of control flow and data access patterns than the actual code.

For example, when you're doing whiteboard run-time estimates, you might think of lots of small pieces of data as one big one. It's not at all the same; if you try to convince a huge SIMD array that N small pieces of data are in fact one big piece, you'll get the idea.
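A toy Python model of why the two aren't the same (the 3-tap filter is just an arbitrary example): filtering N short signals separately is not the same as gluing them together and filtering once, because the glued version mixes data across the signal boundaries.

```python
def smooth3(xs):
    # 3-tap moving average over the interior samples of one signal.
    return [(xs[i-1] + xs[i] + xs[i+1]) / 3 for i in range(1, len(xs) - 1)]

a = [0, 0, 0, 0]
b = [9, 9, 9, 9]

separate = smooth3(a) + smooth3(b)   # each small signal filtered on its own
fused = smooth3(a + b)               # "N small pieces as one big one"

print(separate)  # [0.0, 0.0, 9.0, 9.0]
print(fused)     # [0.0, 0.0, 3.0, 6.0, 9.0, 9.0]
                 # the 3.0 and 6.0 are garbage: data leaked across the seam
```

So to use the wide machine correctly you end up padding, masking, or special-casing every boundary, which is exactly the irregularity the one-huge-thread model can't afford.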

For many apps (I won't say "most" because I've never counted), data parallelism can only get you so much performance; you'll need task parallelism to get the rest. "Task parallelism" is when you have many processors running many threads, doing unrelated things.

One instruction stream is not enough, unless your app is really (and not theoretically) very simple and regular. If you have one huge processor, most of it will remain idle most of the time, so you will have wasted space in your chip.

Having ruled out one huge processor, we're left with the other extreme – lots of tiny ones. I think that this can be shown to be inefficient in a relatively intuitive way.

Each time you add a "processor" to your system, you basically add overhead. Reading and decoding instructions and storing intermediate results to local memory is all overhead; what you really want is to compute, and a processor necessarily includes quite a lot of logic for dispatching computations.

What this means is that if you add a processor, you want it to have enough "meat" for it to be worth the trouble. Lots of tiny processors is like lots of managers, each managing too few employees. I think this argument is fairly intuitive, at least I find it easy to come up with a dumb real-world metaphor for it. The huge processor suffering from "lack of regularity" problems is a bit harder to treat this way.
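To make the trade-off concrete, here's a deliberately crude area/throughput model (every constant below is made up): fixed per-processor overhead punishes the tiny end, and falling utilization punishes the huge end, so the best size lands in between.

```python
def throughput(lanes, area_budget=1024, overhead=8, typical_job=16):
    # Each core costs `overhead` area units (fetch, decode, dispatch)
    # plus one unit per compute lane; the chip fits as many as the
    # area budget allows.
    cores = area_budget // (overhead + lanes)
    # Wider cores are harder to keep busy: utilization falls off once
    # a core is wider than the parallelism a typical job offers.
    util = min(1.0, typical_job / lanes)
    return cores * lanes * util

for lanes in [1, 4, 16, 64, 256]:
    print(lanes, throughput(lanes))
# Peaks at a middling width: tiny cores drown in overhead,
# huge cores drown in idle lanes.
```

With these particular made-up constants the peak happens to land around 16 lanes, but the shape of the curve, not the exact number, is the point.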

Since a processor has an "optimal size", it also has an optimal level of performance. If you want more performance, the optimal way to get it is to add more processors of the same size. And there you have it – your standard, boring multi-core design.

Now, I bet this would be more interesting if I could give figures for this optimal size. I could of course shamelessly weasel out of this one – the optimal size depends on lots of things, for example:

  • Application domain. x86 runs Perl; C6000 runs FFT. So x86 has speculative execution, and C6000 has VLIW. (It turns out that I use the name "Perl" to refer to all code dealing with messy data structures, although Python, Firefox and Excel probably aren't any different. The reason must be that I think of "irregular" code in general and Perl in particular as a complicated phenomenon, and a necessary evil).
  • The cost of extra performance. Will your customer pay extra 80% for extra 20% of performance? For an x86-based system, the answer is more likely to be "yes" than for a C6000-based system. If the answer is "yes", adding hardware for optimizing rare use cases is a good idea.

I could go on and on. However, just for the fun of it, I decided to share my current magic constants with you. In fact there aren't many of them – I think that you can use at most 8-16 of everything. That is:

  • At most 8-16 vector elements for SIMD operations
  • At most 8-16 units working in parallel for VLIW machines
  • At most 8-16 processors per external memory module

Also, 8-16 of any of these is a lot. Many good systems will use, say, 2-4 because their application domain is more like Perl than FFT in some respect, so there's no chance of actually utilizing that many resources in parallel.

I have no evidence that the physical constants of the universe make my magic constants universally true. If you know about a great chip that breaks these "rules", it would be interesting to hear about it.


#1 Alex on 05.29.08 at 6:41 am

Why is it necessarily so that for a single machine, we're stuck with having threads running in a single process trying their best not to stomp on each other?

Can't one use the page table mechanism to pass ownership of pages between processes as needed? (I'm not sure whether this would qualify as message passing or shared memory, but the effect would be that a single process would have access to a page/pages at any time).

I can't imagine the code to change ownership of a page to have significantly greater performance penalty than taking a lock.

Of course there is no OS support for something like this, but why should we take the current OS model as dogma and not look for improvements?

#2 Yossi Kreinin on 05.31.08 at 7:31 am

I think it would basically be shared memory, with the twist of making pages "owned" by one of the processors.

Basically the benefit of explicit memory sharing between processes is that you don't have race conditions in places you didn't even think about, and the problem is that it makes sharing memory harder. I think that adding ownership transfer would make both the benefit and the problem bigger (you have even less bugs due to poorly thought out code and even more coding overhead for sharing memory).

#3 Antiguru on 11.06.09 at 7:58 pm

Hmm. Well, I have done quite a lot of this and it seems to be an area I do well in. The only issue is between environments and every time you move to new one you have to 'solve' it before anything works at all but then it's fine. The biggest problem is you always WORRY it's the MT aspect causing trouble in some way you don't quite understand but 99.9% of the time it's not the issue.

The multicore aspect is not a problem, except maybe it might hide some bugs. Fortunately for PCs the memory model is forgiving. Fast processors are nice but I think the issue is they can't make them any faster any more, as far as clock speeds go. The extra core is really just some special cache to make for quick context switching so you get a lot of bang for almost no buck in that respect.

Not sure why you don't like shared memory; it's the best way to do virtually everything MT. Message passing generally introduces a lot of issues, such as not working most of the time, or stopping working every time you change platform or compiler or hardware.

The shared memory stuff is complex, but that's on the hardware maker, i.e. Intel. And when you break it down it's all simple parts, basically cache layers that ultimately figure out what's what in the proper way and reset things if they need to be reset. That part Intel does pretty well; their TBB nonsense is an utter joke of course.

When you look at processors, the reason tiny ones make sense is this: the actual processor part that DOES anything is already tiny. All the other crap is just there to keep it fed at all times. They've been doing MT for years in a very half-assed way, i.e. instruction pipelines etc. That's 90% of the processor.

So if you can get 10 processors in the same place, suddenly the overhead of wiring them together makes sense. I mean, they have 480 (powerful) processors on newer graphics cards. Larrabee will have 32 or 64.

Even if you can't parallelize things at all, every PC has dozens of crap running at all time nowadays. None of it requires tons of processing, but if the context switching and scheduling can be taken off the CPU and put onto side CPUs you get a big improvement in usability.

For things like rendering, skinning, maya, blah blah blah you get 100% parallelization. Unless the programmers are morons.

So, it's basically bound for lots of tiny processors. Nothing else makes a ton of sense.

#4 Antiguru on 11.06.09 at 8:03 pm

And as for magic constant, there isn't one. It's all on your parallelization skill.

The reason people get choked after N where N is some tiny number is because they use locks or they preprocess huge amounts in just one thread, ie have no clue what they are doing.

Not everything can be properly broken up but for things that can be all you have to do is put jobs on the queue with various index ranges you want computed.

#5 Yossi Kreinin on 11.07.09 at 2:50 am

Actually if you look at Nvidia's machines, you'll see constants around 8, 16 or 32. The hundreds of cores they cite (between 256 and 512) are really tens of cores, each with a SIMD array. They call it SIMT and in fact it isn't quite the traditional SIMD, but it's still a single instruction dispatched to 16 execution units.

Regarding shared memory – its assessment will depend on how many dozens of developers are using it in how many hundreds of thousands of lines of code.

#6 Yossi Kreinin on 11.07.09 at 2:52 am

…Also, Nvidia has ~6 external DRAMs in its high-end variants so the 32 SIMT cores are actually <6 cores per external memory module. I still believe the constants above have some magic to them.

#7 Antiguru on 12.03.09 at 4:47 am

AH, I see what you are saying about that now.

#8 he/she on 09.13.10 at 6:45 pm

you know what

#9 Yossi Kreinin on 09.13.10 at 11:47 pm

Actually, I don't. Is this some kind of spam for spam's sake, like those messages with the subject "qz" and the body "ygdf4c" that I sometimes get – sneak past the spam filter but don't seem to yield any benefits for the spammer except satisfaction at having wasted someone's time?
