Redundancy vs dependencies: which is worse?

I believe that there are just two intrinsic forces in programming:

  1. You want to minimize redundancy and, ideally, define every piece of knowledge once.
  2. You want to minimize dependencies – A should depend on B only if it absolutely must.

I think that all other considerations are of the extrinsic real-world kind – domain modeling, usability, schedules, platforms, etc. I also think that I can show how any "good" programming practice is mainly aimed at minimizing redundancy, dependencies, or both. I even think that you can tell a "good" programmer from a "bad" one by their attitude towards redundancy and dependencies. The good ones hate them, the bad ones don't care.

If this idea looks idiotically oversimplified, note that I mean "programming aptitude" in a narrow sense of code quality. I've seen brilliant, cooperative people with uncanny algorithmic capabilities who still wrote awful code. I tried to figure it out and the common denominator seemed to be that they just didn't care about redundancy or dependencies, or even kinda liked them. Maybe it still looks idiotically oversimplified. Let's leave it at that, because it's not what I'm here to talk about.

I'm here to talk about the case when minimizing redundancy conflicts with minimizing dependencies. This case is basically code reuse beyond module boundaries. You can choose between having modules A and B using a module C doing something, or have them do it themselves. What's your call?

One strikingly dumb thing about this question is that it's centered around the term "module", which is vague and informal. However, "module" is what makes this a trade-off. Inside a module, of course you want to reuse the code, end of discussion. Why would anyone want to parse two command line options with two duplicated code snippets when you could use a function?

On the other hand, if two modules parse command lines, we can still factor out the parsing code, but we'd have to make it a third module. Alternatively, we can stuff it into a "utilities" module. The one affectionately called "the trash can". The one which won't link without a bunch of external libraries used to implement some of its handy functions. The one with the configuration which always gets misconfigured, and the initialization which never happens at the right time. You know, the utilities module.

I believe that years of experience barely matter in terms of knowledge. You don't learn at work at a pace anywhere near that of a full-time student. Experience mainly does two things: it builds character, and it destroys character. Case in point: young, passionate programmers are usually very happy to make the third module, nor do they cringe when they delve into the utility trash can. They're then understandably offended when their more seasoned colleagues, having noticed their latest "infrastructural" activity, reach out for the barf bags. This certainly illustrates either the character-building or the character-destroying power of experience, I just don't know which one.

No, seriously. Take command line parsing. You want common syntax for options, right? And you want some of them to accept values, right? And those values can be strings, and booleans, and integers, right? And integers can be decimal or hexadecimal, right? And they can be values of user-defined types, right? And they can have help strings, right? And you'd like to generate help screens from them, right? And GUIs with property pages? And read them from configuration files? And check the legality of flags or sets of flags, right?

Sure. It's not a big deal. Trivial, even. (If you're smart, everything is trivial until you fail completely due to exceeding complexity. And admit that you failed due to exceeding complexity. The former takes time to happen, the latter can never happen.) Quite some people have devoted several of the beautiful months of their youth to the problem of argument passing. Example: XParam, which calls itself "The Solution to Parameter Handling". Took >10K LOC the last time I checked. Comes with its own serialization framework. Rumors tell that its original host project uses <5% of its features.

Clarification: I'm not mocking the authors of XParam. Reason 1: Rumors tell they are pretty sharp. Reason 2: I'm really, really ashamed to admit this, but I once worked on a logging library called XLog. Took >10K LOC the last time I counted. Came with its own serialization framework. First-hand evidence tells that its host project uses 0% of its features. Ouch.

You know how I parse command line arguments in my modules these days? Like so:

for(i=0; i<argc; ++i) {
  if(strcmp(argv[i],"-trace")==0) {
    trace=1;
  }
}

I used C for the example because it's the ugliest language for string processing, and it's still a piece of cake. No, I don't get help screens. No, I don't get proper command line validation. So sue me. They're debugging options. It's good enough. Beats having everything depend on a 10K LOC command line parsing package. Not to mention a 50K LOC utility trash can full of toxic waste.

Modules are important. Module boundaries are important. A module is a piece of software that has:

  1. A reasonably compact, stable interface. An unfortunate side effect of OO training is that compact interfaces aren't valued. It's considered OK to expose an object model with tens of classes and hairy data structures and poorly thought-out extensibility hooks. Furthermore, an unfortunate side effect of C++ classes is that "stable interface" is an oxymoron. But no matter; nothing prevents you from implementing a compact, stable interface.
  2. Documentation. The semantics of the compact and stable interface are described somewhere. A really good module comes with example code. "Internal" modules lacking reasonably complete documentation suck, although they aren't always avoidable.
  3. Tests. I don't think unit-testing each class or function makes sense. What has to be tested is the "official" module interfaces, because if they work, the whole module can be considered working, and otherwise, it can't.
  4. Reasonable size. A module should generally be between 1K and 30K LOC (the numbers are given in C++ KLOC units; for 4GLs, divide them by 4). Larger modules are a pile of mud. A system composed of a zillion tiny modules is itself a pile of mud.
  5. Owner. The way to change a module is to convince the owner to make a change. The way to fix a bug is to report it to the owner. That way, you can count on the interface to be backed up by a consistent mental model making it work. The number of people capable of simultaneously maintaining this kind of mental model was experimentally determined to be 1.
  6. Life cycle. Except for emergency bug fixes, changes to a module are batched into versions, which aren't released too frequently. Otherwise you can't have a tested, stable interface.

Pretty heavy.

Do I want to introduce a module to handle command line parsing? Do I really wish to become the honored owner of this bleeding-edge technology? Not now, thank you. Of course, it's the burnout speaking; it would be fun and it would be trivial, really. Luckily, not everyone around is lazy and grumpy like me. See? That guy over there already created a command line parsing module. And that other person here made one, too. Now, do I want my code to depend on their stuff?

Tough question. I've already compromised my reputation by refusing to properly deal with command line parsing. If my next move is refusing to use an Existing Solution, I will have proved the antisocial nature of my personality. Where's my team spirit? And still, I have my doubts. Is this really a module?

First and foremost, and this is the most annoying of all questions – who owns this piece of work? Sure, you thought it was trivial and you hacked it up. But are you willing to support it, or do you have someone in mind you'd like to transfer the ownership to? You probably don't even realize that it has to be supported, since it's so trivial. Oh, but you will. You will realize that it has to be supported.

I'm enough of a Kassandra to tell exactly when you'll realize it. It will be when the first completely idiotic change is made to your code by someone else. Not unlikely, tying it to another piece of "infrastructure", leading to a Gordian knot of dependencies. Ever noticed how infrastructure lovers always believe their module comes first in the dependency food chain and how it ultimately causes cyclic dependencies? So anyway, then, you'll understand. Of course, it will be too late as far as I'm concerned. My code got tied to yours, theirs, and everybody else's sticky infrastructure. Oopsie.

(The naive reader may ask, what can a command line parser possibly depend on? Oh, plenty of stuff. A serialization package. A parsing package. A colored terminal I/O package, for help screens. C++: a platform-specific package for accessing argc, argv before main() is called. C++: a singleton initialization management package. WTF is that, you ask? Get a barf bag and check out what Modern C++ Design has to say on the subject).

So no, I don't want to depend on anything without an owner. I see how this reasoning can be infuriating. Shifting the focus from software to wetware is a dirty trick loved by technically impotent pseudo-business-oriented middle-management loser types. Here's my attempt at distinguishing myself from their ilk: not only do I want to depend on stuff with an owner, but I require a happy owner at that. Contrary to a common managerial assumption (one of those which rarely hold but do keep managers sane), I don't believe in forcibly assigning ownership. If the owner doesn't like the module, expect some pretty lousy gardening job.

What about life cycle? My modules and your modules are released at different times. I might run into a need to check for compatibility with different versions of your stuff. My tests won't compile without your code, so now I need to always have it in my environment. What does your code depend on – is it a small, stable, defined set of things? I don't want to be stuck just because your new version drags in more dependencies. What if I need a feature? You don't seem to support floating point numbers. And I don't see abbreviations, either; I need them for a bunch of flags 'cause they're passed interactively all the time.

What about stable interface and semantics? Your previous version would accept command lines passing the same option multiple times, and take the last value. I have test scripts that count on that. Your new version reports an error, because now this syntax is reserved for lists (-frob=a -frob=b -frob=c passes the list a,b,c as the value of the frob option). Sigh. I guess it could be worse – you could make the string "a,b,c" from this command line, and then the problem would propagate deeper.
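
To make the semantics in question concrete, here is a rough sketch of the old "last value wins" behavior that my test scripts count on. It's an illustration only: the name mymod_last_value and the exact handling of the -name=value syntax are assumptions made for this example, not code from any real parser.

```c
#include <string.h>

/* Hypothetical helper: returns the value of the LAST "-name=value"
 * argument on the command line, or NULL if the option never appears.
 * Later occurrences silently override earlier ones. */
const char *mymod_last_value(int argc, char **argv, const char *name)
{
    const char *result = NULL;
    size_t n = strlen(name);
    int i;

    for (i = 1; i < argc; ++i) {            /* skip argv[0], the program name */
        const char *a = argv[i];
        /* match "-" + name + "=" exactly, so "fro" does not match "-frob=a" */
        if (a[0] == '-' && strncmp(a + 1, name, n) == 0 && a[1 + n] == '=') {
            result = a + n + 2;             /* points just past the '=' */
        }
    }
    return result;
}
```

Given -frob=a -frob=b -frob=c, this returns "c". A parser that starts treating the same command line as the list a,b,c (or as an error) breaks every caller that relied on the override, without any change on the caller's side.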

I could go on and on, about how your interface isn't a decent public interface (you don't seriously expect me to subclass EnumParser to handle flags with a fixed set of legal string values, do you?). Or about the size of your code, which at the moment dwarfs the size of my own module, so I'd have more command line parsing code than anything else in my tests. And how it hurts when you download the tests using a slow connection to the target machine. And how I don't like it when my tests crash inside your code, even when it's my fault, because you have hairy data structures that I don't like to inspect.

But you already got it – I don't want your code, because I'm an antisocial asshole that has no team spirit whatsoever. I'm going to parse arguments using 5 lines of C code. Worse, I'll make a function out of those five lines, prefix its name with the module name, and replicate it in all my modules. Yep, I'm the copy-paste programmer and you're the enlightened developer of the next generation command line parsing platform. Have it your way, and I'll have it my way.
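
For the record, the replicated function could look roughly like the sketch below. The module name "mymod" is a placeholder; each module would carry its own copy under its own prefix, which is what keeps the copies from colliding at link time.

```c
#include <string.h>

/* Hypothetical per-module copy; "mymod" stands for the owning module's name.
 * Returns 1 if `flag` appears among argv[1..argc-1], 0 otherwise.
 * No help screens, no validation: it is meant for debugging options. */
int mymod_has_flag(int argc, char **argv, const char *flag)
{
    int i;

    for (i = 1; i < argc; ++i) {    /* skip argv[0], the program name */
        if (strcmp(argv[i], flag) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A caller would then write something like trace = mymod_has_flag(argc, argv, "-trace"); and that is the entire interface.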

To complete the bad impression I've been making here, I'll use a flawed real-world analogy. Think living organisms. They have a very serious Tower of Babel problem; redundancy is everywhere. I've heard that humans and octopuses have very similar eye structure, despite being descended from a blind ancestor. Command line parsers?! Entire eyes, not the simplest piece of hardware with quite some neural complexity at the back-end, get developed independently. Redundancy can hardly get any worse. But it works, apparently better than coordinating everybody's efforts to evolve would work.

Redundancy sucks. Redundancy always means duplicated efforts, and sometimes interoperability problems. But dependencies are worse. The only reasonable thing to depend on is a full-fledged, real module, not an amorphous bunch of code. You can usually look at a problem and guess quite well if its solution has good chances to become a real module, with a real owner, with a stable interface making all its users happy enough. If these chances are low, pick redundancy. And estimate those chances conservatively, too. Redundancy is bad, but dependencies can actually paralyze you. I say – kill dependencies first.

24 comments ↓

#1 Entity on 05.27.08 at 5:21 pm

One thing you didn't touch on: even with redundancy in your code, it does future-proof your code, whereas a dependence on a central library file, be it COM or a DLL, can break and disable your application long after development work has finished.

I've read about a similar attitude in the Microsoft Excel camp, in the old saying "Find the dependencies and eliminate them". Though the Microsoft Excel team has since gone through so many developers that I don't think that comment really reflects the new Microsoft Excel team.

I would like to add that I agree with you that it comes down to experience. Choosing and weighing up the dependencies and redundancy at the start of the project can in most cases mean the difference between finishing the project on time and overshooting it. It also comes down to your perspective and motivations.

#2 anthonyrstevens on 05.27.08 at 6:16 pm

Fantastic post. I wish more developers had this kind of wisdom!

#4 Doug on 05.27.08 at 9:05 pm

Another way to look at it (at least, I think it's similar): I'm old enough to remember when the big thing wasn't "don't repeat yourself", it was "coupling and cohesion". Minimum coupling and maximum cohesion are the desired traits. Dependencies increase coupling and reduce cohesion. Bad on both counts.

Copy-and-paste still sounds evil to me, but I have to say that in my (considerable) experience it has actually caused me considerably less grief than dependencies have.

Don Knuth seems to think so too: "To me, 're-editable code' is much, much better than an untouchable black box or toolkit… you’ll never convince me that reusable code isn’t mostly a menace."

#5 ilyak on 05.28.08 at 1:01 am

Domain specific languages are good for you because a) They generally have small and consistent API, b) They generally require enough effort to make them owned and have real releases.

Also, you've usually got to have a few functions which back up deficiencies of your language, which will reside in utilities, used universally and can be generally considered part of underlying platform/language.
They would also have a small stable API and would not generally have a release cycle, at all.

#6 Yossi Kreinin on 05.28.08 at 10:15 am

Regarding the Excel team as portrayed by Spolsky and Knuth: it's the "do it yourself" way; this post was more about defending "don't do it". Both have to do with minimizing dependency, but it's a different battle. "Do it yourself" is loved by strong programmers; "don't do it" is something managers and bad/evil programmers are attracted to. I can sure relate to Excel's own C compiler and Knuth's own processor though, and DIY can be hard to defend in, um, an industrial environment; I think I should talk about it some time.

Regarding future-proof code: to be fair, redundancy can make change hard, and future may force change. But still, depending on a vast array of external libraries with different life cycles is waaaay more likely to give you royal pain than redundancy, if we look at the kind of dependencies and redundancy found in code written by reasonable people. I think.

DSLs: I love them. DSLs are DIY taken to an extreme though.

Utility trash can covering up the deficiencies of your language: BARF. Works for exactly one person and becomes a nightmare otherwise. It's going to be neither small nor stable, and it will show up each time trouble surfaces. Give me one example of a language-deficiency-covering utility I want to drag with me everywhere.

(By any chance, isn't your deficiency-ridden language C++, and isn't your trash can called debug.h? Many people have a debug.h. I'm just curious.)

#7 arakyd on 05.28.08 at 11:34 pm

I think this is a really interesting topic. Some thoughts:

Butler Lampson said similar things in "Software Components: Only The Giants Survive." He seems to be of the opinion that it hasn't turned out to be cost-effective to use someone else's modules unless they are 5MLOC+: databases, programming languages, operating systems, etc. At that point rolling your own is too hard, and there is probably an organization and maybe a business plan behind those 5MLOC, so there are resources to make decent general purpose interfaces and you don't have to worry about who the owner is. Everyone uses (a different) small fraction of each of these components and loses some number of orders of magnitude of runtime efficiency in exchange for having to write less code. No one can productively use more than a few modules at any level because of inevitable impedance mismatches.

It's interesting to look at how Chuck Moore approaches this with Forth. Redundancy is eliminated in favor of dependency as much as possible, and dependencies are factored into the smallest possible groups (words). Words are grouped together in small bunches (screens), but there are no real module boundaries and the only real interface is the very simple one between words. Ideally one ends up with very little redundancy and dependencies that can easily be managed because 1) each word only has a few, and 2) the overall system is much smaller (the zillion tiny modules aren't a problem because there are really only a few hundred or so). You can have your cake and eat it too with respect to redundancy and dependency, but you pay for it by making it much harder to use any kind of external library at all.

Forth is basically a DSL constructor, and the philosophy tends to the extreme of extreme DIY. Still, think about it: at small scales I think every language and every good programmer solves the problem the same way as Forth and other truly extensible languages do: all code necessarily contains dependencies (it's dependencies all the way down to the laws of physics), we know how to manage dependency up to some level by breaking it into small pieces, minimum redundancy is theoretically zero, and redundancy increases code size and therefore necessarily adds dependencies as well (although redundant dependencies are usually not as bad). The problem occurs when 1) the code grows past the point where the programmer can handle the dependencies, or 2) the code grows past the point where the language can break all the dependencies into small groups, or 3) you have to use code with an interface that someone else has designed that has any of the problems that are mentioned above, or by Lampson.

The dominant model for scaling seems to be to try to solve everything with good modules. The result is that there are good large modules, but they all have their own complex interfaces that you have to learn and fit together. You don't get much help from the language scaling your own code, so your code size grows fairly quickly and at 1-30KLOC you feel pressure to tie it off with your own more or less ad-hoc interface. Pain scales super-linearly with the number of modules you try to glue together. It takes a lot of smart people to build large systems, and they probably write most of their own modules to minimize interface clash.

Two other approaches:

Niche languages with 2X better support for breaking dependencies into small groups: They have no libraries, so the largest systems that get built have 2*1-30KLOC. Nobody is impressed, nobody writes libraries, no one uses the language, it dies.

Forth: Chuck Moore writes sophisticated applications with it in ridiculously small amounts of code (very low redundancy, relatively low number of very small packages of dependences) and without libraries (no external software dependencies), thus (almost) completely bypassing the interface problem. No one else wants to buy a simple unified interface and drastically reduced code size at the cost of a huge impedance mismatch with 99.9% of existing code and most large components.

#8 ilyak on 05.29.08 at 3:48 am

Yossi Kreinin:
Easy!
In Java:
public static String notNull(String src) {
    return (src == null ? "" : src);
}
public static boolean isEmpty(String str) {
    return str == null || str.trim().length() == 0;
}
public static boolean isNotEmpty(String str) {
    return !Util.isEmpty(str);
}

public static int intValue(String str, int def)
{
    if(Util.isEmpty(str))
        return def;
    try {
        return Integer.parseInt(str);
    } catch(NumberFormatException nfe) {
        return def;
    }
}

Without them, I would DIE, my code will be much longer for all places where I would inline these helpers redundantly, and it will be exception-unsafe in every place where I will forget them (and yes, I tend to "forget" some checks if they are ugly and require a few lines per check).

#9 Entity on 05.29.08 at 4:46 pm

@Yossi Kreinin

Wouldn't you agree that this is soooo much easier to read? It's not as robust as your first example, but when you take a system wide view instead of a narrow single function such redundancy is the biggest enemy of simplicity.

public static int intValue(String str, int def)
{
    try {
        return Integer.parseInt(str);
    } catch(NumberFormatException nfe) {
        return def;
    }
}

#10 Yossi Kreinin on 05.30.08 at 12:04 pm

to ilyak, Entity: I could start nitpicking along the lines of "why do you treat empty and null strings as the same thing, few languages do so and for a good reason, too, *I* never have problems with that", etc. But that would miss the point; of course you might need this kind of function. I drag a bunch of those with me; for example, C++ istream doesn't have a filesize function, you need to seekg and subtract, so I have a function for that, etc.

The question is, do you drag it with you as a package that other people are supposed to use, or do you stuff it into a utility class/file you replicate in many packages/modules you work on? All I was saying is that the latter beats the former. Because if you make a "shared module" out of these few funny handy functions, you will have created a dependency problem. People will "enhance" it, add functions depending on still other utility modules, break the semantics of your functions, etc. Or if they can't do that socially, they won't reuse it, because you see, I want to try and parse the number as decimal and hexadecimal before failing and returning the default, but you only care about decimal, so I'll roll my own functions which I'd die without :) And if it won't be reused, and as you claim there's no release cycle, there seems little point in sharing it. Let those things be the guts of a module, not a module in its own right, to make the overall picture of which modules there are and how they're related a cleaner one. It's surprising how much the cleanliness of this picture matters for the social well-being of projects; unclear dependencies and roles of modules actually confuse people a lot, despite the usual triviality of the issues.
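
To show how small the "roll my own" variant is, here is a hedged C sketch of a parse-with-default helper that tries decimal first and hexadecimal second. The names mymod_int_value and mymod_parse_base are made up for this example, and treating a bare "ff" as hexadecimal is a choice of this sketch, not a rule anyone has to share.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole string in the given base.
 * Returns 1 and stores the value on success, 0 otherwise. */
static int mymod_parse_base(const char *s, int base, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(s, &end, base);
    /* succeed only if at least one digit was read, nothing is left over,
     * and the value did not overflow */
    return end != s && *end == '\0' && errno == 0;
}

/* Try decimal, then hexadecimal (with or without a 0x prefix);
 * fall back to `def` if neither fits or s is NULL. */
long mymod_int_value(const char *s, long def)
{
    long v;

    if (s == NULL) {
        return def;
    }
    if (mymod_parse_base(s, 10, &v) || mymod_parse_base(s, 16, &v)) {
        return v;
    }
    return def;
}
```

Someone who only cares about decimal would delete one call and end up with a different function, which is exactly why sharing the thing as a module buys so little.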

#11 Yossi Kreinin on 05.30.08 at 12:38 pm

to arakyd: thanks for the pointer to Lampson's paper. I particularly liked the bit about "specs with teeth".

Chuck Moore is an extremist even in the Forth community. For example, most forthers, um, endorse ANS Forth, and Chuck Moore more or less hates it, because it is bloated and requires compatibility. Here's a man who loves impedance mismatch…

I used to like Forth a lot. I now think it's The Wrong Thing. Its terseness is achieved at the cost of extreme obfuscation, which has the side effect of few opportunities for automatic optimization.

The compact programs written in Forth aren't just the result of language features. Having no variable names only gives you so much space savings, and the metaprogramming features like CREATE/DOES> aren't even in Chuck Moore's toolbox – apparently 'cause he's into simplicity and this stuff is hairy. And I think you can do with Lisp what you can do with Forth in terms of metaprogramming power, without getting cut nearly as painfully by the sharp edges.

Compact Forth programs are the result of a whole philosophy of preferring simplicity to features. Unix also has this philosophy, but Forth takes it to a 10x more extreme level (how did that quote by Chuck Moore go?.. "Forth uses a different approach. No syntax. No files. No types. No operating system."). Now, I don't think that cutting features to make your life easier is the way to go. Definitely not when end user-visible features are at stake. And even developer-visible features are more important than the less competent managers would like to believe. It's not trivial to draw the line correctly, and this is why I thought my command line example would be appropriate – there, I find that the question of what features to have is experimentally found non-trivial enough to be flammable. But even Unix has too few core features, and a Forth system has waaaay too few. Which is why Windows>Unix>>Forth in terms of market share. Dead simple.

Take multiplexing. OSes essentially do multiplexing. You get the illusion of many files on top of one disk, many processes on top of one CPU and memory array, many sockets on top of one Ethernet cable, many windows on top of one screen, etc. Forth refuses to do that. How can a sane person argue for not doing that, except for saving labor for oneself? Of course many machines for the same money are better than one. You can copy and paste data between them, for example. Damn useful. On Unix, clipboards never quite worked – they weren't a core feature. Problem. Forth systems take over the machine and refuse to multiplex them. Why would I use a Forth system?

Features are a good thing.

Regarding Chuck Moore's Forth apps: I had a hardware hacker look at the (wildly incomplete) documentation of Chuck's ColorForth hardware design toolchain. You know, the alternative to the very expensive Verilog/VHDL-based toolchains. Well, the guy skimmed through it, and reported that the basic model of hardware was very simplistic and you just couldn't do with those tools what you could do with the industry-standard expensive ones. But no matter: Chuck uses the tools to produce hardware with tiny gate count. Which themselves have few features and so aren't very competitive.

I never fully believed the tales about Forth being 10x or 100x more efficient than the rest of the industry, but it took time to fully appreciate how hallucinated this claim really was.

#12 jhonan on 06.03.08 at 4:00 pm

Thanks for this. Clarifies something for me.

On Chuck Moore. Years ago I saw Chuck Moore give a talk in Sydney, showing off his PCB design system. This was astonishingly minimalist. He had three push buttons instead of a keyboard. Basically he showed a screen of characters and you cursored left/right up/down to select.

Hard core.

He didn't believe in pull-up resistors. So to determine if a line was floating or driven, he drove it low, read its value, drove it high and read the value again.

However, it wasn't all insanity. He didn't believe in floating point. I admire that in a programmer, even as I myself contribute to global warming through heating up millions of transistors.

Has the world gone insane? I really feel that your prognostications are very valuable, but I suspect that they belong to a very, very small minority view. Should you choose to expand on them in a job interview or application form, you'll be confronted by blank stares, if not hostility.

Programming is all orthodoxy now. Trying to make stuff work, is like, you know, hard?

Jamie

#13 Yossi Kreinin on 06.04.08 at 1:39 pm

Not believing in floating point IS insanity. To an extent that makes it worth writing about some time. I've spat blood over this for a long time, and I'm going to share my takeaways with humanity even if it couldn't care less.

Chuck Moore kicks butt, but he is, put simply, an extremist. At its core, extremism is rooted in refusing to accept the existence of "unsolvable" problems and having to choose between two bad/suboptimal options. "Problem? LET'S FUCKING BLOW IT AWAY!!" It's a whole mindset, which I love, and quite some very talented people have it, but it's incompatible with reality. It's OK; compatibility with reality isn't everything. As long as both you and your worst natural enemies are unarmed.

Interviews are a special genre, where one key thing you want to demonstrate is compatibility. An inevitable genre, but a sucky one. Blogging is surely better than interviewing. BTW, quite shockingly, apparently a couple of quite senior managers liked this article (WHAT?), while programmers tended to like it somewhat less. Aside from the fact that I feel like having just sucked up to the authorities, this is natural. Do-it-yourself-without-dependencies-and-overgeneralizing is Goal-Oriented. "Code reuse" and "generic frameworks" Jeopardize Schedules. (I'm not making fun of the managerial perspective, just of the terms and the less competent managers.) So this particular opinion of mine isn't that counter-mainstream, on the contrary; of course it doesn't make it a good thing to say in an interview, because you shouldn't be opinionated there.

If I had to make a bet on the question whether the world or myself has gone insane, I'd bet on the second option, since it proved to be a safe bet in the past.

#15 Eli on 11.27.08 at 8:55 am

On one hand, I'm really glad you raise this topic. The struggle between dependencies and redundancy baffles everyone who writes large amounts of code, at one level or another. You've summarized the problem nicely.

However, I disagree with the conclusions you reach. Dependencies can be managed. Yes, they can! With judicious use of a common code repository, source control and tests dependencies can be tamed enough to bring much more merit than harm. It may take some energy to manage them, but surely less than implementing your own command line parser, text lexer, hash table, math utils, …you get the idea… for every project.

#16 Yossi Kreinin on 11.27.08 at 1:16 pm

I agree that they can be managed, and at times they should be; I'm just saying that I don't trust code that isn't already managed as a full-fledged module.

In fact, more often than not I'm the maintainer of "shared infrastructure" code, so I'm far from suggesting to "ban" that. But experience in that made me believe Brooks's numbers – something "generally usable" costs 9x-10x more development effort than something "locally usable". I think this number should be kept in mind when deciding to make something reusable, that's all; it doesn't mean "don't do it" – it can still pay off. I just hate it when people pitch their half-baked utilities as "infrastructure". I do understand their motivation though; in fact I wrote about that motivation in "The internal free market".

#17 Entity on 12.01.08 at 3:03 pm

Could you give a reference to Brooks's numbers?

"
But experience in that made me believe Brooks's numbers – something “generally usable” costs 9x-10x more development effort than something “locally usable”.
"

#18 Yossi Kreinin on 12.01.08 at 3:07 pm

Sure. It's from The Mythical Man Month (a "classic" that I generally didn't like very much.)

According to Brooks, the 9x-10x was, if I remember correctly, the result of the compounding of a 3x and another 3x. The first 3x was "generalizing the program's inputs and outputs" and the other 3x was documentation and other polishing needed for "exporting" something. But I'm not sure.

#20 rarecactus on 01.30.11 at 11:45 pm

It's 2010. There is no reason to write your own command-line parsing library (or module, or framework, or whatever). Even a casual search will turn up hundreds of libraries, with either LGPL or BSD licenses, that you can use in your project. libpopt is a good one. And if you don't want to link in an extra library, there's a perfectly good function called getopt() right in every libc.

The same is true for a lot of other problems. It's always best to check if there is a library that does what you want, before you rush out to do something. And don't rule out the possibility that that library might be libc itself! For example, consider using getenv() to check for some environment variables instead of writing TheWorldsBillionthConfigFileParser.

I agree with you that some programs just have excessive dependencies. It's a matter of taste, though. It's almost impossible to come up with a hard and fast "rule" about when something is worth factoring out into a library.

#21 mp on 05.21.12 at 10:33 am

Hi, just wanted to say this is an excellent post. I find it really hard to explain to people why programs that depend on tens of libraries are (what I will now call) 'a barf bag'. Sometimes even depending on one library can be a massive nightmare (C++ boost anyone?). But you can't say this to people because it goes against everything they have been taught about good software engineering practice, and it makes you sound like you don't know what you are doing, when actually it's the fact that you are enormously experienced that has led to doubts about what beginners are taught.

#22 Yossi Kreinin on 05.21.12 at 9:43 pm

Yeah, it's a good thing boost is pretty much banned where I work.

As to what's considered "good practice" – I guess a lot of people are burnt by "independence extremists", for instance, people who clone someone else's (massive) chunks of code for the sake of "independence" and drag the copy with them. I can see how one can develop an allergy for the idea of "dependency minimization".

#23 Phil on 08.13.12 at 11:14 pm

"Utility trash can covering up the deficiencies of your language: BARF. Works for exactly one person and becomes a nightmare otherwise."

This has not been my experience. Good fix-the-language libraries get used by everybody. Eventually they can become part of the stdlib for that language.

"It’s going to be neither small nor stable, and it will show up each time trouble surfaces. Give me one example of a language-deficiency-covering utility I want to drag with me everywhere."

Every C standard header file, before it became part of the stdlib. Lisp's ITERATE library, for looping. For Javascript, half of jQuery.

#24 Yossi Kreinin on 08.13.12 at 11:21 pm

@Phil: let's say that I haven't contributed to the standard library of any language nor worked with someone who did, and that the typical "fix the language" attempt is not worthy of becoming a part of a standard. I mean, in the same way you could say that one's own programming languages "tend to be used by everybody and eventually become standard" based on the examples of C, Perl, Python and PHP.
