Redundancy vs dependencies: which is worse?
I believe that there are just two intrinsic forces in programming:
- You want to minimize redundancy and, ideally, define every piece of knowledge once.
- You want to minimize dependencies – A should depend on B only if it absolutely must.
I think that all other considerations are of the extrinsic real-world kind – domain modeling, usability, schedules,
platforms, etc. I also think that I can show how any "good" programming practice is mainly aimed at minimizing redundancy,
dependencies, or both. I even think that you can tell a "good" programmer from a "bad" one by their attitude towards redundancy
and dependencies. The good ones hate them, the bad ones don't care.
If this idea looks idiotically oversimplified, note that I mean "programming aptitude" in a narrow sense of code quality.
I've seen brilliant, cooperative people with uncanny algorithmic capabilities who still wrote awful code. I tried to figure it
out and the common denominator seemed to be that they just didn't care about redundancy or dependencies, or even kinda liked
them. Maybe it still looks idiotically oversimplified. Let's leave it at that, because it's not what I'm here to talk about.
I'm here to talk about the case when minimizing redundancy conflicts with minimizing dependencies. This case is basically
code reuse beyond module boundaries. You can choose between having modules A and B use a module C that does something, or
having each of them do it themselves. What's your call?
One strikingly dumb thing about this question is that it's centered around the term "module", which is vague and informal.
However, "module" is what makes this a trade-off. Inside a module, of course you want to reuse the code, end of discussion. Why
would anyone want to parse two command line options with two duplicated code snippets when you could use a function?
On the other hand, if two modules parse command lines, we can still factor out the parsing code, but we'd have to make it
a third module. Alternatively, we can stuff it into a "utilities" module. The one affectionately called "the trash
can". The one which won't link without a bunch of external libraries used to implement some of its handy functions. The one with
the configuration which always gets misconfigured, and the initialization which never happens at the right time. You know, the
utilities module.
I believe that years of experience barely matter in terms of knowledge. You don't learn at work at a pace anywhere near that
of a full-time student. Experience mainly does two things: it builds character, and it destroys character. Case in point: young,
passionate programmers are usually very happy to make the third module, and they don't cringe when they delve into the utility
trash can. They're then understandably offended when their more seasoned colleagues, having noticed their latest
"infrastructural" activity, reach out for the barf bags. This certainly illustrates either the character-building or the
character-destroying power of experience, I just don't know which one.
No, seriously. Take command line parsing. You want common syntax for options, right? And you want some of them to accept
values, right? And those values can be strings, and booleans, and integers, right? And integers can be decimal or hexadecimal,
right? And they can be values of user-defined types, right? And they can have help strings, right? And you'd like to generate
help screens from them, right? And GUIs with property pages? And read them from configuration files? And check the legality of
flags or sets of flags, right?
Sure. It's not a big deal. Trivial, even. (If you're smart, everything is trivial until you fail completely due to
excessive complexity. And admit that you failed due to excessive complexity. The former takes time to happen, the
latter can never happen.) Quite a few people have devoted several of the beautiful months of their youth to the problem of
argument passing. Example: XParam, which calls itself "The Solution to Parameter
Handling". Took >10K LOC the last time I checked. Comes with its own serialization framework. Rumors tell that its original
host project uses <5% of its features.
Clarification: I'm not mocking the authors of XParam. Reason 1: Rumor has it that they're pretty sharp. Reason 2: I'm really,
really ashamed to admit this, but I once worked on a logging library called XLog. Took >10K LOC the last time I counted. Came
with its own serialization framework. First-hand evidence says that its host project uses 0% of its features. Ouch.
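For a feel of how those 10K LOC come about, here is the kind of option-descriptor interface such libraries tend to grow.
It's purely illustrative, not XParam's or XLog's actual API, and it's only the beginning:

typedef enum { OPT_FLAG, OPT_INT, OPT_HEX, OPT_STRING /* , user-defined types... */ } opt_type;

typedef struct {
    const char* name;          /* "-trace" */
    opt_type    type;
    const char* default_value;
    const char* help;          /* for the generated help screen / GUI property page */
    int         required;      /* for checking the legality of flags and sets of flags */
} opt_spec;

/* ...plus readers for configuration files, GUI bindings, serialization of
   user-defined value types, and so on. */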
You know how I parse command line arguments in my modules these days? Like so:
int i;
for(i = 0; i < argc; ++i) {
    if(strcmp(argv[i], "-trace") == 0) {
        trace = 1;
    }
}
I used C for the example because it's the ugliest language for string processing, and it's still a piece of cake. No, I don't
get help screens. No, I don't get proper command line validation. So sue me. They're debugging options. It's good enough. Beats
having everything depend on a 10K LOC command line parsing package. Not to mention a 50K LOC utility trash can full of toxic
waste.
Modules are important. Module boundaries are important. A module is a piece of software that has:
- A reasonably compact, stable interface. An unfortunate side effect of OO training is that compact
interfaces aren't valued. It's considered OK to expose an object model with tens of classes and hairy data structures and poorly
thought-out extensibility hooks. Furthermore, an unfortunate side effect of C++ classes is that "stable interface" is an
oxymoron. But no matter; nothing prevents you from implementing a compact, stable interface.
- Documentation. The semantics of the compact and stable interface are described somewhere. A really good
module comes with example code. "Internal" modules lacking reasonably complete documentation suck, although they aren't always
avoidable.
- Tests. I don't think unit-testing each class or function makes sense. What has to be tested is the
"official" module interfaces, because if they work, the whole module can be considered working, and otherwise, it can't.
- Reasonable size. A module should generally be between 1K and 30K LOC (the numbers are given in C++ KLOC
units; for 4GLs, divide them by 4). Larger modules are a pile of mud. A system composed of a zillion tiny modules is itself a
pile of mud.
- Owner. The way to change a module is to convince the owner to make a change. The way to fix a bug is to
report it to the owner. That way, you can count on the interface to be backed up by a consistent mental model making it work.
The number of people capable of simultaneously maintaining this kind of mental model was experimentally determined to be 1.
- Life cycle. Except for emergency bug fixes, changes to a module are batched into versions, which aren't
released too frequently. Otherwise you can't have a tested, stable interface.
Pretty heavy.
Do I want to introduce a module to handle command line parsing? Do I really wish to become the honored owner of this
bleeding-edge technology? Not now, thank you. Of course, it's the burnout speaking; it would be fun and it would be trivial,
really. Luckily, not everyone around is lazy and grumpy like me. See? That guy over there already created a command line parsing
module. And that other person here made one, too. Now, do I want my code to depend on their stuff?
Tough question. I've already compromised my reputation by refusing to properly deal with command line parsing. If my
next move is refusing to use an Existing Solution, I will have proved the antisocial nature of my personality. Where's my team
spirit? And still, I have my doubts. Is this really a module?
First and foremost, and this is the most annoying of all questions – who owns this piece of work? Sure, you thought it was
trivial and you hacked it up. But are you willing to support it, or do you have someone in mind you'd like to transfer the
ownership to? You probably don't even realize that it has to be supported, since it's so trivial. Oh, but you will. You will
realize that it has to be supported.
I'm enough of a Cassandra to tell exactly when you'll realize it. It will be when the first completely idiotic
change is made to your code by someone else. Quite likely the change will tie it to another piece of "infrastructure", leading to a Gordian
knot of dependencies. Ever noticed how infrastructure lovers always believe their module comes first in the dependency
food chain, and how that ultimately causes cyclic dependencies? So anyway, that's when you'll understand. Of course, it will be
too late as far as I'm concerned. My code got tied to yours, theirs, and everybody else's sticky infrastructure. Oopsie.
(The naive reader may ask, what can a command line parser possibly depend on? Oh, plenty of stuff. A serialization
package. A parsing package. A colored terminal I/O package, for help screens. C++: a platform-specific package for accessing
argc, argv before main() is called. C++: a singleton initialization management package. WTF is that, you ask? Get a barf bag and
check out what Modern C++ Design has to say on the subject).
So no, I don't want to depend on anything without an owner. I see how this reasoning can be infuriating. Shifting the focus
from software to wetware is a dirty trick loved by technically impotent pseudo-business-oriented middle-management loser types.
Here's my attempt at distinguishing myself from their ilk: not only do I want to depend on stuff with an owner, but I require a
happy owner at that. Contrary to a common managerial assumption (one of those which rarely hold but do keep managers
sane), I don't believe in forcibly assigning ownership. If the owner doesn't like the module, expect some pretty lousy gardening
job.
What about life cycle? My modules and your modules are released at different times. I might run into a need to check for
compatibility with different versions of your stuff. My tests won't compile without your code, so now I need to always have it
in my environment. What does your code depend on – is it a small, stable, defined set of things? I don't want to be
stuck just because your new version drags in more dependencies. What if I need a feature? You don't seem to support floating
point numbers. And I don't see abbreviations, either; I need them for a bunch of flags 'cause they're passed interactively all
the time.
What about stable interface and semantics? Your previous version would accept command lines passing the same option multiple
times, and take the last value. I have test scripts that count on that. Your new version reports an error, because now this
syntax is reserved for lists (-frob=a -frob=b -frob=c passes the list a,b,c as the value of the frob option). Sigh. I guess it
could be worse – you could make the string "a,b,c" from this command line, and then the problem would propagate deeper.
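For concreteness, here's a minimal C sketch of the "last occurrence wins" behavior those test scripts relied on; the -frob
option comes from the example above, the rest of the names are made up:

#include <string.h>

/* Scan the arguments and keep the last -frob=... value seen. */
static const char* parse_frob(int argc, char** argv)
{
    const char* value = NULL;
    int i;
    for(i = 1; i < argc; ++i) {
        if(strncmp(argv[i], "-frob=", 6) == 0)
            value = argv[i] + 6;   /* later occurrences overwrite earlier ones */
    }
    return value;  /* "-frob=a -frob=b -frob=c" yields "c", not the list a,b,c */
}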
I could go on and on, about how your interface isn't a decent public interface (you don't seriously expect me to subclass
EnumParser to handle flags with a fixed set of legal string values, do you?). Or about the size of your code, which at the
moment dwarfs the size of my own module, so I'd have more command line parsing code than anything else in my tests. And how it
hurts when you download the tests using a slow connection to the target machine. And how I don't like it when my tests crash
inside your code, even when it's my fault, because you have hairy data structures that I don't like to inspect.
But you already got it – I don't want your code, because I'm an antisocial asshole that has no team spirit whatsoever. I'm
going to parse arguments using 5 lines of C code. Worse, I'll make a function out of those five lines, prefix its name with the
module name, and replicate it in all my modules. Yep, I'm the copy-paste programmer and you're the enlightened
developer of the next generation command line parsing platform. Have it your way, and I'll have it my way.
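A minimal sketch of what that replication looks like, assuming a module called frob (the module name and the second option
are made up for the example):

#include <string.h>

static int frob_trace = 0;
static int frob_verbose = 0;

/* The five lines, wrapped in a function prefixed with the module name
   and replicated, with small variations, in every module. */
static void frob_parse_args(int argc, char** argv)
{
    int i;
    for(i = 0; i < argc; ++i) {
        if(strcmp(argv[i], "-trace") == 0)
            frob_trace = 1;
        else if(strcmp(argv[i], "-verbose") == 0)
            frob_verbose = 1;
    }
}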
To complete the bad impression I've been making here, I'll use a flawed real-world analogy. Think living organisms. They have
a very serious Tower of Babel problem; redundancy is everywhere. I've heard that humans and octopuses have very similar eye
structure, despite being descended from a blind common ancestor. Command line parsers?! Entire eyes, hardly the simplest piece
of hardware, with quite a bit of neural complexity at the back end, get developed independently. Redundancy can hardly get any worse.
But it works, apparently better than coordinating everybody's efforts to evolve would work.
Redundancy sucks. Redundancy always means duplicated efforts, and sometimes interoperability problems. But dependencies are
worse. The only reasonable thing to depend on is a full-fledged, real module, not an amorphous bunch of code. You can usually
look at a problem and guess quite well whether its solution has a good chance of becoming a real module, with a real owner, with a
stable interface making all its users happy enough. If that chance is low, pick redundancy. And estimate the chance
conservatively, too. Redundancy is bad, but dependencies can actually paralyze you. I say – kill dependencies first.
One thing you didn't touch on: even though redundancy lives in your code, it does future-proof your code. Whereas dependence
on a central library file, be it COM or a DLL, can break and disable your application long after development work has finished.
A similar attitude I've read about from the Microsoft Excel camp was an old saying: "Find the dependencies and eliminate
them". Though the Microsoft Excel team has since gone through so many developers that I don't think that saying really
reflects the current Excel team.
I'd like to add that I agree with you that it comes down to experience. Choosing and weighing up dependencies and redundancy
at the start of a project can in most cases mean the difference between finishing the project on time and overshooting it.
It also depends on your perspective and motivations.
Fantastic post. I wish more developers had this kind of wisdom!
Another way to look at it (at least, I think it's similar): I'm old
enough to remember when the big thing wasn't "don't repeat yourself", it
was "coupling and cohesion". Minimum coupling and maximum cohesion are
the desired traits. Dependencies increase coupling and reduce cohesion.
Bad on both counts.
Copy-and-paste still sounds evil to me, but I have to say that in my
(considerable) experience it has actually caused me considerably less
grief than dependencies have.
Don Knuth seems to think so too: "To me, 're-editable code' is much,
much better than an untouchable black box or toolkit... you’ll never
convince me that reusable code isn’t mostly a menace."
Domain-specific languages are good for you because a) they generally have a small and consistent API, and b) they generally
require enough effort to make them owned and have real releases.
Also, you've usually got to have a few functions which make up for deficiencies of your language; these will reside in
utilities, be used universally, and can generally be considered part of the underlying platform/language.
They would also have a small stable API and would not generally have a
release cycle, at all.
Regarding the Excel team as portrayed by Spolsky, and Knuth: it's the
"do it yourself" way; this post was more about defending "don't do it".
Both have to do with minimizing dependency, but it's a different battle.
"Do it yourself" is loved by strong programmers; "don't do it" is
something managers and bad/evil programmers are attracted to. I can sure
relate to Excel's own C compiler and Knuth's own processor though, and
DIY can be hard to defend in, um, an industrial environment; I think I
should talk about it some time.
Regarding future-proof code: to be fair, redundancy can make change
hard, and future may force change. But still, depending on a vast array
of external libraries with different life cycles is waaaay more likely
to give you royal pain than redundancy, if we look at the kind of
dependencies and redundancy found in code written by reasonable people.
I think.
DSLs: I love them. DSLs are DIY taken to an extreme though.
Utility trash can covering up the deficiencies of your language:
BARF. Works for exactly one person and becomes a nightmare otherwise.
It's going to be neither small nor stable, and it will show up each time
trouble surfaces. Give me one example of a language-deficiency-covering
utility I want to drag with me everywhere.
(By any chance, isn't your deficiency-ridden language C++, and isn't
your trash can called debug.h? Many people have a debug.h. I'm just
curious.)
I think this is a really interesting topic. Some thoughts:
Butler Lampson said similar things in "Software Components: Only The
Giants Survive." He seems to be of the opinion that it hasn't turned out
to be cost-effective to use someone else's modules unless they are
5MLOC+: databases, programming languages, operating systems, etc. At
that point rolling your own is too hard, and there is probably an
organization and maybe a business plan behind those 5MLOC, so there are
resources to make decent general purpose interfaces and you don't have
to worry about who the owner is. Everyone uses (a different) small
fraction of each of these components and loses some number of orders of
magnitude of runtime efficiency in exchange for having to write less
code. No one can productively use more than a few modules at any level
because of inevitable impedance mismatches.
It's interesting to look at how Chuck Moore approaches this with
Forth. Redundancy is eliminated in favor of dependency as much as
possible, and dependencies are factored into the smallest possible
groups (words). Words are grouped together in small bunches (screens),
but there are no real module boundaries and the only real interface is
the very simple one between words. Ideally one ends up with very little
redundancy and dependencies that can easily be managed because 1) each
word only has a few, and 2) the overall system is much smaller (the
zillion tiny modules aren't a problem because there are really only a
few hundred or so). You can have your cake and eat it too with respect
to redundancy and dependency, but you pay for it by making it much
harder to use any kind of external library at all.
Forth is basically a DSL constructor, and the philosophy tends to the
extreme of extreme DIY. Still, think about it: at small scales I think
every language and every good programmer solves the problem the same way
as Forth and other truly extensible languages do: all code necessarily
contains dependencies (it's dependencies all the way down to the laws of
physics), we know how to manage dependency up to some level by breaking
it into small pieces, minimum redundancy is theoretically zero, and
redundancy increases code size and therefore necessarily adds
dependencies as well (although redundant dependencies are usually not as
bad). The problem occurs when 1) the code grows past the point where the
programmer can handle the dependencies, or 2) the code grows past the
point where the language can break all the dependencies into small
groups, or 3) you have to use code with an interface that someone else
has designed that has any of the problems that are mentioned above, or
by Lampson.
The dominant model for scaling seems to be to try to solve everything
with good modules. The result is that there are good large modules, but
they all have their own complex interfaces that you have to learn and
fit together. You don't get much help from the language scaling your own
code, so your code size grows fairly quickly and at 1-30KLOC you feel
pressure to tie it off with your own more or less ad-hoc interface. Pain
scales super-linearly with the number of modules you try to glue
together. It takes a lot of smart people to build large systems, and
they probably write most of their own modules to minimize interface
clash.
Two other approaches:
Niche languages with 2X better support for breaking dependencies into
small groups: They have no libraries, so the largest systems that get
built have 2*1-30KLOC. Nobody is impressed, nobody writes libraries, no
one uses the language, it dies.
Forth: Chuck Moore writes sophisticated applications with it in
ridiculously small amounts of code (very low redundancy, relatively low
number of very small packages of dependencies) and without libraries (no
external software dependencies), thus (almost) completely bypassing the
interface problem. No one else wants to buy a simple unified interface
and drastically reduced code size at the cost of a huge impedance
mismatch with 99.9% of existing code and most large components.
Yossi Kreinin:
Easy!
In Java:
public static String notNull(String src) {
    return (src == null ? "" : src);
}

public static boolean isEmpty(String str) {
    return str == null || str.trim().length() == 0;
}

public static boolean isNotEmpty(String str) {
    return !Util.isEmpty(str);
}

public static int intValue(String str, int def) {
    if (Util.isEmpty(str))
        return def;
    try {
        return Integer.parseInt(str);
    } catch (NumberFormatException nfe) {
        return def;
    }
}
Without them, I would DIE: my code would be much longer in all the places where I would inline these helpers redundantly,
and it would be exception-unsafe in every place where I forgot them (and yes, I tend to "forget" some checks if they are
ugly and require a few lines per check).
@Yossi Kreinin
Wouldn't you agree that this is soooo much easier to read? It's not as robust as your first example, but when you take a
system-wide view instead of a narrow single-function one, such redundancy is the biggest enemy of simplicity.
public static int intValue(String str, int def) {
    try {
        return Integer.parseInt(str);
    } catch (NumberFormatException nfe) {
        return def;
    }
}
to ilyak, Entity: I could start nitpicking along the lines of "why do
you treat empty and null strings as the same thing, few languages do so
and for a good reason, too, *I* never have problems with that", etc. But
that would miss the point; of course you might need this kind of
function. I drag a bunch of those with me; for example, C++ istream
doesn't have a filesize function, you need to seekg and subtract, so I
have a function for that, etc.
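For instance, here's a sketch of that kind of tiny helper, in plain C for brevity (the C++ istream version does a similar
seek-to-the-end-and-subtract dance with seekg/tellg); error handling deliberately omitted:

#include <stdio.h>

long file_size(FILE* f)
{
    long pos = ftell(f);       /* remember the current position */
    long size;
    fseek(f, 0, SEEK_END);
    size = ftell(f);           /* offset of the end == size in bytes */
    fseek(f, pos, SEEK_SET);   /* restore the original position */
    return size;
}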
The question is, do you drag it with you as a package that other
people are supposed to use, or do you stuff it into a utility class/file
you replicate in many packages/modules you work on. All I was saying is
that the latter beats the former. Because if you make a "shared module"
out of these few funny handy functions, you will have created a
dependency problem. People will "enhance" it, add functions depending on
still other utility modules, break the semantics of your functions, etc.
Or if they can't do that socially, they won't reuse it, because you see,
I want to try and parse the number as decimal and hexadecimal before
failing and returning the default, but you only care about decimal, so
I'll roll my own functions which I'd die without :) And if it won't be
reused, and as you claim there's no release cycle, there seems little
point in sharing it. Let those things be the guts of a module, not a
module in its own right, to make the overall picture of which modules
there are and how they're related a cleaner one. It's surprising how
much the cleanliness of this picture matters for the social well-being
of projects; unclear dependencies and roles of modules actually confuse
people a lot, despite the usual triviality of the issues.
to arakyd: thanks for the pointer to Lampson's paper. I particularly
liked the bit about "specs with teeth".
Chuck Moore is an extremist even in the Forth community. For example,
most forthers, um, endorse ANS Forth, and Chuck Moore more or less hates
it, because it's bloated and requires compatibility. Here's a man
who loves impedance mismatch...
I used to like Forth a lot. I now think it's The Wrong Thing. Its
terseness is achieved at the cost of extreme obfuscation, which has the
side effect of few opportunities for automatic optimization.
The compact programs written in Forth aren't just the result of
language features. Having no variable names only gives you so much space
savings, and the metaprogramming features like CREATE/DOES> aren't
even in Chuck Moore's toolbox – apparently 'cause he's into simplicity
and this stuff is hairy. And I think you can do with Lisp what you can
do with Forth in terms of metaprogramming power, without getting cut
nearly as painfully by the sharp edges.
Compact Forth programs are the result of a whole philosophy of
preferring simplicity to features. Unix also has this philosophy, but
Forth takes it to a 10x more extreme level (how did that quote by Chuck
Moore go?.. "Forth uses a different approach. No syntax. No files. No
types. No operating system."). Now, I don't think that cutting features
to make your life easier is the way to go. Definitely not when end
user-visible features are at stake. And even developer-visible features
are more important than the less competent managers would like to
believe. It's not trivial to draw the line correctly, and this is why I
thought my command line example would be appropriate – there, the question of which features to have turns out,
experimentally, to be non-trivial enough to be flammable. But even Unix has too few core
features, and a Forth system has waaaay too few. Which is why
Windows>Unix>>Forth in terms of market share. Dead simple.
Take multiplexing. OSes essentially do multiplexing. You get the
illusion of many files on top of one disk, many processes on top of one
CPU and memory array, many sockets on top of one Ethernet cable, many
windows on top of one screen, etc. Forth refuses to do that. How can a
sane person argue for not doing that, except for saving labor for
oneself? Of course many machines for the same money are better than one.
You can copy and paste data between them, for example. Damn useful. On
Unix, clipboards never quite worked – they weren't a core feature.
Problem. Forth systems take over the machine and refuse to multiplex
it. Why would I use a Forth system?
Features are a good thing.
Regarding Chuck Moore's Forth apps: I had a hardware hacker look at
the (wildly incomplete) documentation of Chuck's ColorForth hardware
design toolchain. You know, the alternative to the very expensive
Verilog/VHDL-based toolchains. Well, the guy skimmed through it, and
reported that the basic model of hardware was very simplistic and you
just couldn't do with those tools what you could do with the
industry-standard expensive ones. But no matter: Chuck uses the tools to
produce hardware with a tiny gate count, which itself has few
features and so isn't very competitive.
I never fully believed the tales about Forth being 10x or 100x more
efficient than the rest of the industry, but it took time to fully
appreciate how hallucinatory this claim really was.
Thanks for this. Clarifies something for me.
On Chuck Moore. Years ago I saw Chuck Moore give a talk in Sydney,
showing off his PCB design system. This was astonishingly minimalist. He
had three push buttons instead of a keyboard. Basically he showed a
screen of characters and you cursored left/right up/down to select.
Hard core.
He didn't believe in pull-up resistors. So to determine if a line was
floating or driven, he drove it low, read its value, drove it high and
read the value again.
However, it wasn't all insanity. He didn't believe in floating point.
I admire that in a programmer, even as I myself contribute to global
warming through heating up millions of transistors.
Has the world gone insane? I really feel that your prognostications are very valuable, but I suspect that they represent a
very, very small minority view. Should you choose to expand on them in a job interview or on an application form, you'll be
confronted by blank stares, if not hostility.
Programming is all orthodoxy now. Trying to make stuff work is, like, you know, hard?
Jamie
Not believing in floating point IS insanity. To an extent that makes it worth writing about some time. I've spat blood over
this for a long time, and I'm going to share my takeaways with humanity even if it couldn't care less.
Chuck Moore kicks butt, but he is, put simply, an extremist. At its
core, extremism is rooted in refusing to accept the existence of
"unsolvable" problems and having to choose between two bad/suboptimal
options. "Problem? LET'S FUCKING BLOW IT AWAY!!" It's a whole mindset,
which I love, and quite a few very talented people have it, but it's
incompatible with reality. It's OK; compatibility with reality isn't
everything. As long as both you and your worst natural enemies are
unarmed.
Interviews are a special genre, where one key thing you want to
demonstrate is compatibility. An inevitable genre, but a sucky one.
Blogging is surely better than interviewing. BTW, quite shockingly,
apparently a couple of quite senior managers liked this article (WHAT?),
while programmers tended to like it somewhat less. Aside from the fact
that I feel like having just sucked up to the authorities, this is
natural. Do-it-yourself-without-dependencies-and-overgeneralizing is
Goal-Oriented. "Code reuse" and "generic frameworks" Jeopardize
Schedules. (I'm not making fun of the managerial perspective, just of
the terms and the less competent managers.) So this particular opinion
of mine isn't that counter-mainstream, on the contrary; of course it
doesn't make it a good thing to say in an interview, because you
shouldn't be opinionated there.
If I had to bet on the question of whether the world or I have gone insane, I'd bet on the second option, since that has
proved to be a safe bet in the past.
On one hand, I'm really glad you raise this topic. The struggle
between dependencies and redundancy baffles everyone who writes large
amounts of code, at one level or another. You've summarized the problem
nicely.
However, I disagree with the conclusions you reach. Dependencies can
be managed. Yes, they can! With judicious use of a common code
repository, source control, and tests, dependencies can be tamed enough to
bring much more merit than harm. It may take some energy to manage them,
but surely less than implementing your own command line parser, text
lexer, hash table, math utils, ...you get the idea... for every
project.
I agree that they can be managed, and at times they should be; I'm just saying that I don't trust code that isn't already
managed as a full-fledged module.
In fact, more often than not I'm the maintainer of "shared
infrastructure" code, so I'm far from suggesting to "ban" that. But
experience in that made me believe Brooks's numbers – something
"generally usable" costs 9x-10x more development effort than something
"locally usable". I think this number should be kept in mind when
deciding to make something reusable, that's all; it doesn't mean "don't
do it" – it can still pay off. I just hate it when people pitch their
half-baked utilities as "infrastructure". I do understand their
motivation though; in fact I wrote about that motivation in "The
internal free market".
Could you give a reference for Brooks's numbers?
"But experience in that made me believe Brooks's numbers – something 'generally usable' costs 9x-10x more development effort
than something 'locally usable'."
Sure. It's from The Mythical Man-Month (a "classic" that I generally didn't like very much).
According to Brooks, the 9x-10x was, if I remember correctly, the
result of the compounding of a 3x and another 3x. The first 3x was
"generalizing the program's inputs and outputs" and the other 3x was
documentation and other polishing needed for "exporting" something. But
I'm not sure.
It's 2010. There is no reason to write your own command-line parsing
library (or module, or framework, or whatever). Even a casual search
will turn up hundreds of libraries, either with LGPL or BSD licenses,
that you can use in your project. libpopt is a good one. And if you
don't want to link in an extra library, there's a perfectly good
function called getopt() right in every libc.
The same is true for a lot of other problems. It's always best to
check if there is a library that does what you want, before you rush out
to do something. And don't rule out the possibility that that library
might be libc itself! For example, consider using getenv() to check for
some environment variables instead of writing
TheWorldsBillionthConfigFileParser.
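A minimal sketch along those lines, using nothing but libc; the option letters and the environment variable name are made up
for the example:

#include <stdio.h>
#include <stdlib.h>     /* getenv() */
#include <unistd.h>     /* getopt(), optarg */

int main(int argc, char** argv)
{
    int opt, trace = 0;
    const char* out = NULL;

    /* -t is a plain flag, -o takes a value */
    while((opt = getopt(argc, argv, "to:")) != -1) {
        switch(opt) {
        case 't': trace = 1; break;
        case 'o': out = optarg; break;
        default:  fprintf(stderr, "usage: %s [-t] [-o file]\n", argv[0]); return 1;
        }
    }

    /* an environment variable instead of a config file parser */
    if(getenv("MYAPP_TRACE"))
        trace = 1;

    printf("trace=%d out=%s\n", trace, out ? out : "(none)");
    return 0;
}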
I agree with you that some programs just have excessive dependencies.
It's a matter of taste, though. It's almost impossible to come up with a
hard and fast "rule" about when something is worth factoring out into a
library.
Hi, just wanted to say this is an excellent post. I find it really hard to explain to people why writing programs that
depend on tens of libraries is (what I will now call) 'a barf bag'. Sometimes even depending on one library can be a massive
nightmare (C++ boost anyone?). But you can't say this to people because it goes against everything they have been taught
about good software engineering practice, and it makes you sound like you don't know what you are doing, when actually it's
the fact that you are enormously experienced that has led to doubts about what beginners are taught.
Yeah, it's a good thing boost is pretty much banned where I work.
As to what's considered "good practice" – I guess a lot of people are
burnt by "independence extremists", for instance, people who clone
someone else's (massive) chunks of code for the sake of "independence"
and drag the copy with them. I can see how one can develop an allergy
to the idea of "dependency minimization".
"Utility trash can covering up the deficiencies of your language:
BARF. Works for exactly one person and becomes a nightmare
otherwise."
This has not been my experience. Good fix-the-language libraries get
used by everybody. Eventually they can become part of the stdlib for
that language.
"It’s going to be neither small nor stable, and it will show up each
time trouble surfaces. Give me one example of a
language-deficiency-covering utility I want to drag with me
everywhere."
Every C standard header file, before it became part of the stdlib.
Lisp's ITERATE library, for looping. For Javascript, half of jQuery.
@Phil: let's say that I haven't contributed to the standard library
of any language nor worked with someone who did, and that the typical
"fix the language" attempt is not worthy of becoming a part of
a standard. I mean, in the same way you could say that one's own
programming languages "tend to be used by everybody and eventually
become standard" based on the examples of C, Perl, Python and PHP.
Fallacy; the problem with dependencies is not the change in API, as API changes should always come with a major release
number, and nowadays dependency managers let you specify the RANGE of a dependency, so you can pin it to Y.x.x, and if the
rules are respected the problem shouldn't exist.
Redundancy IMHO is the worst, and I can say I'm one of those who do NOT like to use external libraries, as I often find them
overkill and/or too complex for my goals; I just have a long series of repositories, and git submodule (I still have to
really try subtree) is my friend.
I get it! I won't send the link to this page to friends, but rather
copy-paste the article in an email :)
If you expect your friends to malfunction once the page is
unavailable, the way code breaks due to dependency issues, then please
do send a copy, I'll certainly sleep better as the webmaster.
The existence of services like Pinboard’s “archive every page I
bookmark” subscriber tier shows that copy-pasting articles is
obviously a ludicrous, ridiculous, and totally absurd notion.
:-)
Thank you for taking a position, even if I disagree. :)
I would like for you to consider adding the module source code to your
repo, and writing a facade to isolate you from its quirks and broken
edge cases.
1. The facade is wonderful for implementing your application-specific
calling patterns, so the main code is easy to read; it's responsible for
the boilerplate necessary to use the module.
2. The facade is a good place to add related utility functions.
3. Including source code protects you from breaking changes. You can
update at your leisure, or never.
4. Including source code allows you to fix bugs in a timely manner.
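A rough sketch of that facade idea in C; every third-party name here is hypothetical, standing in for whatever module you
vendor into the repo:

#include "thirdparty_args.h"    /* hypothetical header of the vendored module */

static thirdparty_ctx* g_args;  /* hypothetical parser state type */

/* The rest of the application calls only these two functions; only this
   file knows about the third-party module and its quirks. */
void myapp_args_init(int argc, char** argv)
{
    g_args = thirdparty_parse(argc, argv);       /* hypothetical call */
}

int myapp_args_trace(void)
{
    return thirdparty_get_flag(g_args, "trace"); /* hypothetical call */
}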