Don't ask if a monorepo is good for you – ask if you're good enough for a monorepo

This is inspired by Dan Luu's post on the advantages of a single big repository over many small ones. That post is fairly old, and I confess that I'm hardly up to date on the state of tooling, both for managing multiple repos and for dealing with one big one. But I'm going to make an argument which I think mostly works regardless of the state of tooling on any given day:

  • A monorepo is great if you're really good, but absolutely terrible if you're not that good.
  • Multiple repos, on the other hand, are passable for everyone – they're never great, but they're never truly terrible, either.

If you agree with the above, the choice is up to your personal philosophy. To me, for instance, it's a no-brainer – I'll choose the passable thing which successfully withstands contact with apathetic mediocrity over the greater thing which falls apart upon such contact in a heartbeat.

You might be different – you might believe in Good – and then you'll choose a monorepo, like Google, the ultimate force for Good in technology (which is why they safeguard your personal data; you wouldn't want someone evil to have it – luckily, Google can do no evil.) And I'm almost not kidding: the superpower which lets Google maintain the grassroots bureaucracy which I find necessary to make monorepos work well is actually the same trait making you sufficiently delusional to chant, or at least to have chanted "Don't Be Evil" entirely seriously. I don't have that. I am, to a first approximation, evil. Worse is Better.

But that's me – I'm not saying you/your org are Not So Good, or Evil. I'm only saying that you should be open to the possibility, and that I don't see the implications of being Not So Good discussed as much as they deserve.

Why are monorepos terrible if you're not that good? Three reasons:

  1. Branching in
  2. Modularity out
  3. Tooling strained

Let's discuss them in some detail.

Branching: getting forked by your worst programmer

In a Good team, you don't have multiple concurrent branches from which actual product deliveries are produced, and you certainly don't have most people maintaining such branches simultaneously for a long time. Nor can you have branching due to outright atrocities, like someone adding a feature by killing a feature – for example, making the app work on Android, but destroying the ability to build for iOS in the process.

But in a not-so-good team… you get the idea.

What do you do when you have a branch working on Android and another branch working on iOS and you have deliveries on both platforms? You postpone the merge, and keep the fork. For how long do you postpone the merge? For as long as is necessary for the dumbass who caused the fork to fix their handiwork, in parallel with delivering more features (which likely results in digging a deeper hole to climb out of afterwards.) And the dumbass might take months, years, or forever.

The question then becomes, what was forked?

In a multi-repo world, the repo maintained by the team with the dumbass on it got forked. In a monorepo world, the entire code base got forked, and the entire org is now held hostage by the dumbass. And you might think that this will result in a lot of pressure to fix the problem, and you'd be wrong, for the same reasons that high murder rates don't cure themselves by people putting pressure on whomever to lower them to some equilibrium level common to all human societies.

Some places have higher than average murder rates, and some places have higher than average fork rates. And I argue that a lot of places have fork rates which combine into a complete disaster with a monorepo. And you might not even realize how bad the fork rate is at your place, because multiple repos largely shield you from the consequences. Or, more tragically, you might not realize how bad your fork rate is because your monorepo is in its first couple of years, and you're sowing what you'll reap in its next couple of years, when you'll have more code, more deliveries and more dumbasses.

With multiple repos, if you have your shit under control, and your repos have a single release branch with a single timeline, all you have to do is to test against both of the dumbass's branches. But with a monorepo, you need to maintain your code in 2 branches, with a growing share of everybody else's code morphing incompatibly in those branches, simply because they exist. And very soon it will be more than 2 because there's more than a single dumbass, and good luck to you.

Modularity: demoted from a norm to an ideal

Norms are mundane, but they are what is. Ideals are lofty, but they are merely what should be (and typically isn't.) If you want to actually have something, you don't want it to be an ideal, like altruism – you want it to be a norm, like wiping one's ass. If something is demoted from ass-wiping to altruism, that something will scarcely be found in the wild.

With multiple repos, modularity is the norm. It's not a must – you technically can have a repo depending on umpteen other repos. But your teammates expect to be able to work with their repo with a minimal set of dependencies. They don't like to have to clone lots of other repos, and to then worry about their versions (in part because the tooling support for this might be less than great).

In fact, a common multi-repo failure mode is that people expect too few dependencies and make too many repos which are too small to host a useful self-contained system. Note that this failure mode is not lethal. It kinda sucks to have this over-modularity with benefits of independence which turn out to be imaginary upon a closer look, and to have people treat what essentially are internal APIs with way too much reverence, just because two modules which are extremely tightly coupled conceptually are independent technically, in terms of cloning/building/testing. But it doesn't kill you.

With a monorepo, modularity is a mere ideal. Everybody clones the whole thing. You're not supposed to add gratuitous dependencies, but it's very easy to add such a dependency in terms of cloning, building and versioning, and nobody objects to the dependency being added the way they would if they needed to clone more repos.

Of course in a Good team, needless dependencies would be weeded out in code reviews, and a Culture would evolve over time avoiding needless dependencies. In a not-so-good team, your monorepo will grow into a single giant ball of circular dependencies. Note that adding dependencies is infinitely easier than untangling them, much like forking is easier than merging, with the difference that the gut-felt urgency to merge ("I can't maintain all your damned branches any longer!!") is far greater and far more backed by simple self-interest than the urgency to improve the dependency structure.
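To make the "ball of circular dependencies" concrete, here is a minimal sketch of the kind of check a team might run to catch dependency cycles before they pile up. It assumes you can already extract a module-to-dependencies mapping from your build files; the function name `find_cycle` and the module names are made up for illustration, and this is not a description of any particular build tool.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of module names, or None.

    `deps` maps a module name to the modules it depends on (hypothetical
    input; how you obtain it depends on your build system).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {module: WHITE for module in deps}
    stack: List[str] = []  # modules on the current depth-first path

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GRAY
        stack.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[module] = BLACK
        return None

    for module in list(deps):
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Hypothetical module graph: ui -> billing -> auth -> ui is a cycle.
example = {
    "ui": ["billing"],
    "billing": ["auth"],
    "auth": ["ui"],
    "logging": [],
}
print(find_cycle(example))
```

Finding the cycle is the easy part, which is rather the point: the check takes a few dozen lines, while untangling what it reports can take quarters.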

Tooling: is yours better than the standard?

This part might age worse than the others, and might not be particularly up to date even now – what "standard" tools are capable of changes over time. But generally speaking, a growing monorepo is likely to outgrow the standard version management tools and methods, as well as other tools and methods dealing with your revision controlled code.

Google used to have a FUSE driver to avoid copying hundreds of millions of source lines at a time, and instead getting the files on demand, when a directory is cd'd into. Facebook used to hack on hg to make it fast on its large monorepo. Maybe already today, or some day, a growing number of off-the-shelf tools will scale to infinite monorepos without such investments. But it sounds reasonable that there will always be tools and workflows which you will struggle to make work with a large monorepo (starting with some script doing find/grep.)

With a bunch of small repos, you work with a small overall number of source files in your working directory, so you don't need to tell your tools, "don't try to deal with the whole thing – instead only search this subset, or use this index etc. etc." And you have tools these days which kinda sorta let you manage the revisions of multiple repositories (for instance, there's Google's Repo.) And I think the result is very, very far from the great experience potentially afforded by a large monorepo. But it also never breaks down as badly as a large monorepo outgrowing the abilities of tools, as well as the ability of your local toolsmiths to find creative workarounds for these growth pains.
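As an illustration of "only search this subset," here is a rough sketch of what a find/grep-style script ends up looking like once it has to be told where to look. The function name `grep_subtrees` and the directory names are hypothetical; a real setup would more likely lean on an index or on the version control system's own search.

```python
import os
from typing import Iterable, Iterator, Tuple


def grep_subtrees(
    root: str, subtrees: Iterable[str], needle: str
) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, line number, line) for lines containing needle.

    Only the listed subtrees under root are walked, never the whole tree.
    """
    for subtree in subtrees:
        for dirpath, dirnames, filenames in os.walk(os.path.join(root, subtree)):
            # Prune hidden directories such as .git in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        for lineno, line in enumerate(f, 1):
                            if needle in line:
                                yield os.path.relpath(path, root), lineno, line.rstrip("\n")
                except OSError:
                    continue  # unreadable file: skip it rather than abort the search


if __name__ == "__main__":
    # Hypothetical usage: search only two directories of a big checkout.
    for hit in grep_subtrees(".", ["services/billing", "libs/auth"], "TODO"):
        print(hit)
```

With small repos the `subtrees` argument simply doesn't exist: the script walks the working directory and that's that.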

Summary

Don't ask if a monorepo is good for you – ask if you're good enough for a monorepo. Personally, I don't have the guts to bet on the supply of Goodness in a given org to remain sufficiently large over time to consistently avert the potential disasters of monorepos. But that's just my personal outlook; if you want to compliment me, don't call me "smart," and definitely don't call me "good" – I know my limits in these areas, and I take far more pride in knowing these limits than in the limits themselves; so, to compliment me, call me "pragmatic." Yet a culture worthy of a monorepo absolutely can exist – just make sure yours actually is one of those, and don't mistake your ideals for your norms.

7 comments

#1 Ben D on 07.30.19 at 2:02 am

Wish you posted more often, always happy to read you and as always your insight is very valuable.
The last paragraph should be made a lambda for many things, past source control system philosophy.

I wish there were better methodologies for modular repos than what I have encountered, they would encourage modularity and distributed development. Large codependent codebases cause product paralysis more surely than anything else I can think of ( touching anything is a leap of faith) except maybe awkward broken source control/continuous integration ( touching anything is a stab in the water, and you are in the water as well, and it is dark)

#2 spaceribs on 07.30.19 at 7:26 pm

I love this article, but I wanted to point out that while you talk about a culture of norms, it should be said that if your senior developers aren't effective leaders (for whatever reason) that leads to a crop of dumbasses.

Most of the time, if someone is marked as a "dumbass", it's ineffective leadership which failed to communicate proper expectations and is allowing a problem to fester.

#3 dylan on 07.30.19 at 8:07 pm

Good points. I would propose that there are projects that a monorepo does make sense for, and other projects that separate repos make sense for. An organization might use a small monorepo for 1 project, and separate repos for others.

#4 lwb on 07.30.19 at 8:55 pm

Interesting article. Facebook does well with the monorepo because they centralize and control all three aspects you talked about.

Engineers commit straight to master and submit a "diff" which is rebased on top of everyone else's diffs every time they "ship". The only way to create or ship diffs is to use centralized tooling that requires you to pass a ton of standardized tests. You're also required to get at least one code reviewer, etc.

But then again, Facebook is notoriously easy to break (interns have done/still do it). The culture is much more of biasing towards action and trying to control bad side effects if/when they might happen.

#5 No, shut up Chris Cates on 08.01.19 at 4:33 am

A balance between the two is the most important.

Dylan made the most rational argument… There are times where a separate repo is required. But in general, you should try to consolidate related projects all into one.

For the majority of my career, I have always had the luxury of being under staffed and low budget when it comes to managing and hiring developers. More repos means less issues when rebasing to the development branches, and, less headaches if someone were to break something.

This is under the assumption that people will inherently break the codebase… Which is fundamentally assuming mediocrity.

#6 Gordon on 08.02.19 at 1:05 am

I agree at scale. But also most systems I've dealt with have at most a couple dozen people working on them, we're not all google. And when you personally know everyone with push privileges it's easy enough to enforce a no forks rule. Beyond that though, yeah separate repos. It's like the micro services discussion, the main benefit of micro services is it makes scaling your team easier, but if you are small it's a needless devops expense for no gain.

#7 Frank on 08.05.19 at 12:36 am

This is so on point. There is no shortage of dumbassery in the world and the chances your organization will inevitably hire a couple are pretty high. This, coupled with how hard it is to actually fire someone these days (PIP, Document, More documents, HR blames manager, Manager blames HR, document some more or we will get sued.), will ensure you have to continually deal with dumbassery.

With that said, you are exactly right. The religious wars of mono vs. multi are even more dumbassery. Use what works for you. Just because Google and BookFace use it, doesn't mean it's right for your organization.

Great blog post!
