Reading code is torture. You will always hate the code you inherit. A
software engineer is never happy with code he inherits. You never hear
"This code is nice! A joy to work on. Look how easy to understand and
pleasant and clever it is." Have you ever once heard somebody praise
code? It only happens when you have mentors and you look up to them.
I agree with Terry A.
@Terry – It has happened a few times to me before, where I've looked
at code inside libraries I've used and went "oh, that's so neat! I
should take note". But it does happen less with co-workers, even very
talented ones.
You're right that overengineered code is a worse problem than dumb
code (though I don't think the split really aligns with that between
scientists and software engineers). In my experience agile – not in the
sense of complex methodologies but simply limiting work in progress,
releasing often, and tying every code change to a specific user-facing
feature – is an effective antidote. But for something that's intended as
a framework for use by developers rather than a product for customers
it's less clear how to do that.
Agree with Terry here. I am one of those people that complains but
that's mostly because I favour clean, simple code. I teach programming
for free, and one exercise I do with students who've finished their
tasks is to step through their solution and gradually refactor the code
to make it simpler and easier to follow. Then they can learn about the
parts of the language that actually would save them time instead of
falling down the dark road of OOP and over-abstracting their problem. On
future tasks there's always much much less work for me.
Now the reason I've mentioned all this is because the code my students
write after a few months is often better than that written by engineers
with YEARS of experience. I've chatted extensively with some colleagues
at work about software engineering and been sorely disappointed when
I've had to work with their code.
Java, PHP and C++ have done a lot of damage to the way we think about
our craft, and while the GoF's Design Patterns book continues to be
hailed as the bible of enterprise software, nothing will change.
@Imm: More than agile, I'd say it's automated testing that has been
the greatest boon recently. Engineers are lazy and if they need 4 lines
before each test setting up dependencies they'll likely go and refactor
the code before ever committing it. Testable code is generally
(certainly not always) simple code because it creates an environment
where it's too much of a pain to test otherwise. Heck, my first interest
in Idris and fully dependent types was because I was sick of testing my
arguments did not violate constraints (e.g. not null, not less than
zero).
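For what it's worth, the constraint-checking part can be roughly approximated even without dependent types. A minimal sketch in Python, where the names NonNegative and total_mass are made up for illustration; unlike Idris, this checks at run time (once, at construction) rather than at compile time, and Python does not enforce the type hints:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonNegative:
    """A number validated once, at construction, to be present and >= 0."""

    value: float

    def __post_init__(self):
        # Reject the two cases mentioned above: null and less than zero.
        if self.value is None:
            raise ValueError("value must not be None")
        if self.value < 0:
            raise ValueError(f"value must be >= 0, got {self.value}")


def total_mass(a: NonNegative, b: NonNegative) -> NonNegative:
    # No re-checking of the inputs here: a NonNegative can only exist
    # if it already passed validation. The sum is validated on construction.
    return NonNegative(a.value + b.value)
```

The point is only that the check lives in one place instead of being repeated in every test; callers that hold a NonNegative never need a "not less than zero" test of their own.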
I don't think your comparison makes any sense really.
Of course, design patterns have no place in a simple data-driven
pipeline or in your numerical recipes-inspired PDE solver.
But the same people that write this sort of simple code are also not the
ones that write the next Facebook or Google.
I am also not surprised that you're capable of debugging simple
scientific code: it's simple code written by people who wouldn't even
know how to write the kind of code that makes you do a double take.
On the flip side, you're going to need sensible software engineering
if you want to build a system that does a little more than just step
through a couple of time and space for loops.
@Terry: I'm fine with most code, actually; my sense of aesthetics has
become rather numb over the years. Most of "someone else's" code is just
fine to me. It's when code is really, really hard to follow that it's
not fine, not when it's "ugly" along any of the possible dimensions.
@Georg: Google is kinda more about PageRank than "design patterns".
But what do I know, go ahead and write the next Google. (As to
Facebook... erm... I'm not sure what to say...)
The old spaghetti code vs ravioli code dichotomy.
Yeah, more or less... The trouble is, while spaghetti is all alike,
there are just so many kinds of ravioli and they invent new ones every
day.
What you criticize are bad programmers...which is fine, they are as
fun to critique as bad scientists. But as a scientist, you should know
better than to extrapolate without further evidence or study.
One of my favorite programming blogs is from someone in the field of
bioinformatics. Programming is valuable to all kinds of science, and
some programming jobs may require more substantial engineering
skill:
From his About section: http://nsaunders.wordpress.com/about-2/about/
You may be wondering about the title of this blog.
Early in my bioinformatics career, I gave a talk to my department. It
was fairly basic stuff – how to identify genes in a genome sequence,
standalone BLAST, annotation, data munging with Perl and so on. Come
question time, a member of the audience raised her hand and said:
“It strikes me that what you’re doing is rather desperate. Wouldn’t
you be better off doing some experiments?”
It was one of the few times in my life when my jaw literally dropped
and swung uselessly on its hinges. Perhaps I should have realised there
and then that I was in the wrong department and made a run for it.
Instead, I persisted for years, surrounded by people who either couldn’t
or wouldn’t “get it”.
Ultimately though, her breathtakingly-stupid question did make a
great blog title.
While diving into scientific code, a brave programmer can use a
«software archeology» approach to find the key points for understanding
how it works and how we can rewrite it to maintain it in the future.
TL;DR: you can write FORTRAN in any language.
TL;DR: Science pays poorly, and then discredits the entire field when
it attracts mediocre talent.
Scientists, when they realize their projects need a full-time,
non-researcher software position, tend to be cheap. When setting
salaries, these labs/centers make two mistakes:
Mistake 1. They assume that experienced professionals are willing to
take a substantial pay cut to "do science".
In fact, the opposite is true. You are asking someone to forgo
like-minded software people and instead work with peers who largely view
software as plebeian, as inferior to their science, and who therefore
see/treat you as "support staff" rather than as a "peer". So whereas labs
assume the scientific setting is a reason to take a pay cut, it's
actually a reason to demand higher pay.
Mistake 2. They assume they can offer an introductory rate and/or
otherwise don't have to out-pay the industry rate for mid-career
engineers.
Labs/centers who do not pay very competitive salaries typically
recruit low-quality engineers: either fundamentally bad engineers, or
relatively young engineers who could become competent, but only with
mentorship (typically not available).
In science, though, I thought the code was less of a problem than the
methodology: the failure to track the code (wherever on the spectrum of
coding sins it sits) in version control. So the hacks done to it since
the run three days ago make the runs using it back then
unrepeatable.
Code that's not intended to have a long life or is single purpose can
quite acceptably be written the scientist's way. I can live with that,
but use version control. To me, stepwise changes that can be verified
are the scientific approach to software.
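One hedged sketch of what that could look like in practice: store the commit hash next to every run's parameters, so a result from three days ago can be traced back to the code that produced it. This is Python, the function names current_commit and run_record are hypothetical, and it assumes the code lives in a git repository (it falls back to "unknown" otherwise):

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit():
    """Return the current git commit hash, or 'unknown' outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or this directory is not a repository.
        return "unknown"
    return result.stdout.strip()


def run_record(parameters, commit=None):
    """Bundle what is needed to repeat a run: code version, time, inputs."""
    return {
        "commit": commit if commit is not None else current_commit(),
        "started": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,
    }


if __name__ == "__main__":
    # Save this next to the run's output files.
    print(json.dumps(run_record({"dt": 0.01, "steps": 1000}), indent=2))
```

Note the limit of the sketch: it records the last commit only, so hacks that were never committed still go untracked. Committing before each run is what makes the record meaningful.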
The advantage on the scientist side is that they understand what the
program is supposed to do. On the software engineering side, they often
have no clue what their programming effort is supposed to achieve.
The "professional" programmers too often don't even try to find out.
So having no idea what they are supposed to do, they retreat into what
they know, making APIs and frameworks. But they are no longer doing what
they should, which is solving problems using computers.
That is why I also prefer software from a "bad" scientist over "best"
practices engineers. Of course I really want code written by great
scientists and engineers applying good sense to their code and who
validate their ideas with experimental evidence.
Your tone is mocking, dismissive and not diplomatic at all. The
author of the post you're referring to at least implies that a mutual
understanding and a golden middle ground is possible. He tries to be
constructive. You try very hard to be dismissive and thus bring nothing
to a discussion — only a puberty-like raging and very rough
generalizations.
Also, you managed to completely miss the point. Granted, the horrible
image you portray the developers with does exist — and I've met quite a
few such rock-stars in my career — but you use this PART of the
developer community to justify your points.
"Oh, and one really mean observation that I'm afraid is too true to
be omitted:"... let me finish that for you: generalizations are a plague
to any discussion.
There are good practices in software even if you're not a
software engineer — a title you use rather contemptuously — and the
original author was diplomatic enough to mention them.
A few examples will suffice:
(1) You don't need to be a 60-year-old seasoned programmer who's seen
it all to at least make your code report errors in a dedicated error log
file, as opposed to [in the Java case] to System.err, no? That way, you
can always check what went wrong with your program and tune it
accordingly, even days later — imagine you got called to a meeting while
you were testing, then the power went off, and boom, your console with
logged errors is not there anymore; a problem that a persistent log (a
single file) might immediately solve. This doesn't require a PhD in
computer science and it takes only a few extra minutes to code. I fail
to see the problem here, besides the scientist in question being an
absolute novice at programming.
(2) You don't need to be a hardcore engineer to have a few bare-bones
shell scripts to test parts of the code you are relying on, right? No
need for fancy-shmancy unit-testing frameworks, mocks and all sorts of
boilerplate (and they are, granted, NOT necessary for many projects). I
was once helping a mathematician who didn't want to use a 10-year-old,
mature C library for matrix computations, wrote his own 10+
monstrously wrong (and long) methods and then spent one full month
trying to figure out why his genius top-level algorithm (which depended
on these matrix computational methods) isn't working. It was only when I
was asked to help him, that he actually realized (read: when I forced
the point upon him after I triple-checked) that the code he was relying
on, was wrong. Something that a very small home-grown test suite
would've helped him with since day one. Does this require a genius? Hell
no. It requires a questioning nature and that your nose doesn't point at
the ceiling — something I expect from scientists innately, but boy was I
wrong for this one.
(3) If you don't know the serialization formats which are deemed as a
must-have these days, StackOverflow is not all trolls; a well-formulated
question usually yields good answers. So no, I am not willing to skip
laughing at a scientist who needs to serialize 580,000 records with 10+
attributes each to XML, instead of JSON or CSV (which, by the way, are
also cheaper to parse in terms of CPU usage) or hell, even better
sometimes — sqlite3 / cdb (so he could then import/export from/to every
known database in the world if the need arises).
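To show how little points (1) and (2) actually ask for, here is a minimal sketch, in Python rather than Java or shell. Everything in it is illustrative: the file name errors.log, the logger name, and the hand-rolled matmul standing in for the "home-grown matrix methods" are all assumptions. It checks the routine against products worked out by hand and writes any mismatch to a persistent log file.

```python
import logging


def make_error_logger(path="errors.log"):
    """Return a logger that appends errors to a file instead of the console."""
    logger = logging.getLogger("experiment")
    if not logger.handlers:
        # delay=True: the file is only created when the first error is logged.
        handler = logging.FileHandler(path, delay=True)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.ERROR)
    return logger


def matmul(a, b):
    """Hand-written matrix product on lists of lists (the code under test)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def check_matmul(logger):
    """Compare matmul against results computed by hand; log any mismatch."""
    identity = [[1, 0], [0, 1]]
    m = [[1, 2], [3, 4]]
    cases = [
        (m, identity, m),                # M * I must give M back
        (m, m, [[7, 10], [15, 22]]),     # worked out on paper
    ]
    ok = True
    for a, b, expected in cases:
        got = matmul(a, b)
        if got != expected:
            logger.error("matmul(%s, %s) = %s, expected %s", a, b, got, expected)
            ok = False
    return ok


if __name__ == "__main__":
    print("matmul ok" if check_matmul(make_error_logger()) else "see errors.log")
```

That is the whole "test suite": two known answers and a log file that is still there after the console is gone.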
I believe the wise programmers will always agree that scientists
don't need to be expert software engineers; of course not! Many of them
must operate on the edge of the unknown, and OF COURSE this means they
can't write the most beautiful code — that's perfectly okay, read my
lips, a developer says: "THIS IS PERFECTLY OKAY". There are people out
there who agree with this (check Reddit comments of the original
author's article there, if you don't believe me). That however, doesn't
excuse totally rookie mistakes like the ones pointed out by the original
author, me, and others.
Hope that clears it up for you, and I hope you now understand how
horrible your tone is, because I did my best to mildly replicate it.
Peace.
Agree with Dimitar above. The moment I smell "Hey, in my experience",
I quickly know the discussion will go into a dog-eat-tail-eat-dog
whirlpool. The comment "Many programmers have no real substance in their
work" is offensive and implies that if your work does not involve any
semblance of computational or scientific work, then you are just faking
it as a software engineer. Really? So the research departments of major
corporations, staffed with PhDs writing scientific research code, who make
up 10% or less of the workforce, are the only ones doing substantive work
and the rest of the folks are just taking free money home? My mind boggles
at how the tone of the article sweeps the rest of the discipline of
software engineering under a carpet. Just great reading. Thanks.
Erm... "Biology, bioinformatics, astronomy, physics, chemistry,
medicine, etc – almost every scientists has to write code. And they
aren’t good at it" says the article I reply to. This is supposedly not
dismissive and not a generalization.
The article I'm replying to takes it for granted that "software
engineers" on average write better code than "scientists". I think
dialog, mutual understanding etc. must include being open to the option
that this is just plain wrong. Perhaps scientific code might benefit
from better logging or more pervasive use of source control, etc. But –
"stringly typed" – is it necessarily that bad? etc. etc. – perhaps those
scientists are not as bad at it after all.
I agree with Dimitar P.
It sounds like there are some best practices (for business) which
aren't actually the best for the needs of scientific computing, and that
there are people who will push them into places they aren't
appropriate.
What you would hope to have is people who are aware of the best
practices for business, and smart enough and flexible enough to figure
out the best practices for science — and even more so, your particular
branch of science — rather than just unthinkingly throwing the
proverbial book at the problem. (Which also works poorly in actual
businesses, for that matter.)
Also, you'd want someone who can explain and demonstrate the benefits
of the practices to a rightly-skeptical constituency, and who can figure
out how to apply them most effectively at minimum cost.
Good luck with that. It's hard enough to find that kind of person for
business, where it's a smaller leap. :)
#1: I think you have to remember something here: nothing is perfect.
Most software guys will complain about the codebase they own because it
has flaws that are baked in, potentially design flaws that would be huge
effort to fix. The complaining becomes unreasonable at a point, I agree.
But if you're a programmer and can't find a way to compliment or at
least constructively criticize a peer's code, you aren't very
professional. A professional understands "not made here" is a flawed
mindset.
I think the problem might be process. Everything I check in gets code
reviewed and vetted before it gets merged. I've cowboyed physics
simulators together in unfamiliar languages and I've written enterprise
front-end eCommerce code. The two are not incompatible. My buddy in
Bioinformatics complains that very little is modularized (well) and
everyone ends up inventing their own solution over and over. If there
are areas of scientific computing proven to work, why not also commit
some time/money to making it re-usable (and testable)?
"I've been working... in an environment dominated by people... [with]
sparse knowledge of "software engineering"... Can scientific code
benefit from better "software engineering"? Perhaps, but I wouldn't
trust software engineers to deliver those benefits!"
Sounds a lot like
"I've been working in an environment dominated by people with sparse
knowledge of "anatomy"... Can medicine benefit from better knowledge of
"anatomy"? Perhaps, but I wouldn't trust a doctor to deliver those
benefits!"
People with sparse knowledge of anything generally only know enough
to be dangerous. You don't work with software engineers. You work with
hobbyists and tinkerers. They will, on average, produce bad code. Kind
of amazing that a scientist wouldn't account for this skewed sample
set.
Software Engineers and Scientific Researchers have different goals.
If SR code breaks, segfaults, or runs slowly, only a handful of people
are affected. In the world of SWEs, it could affect hundreds to thousands
of fee-paying users. SWEs have to consider performance, fault tolerance,
maintainability, extensibility, not to mention automation, because they
are providing services. The writer of the article should think on that
before generalizing SWEs as 'idle'.
http://yosefk.com/c++fqa/index.html
Lots of open minded snark here. Bravo.
I agree with yosefk. The software industry is terribly solipsistic and
has lost sense of its purpose. It lends very little value to the
businesses or scientific institutions it purports to serve and far too
often only brings in needless complexity and yak shaving. The
attack on the article author with inanities such as "who therefore
see/treat you as "support staff" rather than as a "peer"" — You are
support staff. Deal with it. The anxiety generated by the article
author's insightful comment only tells you the obvious; deep down
software industry professionals know that much of their work is
unjustified and unjustifiable and that were things to take their
reasonable, sensible course, there'd be much, much culling of waste in this
industry.
I am reminded of a Charles Dickens quote:
"The one great principle of the English law is to make business for
itself. There is no other principle distinctly, certainly, and
consistently maintained through all its narrow turnings. Viewed by this
light it becomes a coherent scheme and not the monstrous maze the laity
are apt to think it. Let them but once clearly perceive that its grand
principle is to make business for itself at their expense, and surely
they will cease to grumble".
The software industry and its professionals are, in principle, just the
same nowadays.
can I just point out that the list of non-programmers' sins includes
things like bugs and crashes, while the software engineers' list boils
down to, "it's hard to understand all the abstraction the first time I
look at it"?
i will say the worst code often comes from the self-proclaimed
"ninjas". Part of the developer's path to enlightenment includes
overcoming over engineering and learning to write simpler code.
i will also say that i've had to deal with more bugs and more
undisciplined thinking and wasted more time in those 1000-line functions
than in the ridiculous abstractions.
anyway... as with most things, the best approach lies somewhere
between the two extremes
I work as a software engineer at a biotech company. My job is to
integrate code that scientists write into a larger infrastructure.
One of the pain points we have is that many PhD scientists are often
unwilling to learn software technologies, feeling that doing so would
distract from 'the science'. While this is valid, some problems we've
faced as an organization are:
* Lack of interest/understanding WRT using version control.
* Unwillingness to learn I/O concepts beyond CSV files.
* Extremely premature optimization, which turns out to be worse than
naive implementations.
* Unit tests that run for 30 minutes. Unit tests that test the wrong
thing. No tests at all.
* Data inlined into source code. As ASCII art.
* No understanding of Big-Oh concepts, leading to O(n^2) algorithms that
run in O(n^5).
* Programs that have dozens of options, none of which should ever be
adjusted from their defaults.
And on and on.
None of our programs are islands of functionality. They must all work
together within a larger context. We have a modest cluster on which we
expect our tools to run perfectly during production. Quality is
absolutely critical.
Over the years, our software team has progressively improved the
situation, and many of our scientists have turned into quite good
software developers. They can work with the larger picture in mind. By
adopting software engineering disciplines, they have seen the quality of
their work improve and their productivity improve. This creates a
positive feedback loop. Improvement is slow but continual.
Software engineering isn't about creating umpteen layers of
abstraction or complicated inheritance hierarchies. It is about creating
robust programs that function correctly together. It is about creating
an environment that people can work within effectively. It is about
creating foundations that useful things can rest upon.
I just don't get the problem. Most scientific code is open
source/public domain so everyone is free to contribute and to enhance
it. If Bozho thinks he needs to push it to the next level, he's free to
do so ;)
This article seems to arrive at its conclusions based on two things.
Ignorance and arrogance.
The ignorance comes from a lack of understanding regarding what
software engineering actually is, and why it exists in the first
place.
Software engineering is to programming as architecture is to
construction.
Sure, you could hire 10 construction workers to build your building,
and they're perfectly capable of doing so. Unfortunately, it probably
won't look anything like what you wanted, and you'll run into issues
further down the line that were never considered when construction was
taking place.
Similarly, you can hire 10 programmers to code a project, and they're
perfectly capable of doing so. Unfortunately, it probably won't turn out
exactly the way you wanted, and you'll run into issues further down the
line when you decide you want to make a few minor alterations.
A scientist may don the programmer hat, and can come up with
something that "works" well enough for his/her particular experiment.
Realistically, what they've done is waste everyone's time. Without
putting the time into the architecture component of programming
("software engineering"), they've written a single-use (and probably
sub-par) program that will have to be re-written over and over again for
each minor alteration. Also, some poor research assistant will have to
pore over line after line of terribly-written code to figure out how to
make these alterations. Oh, and then they find a bug that was actually a
problem since day 1. Now they have to patch the 35 extremely similar
(and equally poorly-written) programs.
The arrogance is typical of those in the hard sciences, and is
actually somewhat laughable. Scientists will always look at engineers as
inferior creatures, just as mathematicians will always look down upon
scientists as inferior creatures.
Unfortunately for mathematicians and scientists, without application,
theory doesn't pay any bills. We do, after all, live in the real
world.
I'd love to see a scientist write a driver to interface with their
own instruments.
Software is a tool, and a software engineer's job is to make that
tool perform its function, and leave it adaptable enough to be built
upon should the need arise later.
Just as there are poor scientists, there are also poor engineers.
Both are, after all, human.
Guys.
I'm not an "arrogant hard scientist", I'm a programmer. I probably
know all or much of the shit you call "software engineering" so not that
ignorant, either.
As to code in biology: haven't worked there, heard a lot of horror
stories, not the thing I'm talking about, not the thing the article I
mentioned talked about. It talked about open source scientific code. I'm
talking about scientific or similar code written in an organization
mostly producing code. Either way it's never as bad as "no VCS used".
What happens when biologists write code for their own needs is not what
the discussion is about.
I've got an MSc in computer science and am about to finish a PhD in
engineering (marine robotics). What I see when I compare my code with
that of my fellow PhD candidates, who only have an engineering
background (poor, poor souls): my code is by far more generic, better
documented, better structured, much, much faster than what my colleagues
write, and, most important, doesn't take me longer to program than their
code. Plus, I get tons of reusable functions out of it they don't have
readily available, and instead of spending days on reinventing the wheel
each time I need something more sophisticated than a single for-loop, I
can concentrate on actually getting my research done.
I do commit most of what is listed as "sins" above (e.g.
subdirectories), and I force resistant environments (e.g. Matlab) to do
things an engineer would never demand (again, e.g. subdirectories).
However, these are IMHO not sins, but basic rules of adhering to
standards that make your software more versatile.
Yet, after trying to bridge the gap for so many years, I'm ready to
declare defeat. Colleagues admit that my stuff is great and then go back
to their junkyard of unsorted code fragments that needs to be kicked
into the bin each time some intrinsic detail changes. In reference to
the original post: simple-minded? Yes. Care-free? Yes.
Near-incompetence? Yes. But better? Well, maybe in the sense of a
smaller number of characters per file, but not to any other criterion,
no siree!
Yossi,
What I read from your 2 comments is — "I've been hurt by Bozho's
article, particularly when he said X or Y". Well, that's perfectly human
and normal. But the way you choose to react to it gives away your real
character.
I am not gonna act like your personal guru here. But I believe that
instead of giving up on a compromise by being a flamer in your post, you
probably should've tried to bring the two worlds together — because
these two worlds try to intermingle every day. Let's make it a
mutual-respecting friendship, not a war.
Many people, not scientists and not software engineers, have said in
the past that a big part of the scientists don't try to evolve their
discipline (math, physics, etc.) as much as they strive to get
publicized at all costs (Michael Crichton was one of the people
observing that effect). This is *not* constructive. And it is certainly
valid for many so-called programmers as well.
The way evolution works on an intellectual level is: push forward
towards a common goal, against all of your animal instincts telling you
to rip your "opponent" apart — who, when you look at it beyond emotions,
is not an opponent at all.
It was the people who believed in Tesla and Einstein who discovered
that their work is significant in everyone's everyday life, not all the
cynical pricks calling them "lab rats", "white coats" and other
derogatory names similar to those you used on several occasions.
Be constructive. Both sides have very good points. Okay?
Exhibit A for the case that scientists and engineers are terrible
programmers is the "Numerical Recipes in C/Fortran/Pascal" books. Those
books have stood out in my mind for nearly 20 years because of their
terrible code.
@AHaeusler: I didn't say your code wasn't better than the code of
those other guys... I said what my experience was around people who're
probably more competent programmers than those guys but still far from
"software engineers" in terms of programming sophistication.
@Dimitar: erm... why would I be "hurt" exactly? I just said my
experience led me to a different viewpoint. It is you who're talking
about "war", my character etc. etc. "I'm not going to act as your
personal guru" but... I think that, regardless of being wholly
inadequate, your reaction betrays a need for a thicker skin – a very
useful thing on the Internet.
@Dean: Numerical Recipes would be my Exhibit A to support my point,
actually! A great book with great code accomplishing great things.
Compare that with Design Patterns – a book full of code doing nothing
that you nonetheless are supposed to have read and memorized.
I work with math models for flight simulators. These models are
hundreds of thousands of lines long and are a nightmare to maintain because
of bad programming. This author talks about spaghetti code being easy to
untangle in comparison to code written by software engineers with too
much time on their hands. Wrong. The code he works with must be very
small and relatively simple in comparison. Try doing anything to 100,000
lines of FORTRASH that exclusively uses common blocks, implicit variable
declarations, and gotos. Experience the joy of following a variable that
represents velocity in the inertial frame in one routine, the body axis
in another, and arbitrarily changes units. Then tell me how easy the
code is to maintain. The point is that good design was developed for a
reason. Do people misuse it? Yes, but the good developers are practical
and it sounds like the author is a lost cause.
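For the velocity example specifically, one way good design helps is to make the frame and the units part of the value instead of a convention in someone's head. A rough sketch in Python, not the FORTRAN in question; Frame, Velocity, from_ft_per_s and add are hypothetical names, and metres per second is an arbitrary choice of internal unit:

```python
from dataclasses import dataclass
from enum import Enum

M_PER_FT = 0.3048  # exact, by definition of the international foot


class Frame(Enum):
    INERTIAL = "inertial"
    BODY = "body"


@dataclass(frozen=True)
class Velocity:
    """Velocity components, always stored in m/s, tagged with their frame."""

    x: float
    y: float
    z: float
    frame: Frame

    @classmethod
    def from_ft_per_s(cls, x, y, z, frame):
        # Unit conversion happens once, at the boundary, not ad hoc in routines.
        return cls(x * M_PER_FT, y * M_PER_FT, z * M_PER_FT, frame)


def add(a, b):
    """Add two velocities, refusing to mix reference frames silently."""
    if a.frame is not b.frame:
        raise ValueError(f"cannot add {a.frame.value} and {b.frame.value} velocities")
    return Velocity(a.x + b.x, a.y + b.y, a.z + b.z, a.frame)
```

With this, adding a body-axis velocity to an inertial one fails loudly at the point of the mistake, rather than producing a plausible-looking number three routines later.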
I feel your pain. ("His" pain?)
The original article I was replying to was talking about open-source
scientific libraries. I was talking about the related phenomenon of
people working in organizations producing mostly code. Neither kind of
"bad" scientific code is as bad as what you deal with. What you deal with
should be compared to 100000 lines of COBOL written by "software
engineers" in a bank.
You are talking Klingon to Na'Vi and won't be heard. Personally, I
just gave up.
Programming turned into religion a long time ago. Programmers don't
understand things, they believe in them. Goto is harmful. You need
generics. Globals are bad.
People repeat mantras (thousands of them) they never tested
themselves and these mantras lose their meaning and their purpose.
The pendulum can swing too far in either direction, of course. Still,
as a rule I've found the industry code I've worked on to be easier to
maintain than scientific code.
Bad code is bad code, whether or not it's written by scientists or
programmers.
I've been working with scientists for two decades, mostly physicists,
biologists, and chemists. They tend to write bad code.
It meets all the criteria outlined in the article and rarely is
robust against changes. The latter is a big problem for research code.
Small changes lead to instabilities and the need to refactor or rewrite
(which really begs the question of whether it's even valid science).
This, in turn, leads to large amounts of research dollars being wasted
by researchers writing software rather than performing science.
To address this, they often bring in "software engineers" who are
usually (a) students from the CS department or (b) contractors with a
relationship with the university. Developers of either class are rarely
good programmers. The former because they're at the
beginning of their careers and the latter because they're usually after
the 9-5 nature of university work.
HOWEVER, when a researcher can work with a real software engineer,
excellent software is possible. Rather than defending scientists (who
should be doing science) and disparaging the mediocre developers they
tend to hire, we should work towards building more opportunities to
create environments that attract good developers.
Almost all people strongly disagreeing with me based on their own
experience are talking about settings where a scientist works in his
lab, maybe hires a developer, and the code is an internal thing. The
article I replied to talked about open-source scientific code which I
believe to be a bit better than what comes out from the above-mentioned
setting. I was talking about code going into production which is also
probably better. Bad maybe but not that bad.
Also, obviously a great programmer can help a scientist who's very
bad at programming create excellent software. But equally obviously, a
scientist better trained in programming will do better, etc. I'm not
talking about hypothetical worlds created by "us" (who are "we"?)
pursuing some path towards collective improvement, only about what sort
of bad code I fear to get stuck with the most in the imperfect world of
today.
You also explicitly leave out bioinformatics, which is the particular
area of scientific programming I have experience in. But I think the
quality of most open-sourced bioinformatics applications isn't any
better than scientist-in-the-lab. In a way, this is a good thing –
bioinformaticists are happy to share their code even if it isn't perfect
– but after several years of amateur maintenance, you end up with a
boondoggle of spaghetti and have to state, like UCSF Chimera does, that
"Compiling requires building over 40 third-party packages and is not
recommended. See below for the problems you will face."
Pavel's reference to software archeology reminded me of one of my
favorite passages ever: http://akkartik.name/post/deepness
@Ben Fulton: LLVM is a project written by the strongest programmers
on the planet, easily in the top 2% if I had to pull a number
quantifying how good they are. Building llvm-gcc was for years something
LLVM docs said "was not for the faint of heart", "elite gcc hackers"
etc. If you look at LLVM's build system today, you'll see abundant use
of the industry-standard autocrap tools which use sh, m4, make,
automake, autoconf and who knows what else (what scientist on Earth
could have created THAT?), together with a custom build system, which
builds itself and then your code and whose error messages once
misconfigured can be diplomatically called "unhelpful", and that process
involves Python and a custom syntax and there's a tool called tblgen
with yet another custom syntax. LLVM builds nicely out of the box, yes,
but try adding a target and much of your hair will get pulled out rather
quickly. Or try building gdb with tui support on Windows. Generally the
perspective of building code off the net makes my heart sink.
I'm not saying that the average piece of scientific code is great in
any way, I'm only doubting that people defining themselves as
programmers do better on average than people defining themselves as
scientists.
Maybe the problem is the programming language itself, which is still an
imperfect thing to be improved? Make it suit people and the concrete
task better, rather than the machine. (btw, I like to put grated cheese
on spaghetti instead of butter)
Long functions aren't that bad... certainly better than hundreds of
one liners spread across dozens of files.
Yeah... Model View Controller, KISS, YAGNI, SOLID – just for
performing a simple Monte Carlo calculation...
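For a sense of scale, a plain Monte Carlo calculation can fit in one short function. The sketch below estimates pi by sampling points in the unit square; the function name `estimate_pi` and its `seed` parameter are assumptions made for this illustration, not anything from the thread.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # local generator, so runs are repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

Whether a calculation like this ever needs more structure depends on how far it grows, which is the point under debate here.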
nice post. i could really relate to it. also liked your
ihatecamelcase post.
I've had this observation as well. Good programmers want to work on
hard problems, and have the credibility to get that kind of work. Since
they're working on genuinely hard stuff, and will often be the first
maintainers of their own code (since they're building something from
scratch and responsible for making sure it launches) they are careful to
focus on the intrinsic complexity of the problem.
Second-rate engineers, on the other hand, tend to get tossed the easy
but annoying work of filling out parochial requirements that come from
the business. It's not very interesting and it doesn't do much for the
CV, and the only reward for doing it well is getting assigned more grunt
work, so they end up overengineering in the hope of getting some CV
cred. Since the code is often inscrutable, it ends up playing to their
political advantage as well. Even if people end up disliking it, it's
hard to criticize that kind of code without the risk of being called
incompetent. (To people unfamiliar with software, "It's too complex"
sounds like "I'm too stupid." Of course, writing unmaintainable code
isn't difficult at all.)
I've seen my share of "professional" code and code written by my
professors and other academics. I will only say that big code physics
are not the same as little code physics. Professional programmers tend
to bring in the heavy tools too often for projects that don't really
benefit from them, but I also think an academic programmer that would
write more than a couple thousand lines of code using the naive
approach would quickly realize he needs to hire a programmer.
Professional programmers can indeed make things better, but they can
also make things much worse, and it's not easy to ensure a good outcome
through, erm, managerial oversight. I think if the physicist has a
programmer friend known for his pragmatic mindset, maybe that'd be the
best kind of outcome.
I have to say that I saw more people who were mainly
physicists/mathematicians who ended up programming very nicely by
observing the outcomes of the various approaches in practice, than
people who were mainly programmers and eventually overcame their fear of
the mathy subject matter and their desire to hide from their fear behind
over-engineered infrastructure.
This is why we need formal program verification.
I would recommend that anyone who thinks scientific code is "better"
read "The Hockey Stick Illusion" by A. W. Montford. While it is
principally about the statistics behind climate change, a good deal of
it involves the software flaws that led to incorrect conclusions that
altered the thinking of the world about the subject.
I guess Test Driven Development and really good software engineers
are the answer here ;)
Can you elaborate more on tools to fix parallelism bugs?
Obviously there are checkers, but do you really feel these checkers are
more mature/complete than other classes of tools?
For computational systems, which scientific code is a kind of,
something like Cilk will nuke bugs extremely reliably (my Cilk knockoff,
checkedthreads, is freely available on GitHub and I explained how it
works on this blog.)
Hey Zong,
This is quite a complex topic. Firstly, the question is: why do you
need parallelism?
Generally, parallelism is kind of tricky, especially when using
low-level abstractions like threads.
Parallelism is much easier when one uses immutable data structures and
proper constructs (like an implementation of communicating sequential
processes, http://en.wikipedia.org/wiki/Communicating_sequential_processes).
Then a lot of bugs can be reproduced in a deterministic way.
Or go for a typical map-reduce stack (like Spark).
if you have more questions – don't hesitate to drop me an email
krzysztof.kaczmarek at me.com
cheers,
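The suggestion above (immutable messages passed between sequential processes) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard `queue.Queue` as a stand-in for a CSP channel, and the names `producer`, `squarer`, `run_pipeline` and `SENTINEL` are hypothetical, not taken from any library mentioned in the thread.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (assumes None is never a real message)


def producer(out_ch, items):
    """Send each item down the channel as an immutable (index, value) tuple."""
    for i, x in enumerate(items):
        out_ch.put((i, x))
    out_ch.put(SENTINEL)


def squarer(in_ch, out_ch):
    """Receive messages, build new tuples, and pass them downstream."""
    while True:
        msg = in_ch.get()
        if msg is SENTINEL:
            out_ch.put(SENTINEL)
            return
        i, x = msg
        out_ch.put((i, x * x))  # a new tuple; the received one is never mutated


def run_pipeline(items):
    """Wire two stages together with channels and collect the results."""
    a, b = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        msg = b.get()
        if msg is SENTINEL:
            break
        results.append(msg)
    for t in threads:
        t.join()
    return [value for _, value in results]


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because the stages share nothing except the channels, and each channel here has a single sender and a single receiver, the order of results does not depend on thread scheduling.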
I've *sometimes* managed to write code that others have *enjoyed*
working with — but it doesn't happen all the time, and it doesn't
(generally) survive extended periods of maintenance. Writing code that
is functional and effectively communicates what it does to a wide
audience is *damn* hard unless the code itself is trivial. Indeed,
persuading people that they don't want or need anything other than the
most trivial solution is probably the hardest bit. :-)
I would say that the best way is KISS. Neither group has a clue how to
do it properly, and I have plenty of examples from looking at the code
of over 100 guys with all kinds of training over the last 30 years.
The dark art of quality software architecture isn't easy to master.
I recently left a growing software company that was completely made
up of scientists and zero software engineers.
I should say "formerly growing," because the consequences of bad
software engineering had begun to catch up to them. Yes, their code
solved a difficult problem more effectively than any competing product,
but they have proven to be completely unable to break out of their niche
because the code base is an unmaintainable nightmare.
Almost no functions are thread-safe, as they nearly all use globals
as inputs *and* outputs. Adding a new option or a new model requires
either copy-paste (which is, of course, what the scientists preferred
to do) or, if you don't want to use copy-paste, months of refactoring
work.
Now they have unfixable bugs, built-in leaks that will take at least
a year, probably more, to fix, and crippling scalability issues that
require rewriting around 50K lines of code from scratch. Planned
features fall years behind schedule both because of poor planning
and the nightmarishly messy code base.
They will never fix these problems because everyone who can fix these
problems quits part way into the project.
Yeah, scientists can do neat things while ignoring good programming
practice, but about the time the code base hits a million lines or so,
the reason "good practice" is called "good practice" catches up with
you.
I hope you get eye cancer.
Look at all the hurt humans here desperately trying to make some mud
stick upon the author's face.
One can close the page if one is so deeply offended, but the wanton
cruelty on display from the likes of Dimitar etc. shows that some points appear
to have wounded them or penetrated to some inner realm.
I never once during the entire article felt arrogance, condescension,
or the myriad of crimes being heaped at Yossi's feet. I read an
opinion-based article from a domain expert about his/her domain of
expertise and the trials and tribulations of such etc.
Why take such umbrage at Yossi's expression?
Such strange and childish reactions.
... and Lilac, for shame! What a pointlessly mean thing to say, and
for what? Someone expressing a professional opinion at odds with your
own?