I want a struct linker

September 26th, 2008

Here's a problem I've seen a lot (it's probably classified as an "Antipattern" or a "Code Smell" and as such has a name in the appropriate circles, but I wouldn't know, so I'll leave it nameless).

You have some kind of data structure that you pass around a lot. Soon, the most valuable thing about the structure isn't the data it keeps, but the fact that it's available all the way through some hairy flow of control. If you want to have your data flow through all those pipes, just add it to The Data Structure. (To antipattern classification enthusiasts: I don't think we have a god object yet because we really want to pass our data through that flow and it's right to have one kind of structure for that and not, say, propagating N+1 function parameters.)

Now suppose the structure holds an open set of data. For example, a spam filter could have a data structure to which various passes add various cues they extract from the message, and other passes can access those cues. We don't want the structure to know what passes exist and what cues they extract, so that you can add a pass without changing the structure.

I don't think there's a good way to do it in a 3GL. In C or C++, you can:

On top of JVM or .NET, you have pretty much the same options, plus the option to generate the cue container structure dynamically. Each cue would define an interface and the container structure would implement all those interfaces. The debugger would display containers nicely, and the code accessing them wouldn't depend on the container class. I'd guess nobody does that though because the class generation part is likely somewhat gnarly.

In a 4GL, you can add attributes to class objects at run time. This is similar to keeping a key->pointer map in a 3GL, except the name interning is handled by the system as it should, and you don't confuse debuggers because you're using a standard feature of the object system. Which solves everything except for the speed issue, which is of course dwarfed by other 4GL speed issues.

Now, I used to think of it as one of the usual speed vs convenience trade-offs, but I no longer think it is, because a struct linker could solve it.

Suppose you could have "distributed" struct/class definitions in an offset-based language; you could write "dstruct SpamCues { ViagraCue viagra; CialisCue cialis; }" in the Medication spam filter module, and "dstruct SpamCues { FallicSymbolsCue fallic; SizeDescriptionsCue size; }" in the Penis Enlargement module. The structure is thus defined by all modules linked into the application.

When someone gets a SpamCues structure and accesses cues.viagra, the compiler generates a load-from-pointer-with-offset instruction – for example, in MIPS assembly it's spelled "lw offset(ptrreg)". However, the offset would be left for the linker to resolve, just the way it's done today for pointers in "move reg, globalobjectlabel" and "jump globalfunclabel".

This way, access to "distributed" structures would be as fast as "normal" structures. And you would preserve most optimizations related to adjacent offsets. For example, if your machine supports multiple loads, so a rectangle structure with 4 int members can be loaded to 4 registers with "ldm rectptrreg,{R0-R4}" or something, it could still be done because the compiler would know that the 4 members are adjacent; the only unknown thing would be the offset of the rectangle inside the larger struct.

One issue the linker could have on some architectures is handling very large offsets that don't fit into the instruction encoding of load-from-pointer-with-offset forms. Well, I'd live even with the dumbest solution where you always waste an instruction to increment a pointer in case the offset is too large. And then you could probably do better than that, similarly to the way "far calls" (calls to functions at addresses too far from the point of call for the offset to fit into 28 bits or whatever the branch offset encoding size is on your machine) are handled today.

The whole thing can fail in presence of dynamic loading during program run as in dlopen/LoadLibrary; if you already have objects of the structure, and your language doesn't support relocation because of using native pointers, then the dynamically loaded module won't be able to add members to a dstruct since it can't update the existing objects. Well, I can live with that limitation.

If the language generates native object files, there's the problem of maintaining compatibility with the object file format. I think this could "almost" be done, by mapping a distributed structure to a custom section .dstruct.SpamCues, and implementing members (viagra, cialis, fallic, size) as global objects in that section. Then if an equivalent of a linker script says that the base address of .dstruct.SpamCues is 0, then &viagra will resolve to the offset of the member inside the structure. The change to automatically map sections named .dstruct.* to 0 surely isn't more complicated than the handling of stuff like .gnu.linkonce, inflicted upon us by the idiocy of C++ templates and the likes of them.

And here's why I'll probably never see a struct linker:

It still amazes me how what really matters isn't what can be done, but what's already done. It's easier to change goddamn hardware than it is to change "software infrastructure" like languages, software tools, APIs, protocols and all kinds of that shit. I mean, here we have something that's possible and even easy to do, and yet completely impractical.

Guess I'll have to roll my own yet-another-distributed-reflective-registration bullshit. Oh well.

1. HeinSep 26, 2008

Protocol buffers achieve something very similar in distributed systems — after adding a field, you only need to recompile the binaries that access the field.

Using the protocol-buffer extension mechanism you don't need to declare all the fields in one place. If the components of your system are rather coarse grained, then serializing/deserializing within the binary may make sense.

2. Yossi KreininSep 26, 2008

By "within the binary", do you mean that in the spam filter example, the cue container would be passed in encoded form, deserialized by each pass and serialized back if cues are added?

Also, are you talking about a specific implementation of "protocol buffers" or a general concept? Googling for "protocol buffer" yields way too many results, the first one being, well, Google's implementation of serialization that allows the format to be changed over time; however, I didn't see anything related to distributed format definition.

3. octopilerSep 28, 2008

I'm a bit confused by two of your points. You say:

Aggregate the cue structures by value (which means you have to recompile everything once you change/add/remove a member from any of them)

And later on:
I could roll my own compiler to a language similar to a mainstream one, with a bunch of additions like this struct linker thingie.

Doesn't the latter comment mean you essentially want to only implement a front end to your language? If so probably you might have to recompile/preprocess all the code anyways?

I feel the following standardized features in a language (D I'm looking at you) would solve a lot of such complex problems:
1) Macros that can manipulate the AST at compile time
2) Support for run-time compilation/optimization (most of this might be in the standard library).

If the debugger can be made aware of these it would be great.

4. Yossi KreininSep 28, 2008

Well, I could have my front-end use linker features like custom sections via pragmas, and then I wouldn't even need to generate assembly, just C with platform-specific pragmas and a linker script.

Regarding AST manipulation – I think the D approach of compiling strings into the code with mixin(string) is about as good as you can get in a hairy pop infix language in terms of meta-programming, because working at the AST level means exposure to a gnarly object model. As to run time compilation, it means a larger system; on desktop, I definitely don't mind, on the other hand, I can and do generate C at run time on the desktop today – save to a file, run gcc, then dlopen. And on systems where I can't install gcc I also lack the space to carry the hairy compiler for a pop infix language.

In short, I think that in order to be as strong at metaprogramming as Lisp or Forth, you have to be Lisp or Forth – that is, make ASTs very simple.

5. Ori BergerNov 1, 2008

Wait a second.

A struct linker is essentially a recompiler if you don't want performance to suffer — e.g., there's different opcodes for accessing a register + 8 bit offset and +32 bit offset. Once enough fields are added, codes and relative jumps will need to be rearranged to the point of killing optimization.

What's the problem with compiling when you make a change again? Being fond of plain C as you are (and I am in that camp too), you could just use a quick C compiler such as Fabrice Bellard's tcc and be done with it. 10 secs for cold booting a complete linux kernel from _source_ is good enough for me.

6. Yossi KreininNov 1, 2008

Regarding "recompiler" – as I said, I don't mind the sliiiiiight performance hit from having the compiler always generate the wasteful kind of load/store just in case the offset turns out huge; then the linker can always do its thing no matter what the offset turned out to be – unless there exists a machine so insane that you can't implement load-from-small-offset using the encoding you normally use for load-from-large-offset. Now that would be quite amazing.

Another option is to do it the way automatic "far call patching" is done by some linkers – when you branch too far for the offset to fit into the 26 or whatever number of bits the branch command encoding gives you, they generate code branching to a trampoline. So here if the linker saw huge offsets, you could branch to code doing the load/store and then branch back. Now, I don't know how they manage linkage with this feature because the trampoline code must be close to the function calling it, and that moves other functions which in turn may cause the generation of trampolines, etc.; looks like one of those problems where linkage ought to "converge" in multiple passes. I dunno if they limit the number of passes or overallocate space or..., just that it works (say, in Green Hills C++ on MIPS). But as I said, I can live with "always assume large offsets", so this mess is largely irrelevant.

Regarding motivation: yeah, I like C, I just get to work with C++. Sucks, doesn't it? Now, the other thing that could motivate you in saner compiled languages is that you might work with compiled code not available in source form, or be compatible with an existing build, and add members to structs they use; there are good use cases for that when the compiled code is basically this pipe you want your data to be going through. Say, you're a spam filter plugin and the compiled code is that spam filter core engine. Or something.

7. Alaric Snell-PymSep 14, 2011

For the case of passing state through complex call chains extensibly, there's a nice mechanism in Scheme called "parameters" – see http://srfi.schemers.org/srfi-39/srfi-39.html – that's what us smug Lisp weenies use!



Post a comment