Can your static type system handle linear algebra?

It seems a good idea to use a static type system to enforce unit correctness – to make sure that we don't add meters to inches or some such. After all, spacecraft has been lost due to such bugs costing hundreds of millions.

Almost any static type system lets us check units in sufficiently simple cases. I'm going to talk about one specific case which seems to me very hard and I'm not sure there's a type system that handles it.

Suppose we have a bunch of points – real-world measurements in meters – that should all lie on a straight line, modulo measurement noise. Our goal is to find that line – A and B in the equation:

A*x + B = y

If we had exactly 2 points, A and B would be unique and we could obtain them by solving the equation system:

A*x1 + B = y1
A*x2 + B = y2

2 equations, 2 unknowns – we're all set. With more than 2 measurements however, we have an overdetermined equation system – that is, the 3rd equation given by the measurement x3, y3 may "disagree" with the first two. So our A and B must be a compromise between all the equations.

The "disagreement" of x,y with A and B is |A*x + B – y|, since if A and B fit a given x,y pair perfectly, A*x+B is exactly y. Minimizing the sum of these distances from each y sounds like what we want.

Unfortunately, we can't get what we want because minimizing functions with absolute values in them cannot be done analytically. The analytical minimum is where the derivative is 0 but you can't take the derivative of abs.

So instead we'll minimize the sum of (A*x + B – y)^2 – not exactly what we want, but at least errors are non-negative (good), they grow when the distance grows (good), and this is what everybody else is doing in these cases (excellent). If the cheating makes us uneasy, we can try RANSAC later, or maybe sticking our head in the sand or something.

What's the minimum of our error function? Math says, let's first write our equations in matrix form:

(x1 1)*(A B) = y1
(x2 1)*(A B) = y2
(xN 1)*(A B) = yN

Let's call the matrix of all (xi 1) "X" and let's call the vector of all yi "Y". Then solving the equation system X'*X*(A B) = X'*Y gives us A, B minimizing our error function. Matlab's "" operator – as in XY – even does this automatically for overdetermined systems; that's how you "solve" them.

And now to static type systems. We start out with the following types:

  • x,y are measured in meters (m).
  • A is a number (n)- a scale factor – while B is in meters, so that A*x + B is in meters as y should be.

What about X'*X and X'*Y? X'*X looks like this:

sum(xi*xi) sum(xi) //m^2 m
sum(xi)    sum(1)  //m   n

….while X'*Y looks like this:

sum(xi*yi) sum(yi) //m^2 m

The good news is that when we solve for A and B using Cramer's rule which involves computing determinants, things work out wonderfully – A indeed comes out in meters and B comes out as a plain number. So if we code this manually – compute X'*X and X'*Y by summing the terms for all our x,y points and then hand-code the determinants computation – that could rather easily be made to work in many static type systems.

The bad news is that thinking about the type of Matlab's operator or some other generic least-squares fitting function makes my head bleed.

Imagine that large equation systems will not be solved by Cramer's rule but by iterative methods and all the intermediate results will need to have a type.

Imagine that we may be looking, not for a line, but a parabola. Then our X'*X matrix would have sum(x^4), sum(x^3) etc. in it – and while spelling out Cramer's rule works nicely again, a generic function's input and output types would have to account for this possibility.

Perhaps the sanest approach is we type every column of X and Y separately, and every element of X'*X and X'*Y separately – not meters^something but T1, T2, T3… And then we see what comes out, and whether intermediate results have sensible types. But I just can't imagine the monstrous types of products of the various elements ever eventually cancelling out as nicely as they apparently should – not in a real flesh-and-blood programming language.

Maybe I'm missing something here? I will admit that I sort of gave up on elegance in programming in general and static typing in particular rather long ago. But we have many static type systems claiming to be very powerful and concise these days. Maybe if I did some catching-up I would regain the optimism of years long gone.

Perhaps in some of these systems, there's an elegant solution to the problem of 's type – more elegant than just casting everything to "number" when it goes into and casting the result back into whatever types we declared for the result, forgoing the automatic type checking.

So – can your static type system do linear algebra while checking that measurement units are used consistently?


#1 Mark on 04.05.14 at 1:06 pm

Consider F#. It allows compile time checking of units. I assume that meets your definition of static?

#2 Yossi Kreinin on 04.05.14 at 1:10 pm

OK; most static type system allow compile time checking of units in some form or other. I'm specifically asking what's the type signature of or a similar function and then what are the types of intermediate variables, of a determinant function etc. Because I dare suspect that this case is not handled by any type system out there. Or if it is handled I'd like to see how.

#3 Pierre F. on 04.05.14 at 2:52 pm

I understand your point, but I think you're trying to cram too much into a type system. The way I'd approach this is to embed a symbolic solver in the program and use that, e.g. Maxima

#4 Yossi Kreinin on 04.05.14 at 9:31 pm

I don't think a symbolic solution for a 10×10 system is a practical approach. And I'm not trying to cram too much into a type system but rather trying to check if my impression that static typing breaks down very quickly is correct or not. The average discussion about typing talks about collections and covariance/contravariance and Employee from which Manager is derived and so on; I thought it's interesting to see what happens with basic math which everyone agrees makes sense and is practically important.

#5 Kragen Javier Sitaker on 04.05.14 at 11:04 pm

I think what you're looking for is a "dependent type system", like what ATS or Coq offers. I can't be sure, though, because I can't make head or tail of the introductions I've found to these systems; on the other hand, I'm pretty sure their type systems are Turing-complete and therefore undecidable.

#6 Yossi Kreinin on 04.05.14 at 11:19 pm

C++'s type system is Turning complete and undecidable I think. The question is how the type of would look like, and whether an iterative method like Gauss elimination could work.

Maybe what I should do is look if Gauss elimination can work that way at all and if it can't publish the fact under a trollish title like "static typing can't do 5th grade math" to make the point a bit stronger.

I mean I know there's a ton of complicated type systems out there, what I don't know is what happens with this case.

#7 Adso on 04.06.14 at 2:46 am

Dependent type systems like those featured in Coq are Agda are type systems which, in effect, allow types to depend on values: an extremely simple example is a vector parameterized by its length. In this case, a dependent type system is exactly what you want, because it would allow you to express the type of a matrix whose values have a type which depends on their location in the matrix.

What would it look like? Well, let's assume we have a type LessThan (limit : Nat) which can express all natural numbers less than n for some n. For example, the possible values of the type LessThan 3 are 0, 1, and 2. (This is straightforward to express in Coq and the like.) The type of a matrix might then be

data Matrix : (height : Nat) -> (width : Nat) ->
(LessThan height -> LessThan width -> Type) -> Type

What this is saying is that a Matrix is a type parameterized by a width and a height, both expressed as natural numbers, and a mapping from pairs of numbers to types—specifically, pairs of numbers that are less than the dimensions of the matrix itself. We then create a type-level function prod which takes two numeric types and returns the type of the product of their values, so e.g. prod meters meters == metersSquares for some definition of those types. That means the type of the operator in question will look something like

backslash : Matrix n n f -> Matrix n m g -> Matrix n m h
where h y x = prod (f x y) (g y x)

The end result will be a matrix whose values are typed according to their position, where those types have been computed as a function of the types in the two matrices given as arguments. I'm not going to say that it's a particularly straightforward thing to do, but it's certainly expressible in the static type systems of these languages.

Incidentally, Kragen is half-right—if types can depend on values, and values can never get returned because of, say, an infinite loop, then type-checking is necessarily undecidable. Most dependently typed languages get around this by making it impossible to write programs that loop forever (e.g. by enforcing that a function can only recurse on ever-decreasing subsets of its argument) making Coq and the like not actually Turing-complete. That hasn't stopped useful software (e.g. a C compiler) from being written in them.

#8 Yossi Kreinin on 04.06.14 at 6:18 am

Thanks; I think this barely scratches the surface of the issue. I could define a matrix with element type depending on its position in C++'s hideous type system easily enough. One question is how to write the code of the function… How do I invert this bloody matrix?

#9 Pierre F. on 04.06.14 at 2:20 pm

The problem here is that all type systems I know only only have the notion of a scalar(and maybe typed lists for recursive functions). Even a Vector is a scalar from the point of view of the C++ type system. It would be interesting to have a type system that has tensors as built-in concepts but we're not there yet and trying to build it on top of what is already here doesn't seem very easy.

#10 Yossi Kreinin on 04.06.14 at 9:32 pm

I think that it's worse than that – specifically iterative methods assume that the various equations are basically of the same type as they add/subtract them from each other, etc. So even if you can type every element correctly elegantly enough the whole idea of units as types breaks down.

I should look into this and republish it I guess…

#11 Norman Yarvin on 04.06.14 at 9:57 pm

In general, each row and each column of a matrix would need its own type (its own units). So the type of a matrix would be described by two arrays of types, one array for the rows and one array for the columns; the type of each matrix entry would be the product of the type for that row and the type for that column. (Or perhaps it would be more convenient if it were the one divided by the other, so that in a matrix-vector multiplication, one array of types would be the same as the types of the input vector and the other array would be the same as the types of the output vector.) Two-dimensional arrays could of course be imagined where each entry had a completely different type, but such arrays would not be matrices that could be multiplied by a vector or by another matrix, since that would result in units mismatches. Of course the simplest case would be for all the entries in the matrix to have the same units, and that's likely also the commonest case. Your example wants more than that, though.

In practice, the tool of choice for the problem you describe (least-squares solution of overdetermined systems) is the singular value decomposition, or the somewhat cheaper QR decomposition, since explicitly forming the matrix X'*X and then inverting it loses about half your precision. Which raises the question: in an SVD routine, what should the types of the output matrices be? The routine takes a matrix A, and outputs matrices U, S, and V, such that U and V are orthogonal, S is diagonal, and A=U*S*V'. One way would be for U's rows to have the row types of A, for V's rows to have the column types of A, and for the rest to be unitless (not outside the unit type system, just having units equal to one). I'm guessing that would work just fine, but any number of other solutions would also be self-consistent, and it's conceivable someone would need one of them.

Specifying that the type of a matrix consists of two arrays of types is something for which syntax could get ugly fast. In the case of your example, with a matrix whose columns had type 1, meters, meters^2, meters^3, and so forth, there'd have to be some special notation to indicate that the n'th column had type meters^n. With a fixed number of columns, you could just write out the type array, but with a variable number, you'd need a system where you could put in a formula for the type. And in general that formula might be arbitrarily complicated. The type checker would then be in the business of comparing two such formulas for equality — and for that, it'd have to be able to solve the halting problem, which of course can't be done. (Well, not by the compiler, as in static type checking; it would be straightforward to check at runtime.)

However, such code wouldn't be a good thing to write in the first place. Polynomial approximation is fine for degree-two or degree-three polynomials, but gets ill-conditioned fast as the degree increases. So writing this code for arbitrary n isn't a good idea in the first place. What does make sense in this scenario is to do the least-squares approximation using orthogonal polynomials, such as Chebyshev or Legendre polynomials. Analytically it's equivalent, but numerically it actually works (rather than blowing up). In that case the entry in the n'th column would not be x^n but rather a[0] + a[1]*x + a[2]*x^2 + … + a[n]*x^n, where the a[i]s are the coefficients of the polynomial. Now each a[i] is going to need a distinct type, and those types can be chosen such that the result is unitless. So instead of the matrix X having complicated units, it can just be unitless.

But we can also do this in your original case! Just multiply the n'th column by a quantity a[n] which numerically is equal to one, but whose units are the inverse of its units. Then you get a unitless matrix, and the rest of the problem is easy. Of course that kind of guts the type checking, raising the question of whether it's any better than simply ignoring the issue of units inside this routine.

I'm afraid it isn't. There are two possibilities here. One is that we specify a[n]'s units by saying that they are the inverse of whatever that the n'th column's units are. That's about like ditching the type checking: if we happen to screw up those units (say, making them inches rather than meters), the type checking will still check okay.

The other way is to specify r_n's units by saying that they are meters^(-n). That gives us type checking: inches will still fail (or maybe get converted automagically into meters). But have we evaded the horns of the halting problem? No, we have not. Again, if code is going to be written for arbitrary n, the type checker has to verify that the formula meters^(-n) multiplied by the n'th column's units (meters^n) is equal to one. Of course in that case it is not hard to check, but again, in general that formula might be arbitrarily complicated — so there is no general way of checking it at compile time.

So I think we're left with three cases that can work. One is matrices of arbitrary dimensions where every entry in the matrix has the same units. Another is matrices of fixed dimensions where the rows and columns have specified units. The third is runtime units checking (which might be optimized away in simple cases such as this one).

#12 Mark on 04.07.14 at 4:36 pm

Just to make sure: you want

A, B = solve(X, Y)

and the problem is that while implementing "solve", you want the language to properly check units. Yes?

#13 Mark on 04.07.14 at 5:54 pm

If I understand what you want correctly, then this should indeed happen "naturally". Take your any "flesh-and-blood" programming language, even your very favorite C++ (I'm teasing). Implement Gaussian elimination in the "natural" way. Things should Just Work (TM).

#14 Andrew Bissell on 04.07.14 at 8:54 pm

Re: the Mars Orbiter, did you intentionally pick the example that Bjarne Strousrup uses to plug C++ templates to be cheeky? :)

(skip ahead to 21:00)

#15 Mark on 04.07.14 at 9:45 pm

To expand on my previous comment: Gaussian elimination is a procedure that respects units, just like Cramer's rule. Why would you expect problems from one but not the other?

#16 Yossi Kreinin on 04.07.14 at 10:09 pm

@Mark: "respects units" alright – it'll work with 2 equations each defined by a point. It will not work with X'*X because different rows have different units and you'll be adding rows. And you'd have a heart attack if you saw how far down this path I once went in my favorite C++.

@Andrew: actually I didn't know he uses this example to plug C++ templates, though I'm not surprised he does.

@Norman: I think the simplest question that I should have focused on and maybe will in a follow-up is that Gaussian elimination violates unit correctness if applied to X'*X, while it does work for sufficiently small dimensions and it's faster than SVD. (Even if it were so numerically unstable so as to have no practical uses at all, it doesn't really refute the point about typing because the method is algebraically correct and so we're entitled to the type system's support as much as in any other mathematically sensible method which should type-check just fine if type systems live to their promise.)

#17 Norman Yarvin on 04.08.14 at 12:21 am

In Gaussian elimination, you're not purely and simply adding rows; you're multiplying one row by a scale factor and adding it to another. So the scale factor gets its own units, which makes the row addition pass the units check.

#18 Yossi Kreinin on 04.08.14 at 2:39 am

That much is true and it did cross my mind; but… gosh-darn. I mean, I see how I could make it work to get the compiler to shut up, sort of, but say when you're doing a Gaussian elimination for X of the right dimension – a simple equation system – then the scale factor is a plain number and if it's X'*X it has different units in each row. A generic Gaussian elimination function starts looking a bit mind-numbing.

And… what about a Gauss-Jordan elimination with pivoting where you pick the pivot dynamically? Does this finally force us into a territory where you ought to keep track of types at run time?

#19 Ivan Tikhonov on 04.08.14 at 8:15 am

Problem is, you asking wrong question.

Can you add meter to inches? Yes, you can! Just take a piece of rope of X meters long, tie to another rope Y inches long and you'll have an answer.

Thing is, meters and inches aren't types, they are coefficients.

The problem rather stems from the fact that multiplying meters is not typical math multiplication R*R->R but rather R*R->R^2 (square things).

So the reason you can't sum square meter to meters is because they are totally different things.

Just like dividing X meters by Y seconds is not the same as doing X/Y while in practice it gives a right answer.

You can do it right in C++ by overloading math operators and manually creating types and operators for needed values.

Seems like what you are asking for is automatic creation of these types and operators for them.

Not sure if that exists.

#20 Yossi Kreinin on 04.08.14 at 9:23 am

Erm… well they are coefficients and if you make them symbolic coefficients and do symbolic math, things might work just fine. And yes they probably aren't types. But why is it I who am asking the wrong question, instead of whoever giving types as the answer giving the wrong answer?..

And as to adding totally different things – doesn't Gauss-Jordan ran on X'*X add (scaled) rows containing "different things" ending up with the right result?

#21 Mark on 04.08.14 at 10:24 am

"…doesn't Gauss-Jordan ran on X'*X add (scaled) rows containing "different things" ending up with the right result?"

No! It would be a very strange algorithm if it did so. After scaling, the rows contain the same things.

#22 Ivan Tikhonov on 04.08.14 at 11:02 am

> And as to adding totally different things – doesn't Gauss-Jordan ran on X'*X add (scaled) rows containing "different things" ending up with the right result?

I am not on a good terms with math, but… is it? Not sure about Gauss-Jordan, but Gauss elimination works ok:

Units of X'*X is (as you said):

a b // m^2 m
c d // m n

c'=0=Pa+c, so P=-c/a with units being m/m^2 = 1/m

Resolving units:

c'=Pa+c // 1/m * m^2 + m = m + m
d'=Pb+d // 1/m * m + n = n + n

so new matrix ends up being:

a b // m^2 m
0 d' // m n

#23 Mark on 04.08.14 at 11:42 am

@Yossi K. Actually, it seems that I must admit you are at least partially right. After thinking for a bit, I realized that static typing and row reduction (as in Gaussian elimination) are not as compatible as I thought. Informally, row reduction tries to turn a square matrix into the identity matrix, or as close as it can get. The identity matrix of course is dimensionless, but the matrix you start with is not. The most natural way to implement row reduction in C++ (and most languages) is to do it "in place" (that is, modify the original matrix). But if the type system is checking units, that means that individual matrix are change types as you go! Thus, static typing fails here. The basic problem is that static typing has no issues with code such as:
a = b * c;
d = e / f;
It has big problems with code like this:
x *= y;
w /= z;
Because unless y and z are dimensionless, it requires changing the types of x and w at runtime. However, all is not lost. It is impossible to implement in-place row-reduction, but it can still be done by making a fresh result matrix. If implemented carefully, this should not affect the asymptotic efficiency, and only doubles the memory requirements. Doubling the memory is usually considered a bad thing, but in your example, the matrix is only four elements, so who cares?

#24 Yossi Kreinin on 04.08.14 at 11:45 am

@Mark: I keep claiming I'm wholly right :) Specifically – the problem is that you pick a "pivot row" dynamically depending on the runtime values and work on that. How do you do this with static typing – in-place or not?

#25 Yossi Kreinin on 04.08.14 at 11:48 am

@Ivan: Gaussian elimination does work – if spelled out for the given dimension/problem, and perhaps through some unholy contortion if specified as a generic function with hideous static types. Gauss-Jordan however picks a pivot row dynamically which to me seems to preclude a static type system-based unit checking or at least most sane approaches to it.

#26 Ivan Tikhonov on 04.08.14 at 12:18 pm

> Gauss-Jordan however picks a pivot row dynamically

Ohh, i see now. Problem is not that you add incompatible values (you probably aren't).

Problem is that swapping rows changes signature of matrix in a way that depends on runtime value of matrix.

Looks like a problem of getting nth element from heterogeneous collection all over again.

Can it be solved by computing all possible output types? Looks like it should converge to a single result type in a Gauss-Jordan case.

#27 Mark on 04.08.14 at 1:08 pm

@Yossi K. Ok, I think I finally begin to see what you are saying. You start with a toy example (linear regression). Then you say that you can solve it by inverting a matrix. And even though inverting this particular matrix is doable (and easy) with a static type system, inverting an arbitrary matrix is hard (or impossible). Did I summarize your viewpoint correctly?

#28 Norman Yarvin on 04.08.14 at 1:34 pm

With Gaussian elimination, even if you don't do pivoting, your scale factor, which in today's routines is just a single variable declared once, will have to take on different units for each combination of two rows: its units will have to be declared as row A's units divided by row B's units. That's not inherently a big deal: instead of putting the declaration at the head of the routine, you can put it in the inner loop — so even though you're declaring it O(n^2) times, there is no waste; the same storage can be used. The compiler can look at the declaration and see that the result of multiplying (B's units) by (A's units / B's units) is just (A's units).

Pivoting doesn't really change this. For one thing, you don't actually have to swap rows; you can just keep track of which rows have been used so far. (Or, for that matter, which rows and columns have been used, if you're doing full pivoting.) I believe this is actually the approach that is used in LINPACK and LAPACK, although I am not certain of that.

The question is whether this sort of symbolic checking can be done in sufficient generality without running into the halting problem.

#29 Yossi Kreinin on 04.08.14 at 1:41 pm

@Mark: even this matrix or another smallish matrix – say 5×5 – gets gnarly if you do Gauss-Jordan because you swap rows dynamically.

@Norman: I need to look closer at this row-swapping business to see how the bookkeeping would look like in a static type system.

#30 Mark on 04.08.14 at 4:19 pm

Like Norman said, there's usually no need to actually swap rows. I'm getting tempted to actually implement this, even if for no reason other than to prove to you that it would NOT get gnarly.

#31 Yossi Kreinin on 04.08.14 at 9:50 pm

Go ahead :)

#32 Mark on 04.08.14 at 10:35 pm

Fine. Let's agree on what pieces it should include and how general it should be, for it to count as valid evidence. Obviously, as a minimum, it should be able to handle your linear regression with meters via row reduction example. Anything else?

#33 Mark on 04.08.14 at 10:49 pm

Also, we should agree on "not too gnarly". How about less than 100 lines of source, no fancy tricks allowed? And too make sure I don't use too many long lines, less than 10KB total source code.

#34 Yossi Kreinin on 04.09.14 at 12:22 am

What it should do: a general-purpose Gauss-Jordan elimination routine that one could run on:

* "just a bunch of numbers" – no fancy static types
* an equation set where every column has a different type (with no clear relations between the types except that the right hand side can be reached by multiplying the solution vector by the left hand side)
* an X'*X equation set where X is the previous matrix

Now maybe that makes no sense; if so let's discuss how the columns' types should be related.

As to "gnarly" – let's start with the user being off the hook – being able to call the function with all these arguments. As to the prettyness of the code itself – it's in the eye of the beholder; let's say it should be 2X longer than the plainly typed version, tops?

#35 Mark on 04.09.14 at 10:29 am

Ok, so basically, you want a procedure something like this:
answer = invert(question)
where invert is implemented using Guassian elimination, along the lines described in a textbook or Wikipedia article, and further, the answer matrix is correctly typed relative to the question matrix. This should not be more than twice the length of code that just works for doubles. If I paraphrased you correctly, then I'm pretty sure I can meet requirements 1 and 3, and hopefully a weak version of 2. The weakness is that instead of the types being completely arbitrary and opaque, I insist they are all derived from some common unit system (like say, all types are some power of length times some power of time).

#36 Yossi Kreinin on 04.09.14 at 9:53 pm

OK – not completely arbitrary types; but – Gauss-Jordan with pivoting.

#37 Mark on 04.10.14 at 6:44 pm

OK. This will take at least few days. I presume I can't hardcode the number 2, right?

#38 Mark on 04.10.14 at 9:10 pm

Wait a sec… you said you want to pick a pivot dynamically, instead of just going through the rows one at a time? What strategy would you use? I mean, you clearly want to avoid zero, but that still lots of choices. Or is it OK to pick any non-zero element?

#39 Yossi Kreinin on 04.10.14 at 10:33 pm

I don't remember what Gauss-Jordan elimination does exactly; I could look at the Numerical Recipes book at work next week… There's code in there picking the pivot.

#40 Norman Yarvin on 04.11.14 at 8:08 pm

The larger the pivot, the better. But that's 'larger' as compared to the rest of the row, not 'larger' in absolute terms. Taking a quick peek at Numerical Recipes, they say that a good way to do this is to, before starting the whole procedure, scale each row so that the largest element has size one. (You then have to keep track of the scale factors, of course.)

By the way, I figure you guys are talking about partial pivoting, since that's the usual way it's done, but there's also full pivoting, where you swap columns as well as rows, to get an even larger pivot. In that case you can just pick the largest element in absolute terms, and not do any pre-scaling.

#41 Norman Yarvin on 04.11.14 at 11:55 pm

Speaking of which, supposing you do this scaling of each row. What should that do to the units? Say a row's units were meters, to start with. Are they still meters? Well, from one point of view, sure: they just refer to a scaled-up or scaled-down version of the problem. But from another point of view, no. Say the scale factor just coincidentally happened to be 0.0254; then these are actually inches, not meters. Or from a third point of view, one can decide that the scaled matrix now just has plain unitless numbers and that all the units reside in the scale factor. That'd be the easiest way for Mark to implement this: after the initial scaling, he can just do the usual Gauss-Jordan elimination without units entering into the mechanics of it.

#42 Yossi Kreinin on 04.12.14 at 12:28 am

Isn't it a bit like deciding that the entire solver operates on plain numbers and "trust me, it works out"?

#43 Norman Yarvin on 04.12.14 at 7:03 pm

It is, but it has the added benefit of being compliant with the demand that everything be units-checked.

This is part of the deal, with mathematical abstraction: the math works no matter what the numbers represent. When you're in the middle of an abstract mathematical routine, why _should_ you be worrying about units? Whether you're dealing with meters or with tangerines is of no concern. Save that for parts of the code where units do matter. The fact that you can easily forget about units in the routine, by stuffing them away into a scale factor, is a feature, not a bug. This way the solver is no harder to program than it would be under the "trust me, it works out" scheme, but at the same time it can be part of a rigorously units-checked program. For instance in the case you started with, the fact that "meters" were the units of the X matrix wouldn't just be lost, but would be filed away, so that if anyone who used the resulting least-squares approximation tried to feed it data that were in inches instead, it would yield an error.

#44 Yossi Kreinin on 04.12.14 at 9:31 pm

I'm not sure I follow you – perhaps an example could help me; what I hear – and I suspect that I mishear – is "the math works no matter what the numbers represent" so just forget the units – but what if I have a bug in the math and why are difficulties in getting the compiler to type-check this kind of math (Gauss-Jordan) and having to forget about unit checking there not representative of a general problem – of compile time checking being much more limited than it seems from simple examples?

BTW I've been browsing the email archives at your site for the last few days… Good stuff.

#45 Norman Yarvin on 04.13.14 at 11:26 am

Oh, I agree that the difficulty with units checking here is representative of the general problem of it not being easy to do this stuff at compile time. For that matter, I neglected to mention a cost: if you're doing Gauss-Jordan and filing away the units in a vector of scale factors, that means you can't, in a static type system, use the same storage for the matrix: you're changing the types, so you have to allocate a different matrix, typed in different units. (One could conceive of compilers that would optimize this allocation away, but really, forget about it, unless it's part of a benchmark they're trying to game.) For that matter, even if you do as we were discussing earlier, and put units into the actual computation, so as not to have to allocate another matrix, you still can't play the trick that a lot of Gaussian elimination routines do of storing the two matrices that are the results of the routine, one upper diagonal and one lower diagonal, in the same storage as was used for the input matrix — because in general the two output matrices will have different units from each other.

As for "what if I have a bug in the math", you'll debug it in the usual way. For routines like Gauss-Jordan, the task of the routine is well-defined, so it's straightforward to write thorough tests for it. Sure, dimensional analysis can help in catching bugs, even those that are not simply units errors — but on the other hand, the bug will be more visible without a lot of units annotation cluttering the code.

In practical terms, rather than do any debugging, you'll probably use some Gauss-Jordan routine that someone else has written and thoroughly debugged, so you'll want to just annotate it for units, as in "compiler, please treat this as a black box, for which if the input units are U, the output units are such-and-such function of U".

There are certainly a lot of places where units checking helps. I just doubt that core math code is one of them. If you look in a textbook explaining Gauss-Jordan, it's not going to mention units. They just don't help at all in an intellectual understanding of the problem. On the other hand, a textbook dealing with Maxwell's equations will certainly mention units, as they are a big part of what is going on. Still, if you're writing electromagnetics code from scratch, most of the troubles with units can be avoided by imposing a strict rule that all variables should be in a single self-consistent units system (such as MKS); if you have that kind of rule, it can be a waste of time to add annotations everywhere that say you're following the rule. It's when you have a big sprawling mess of code, some parts of which use different units from other parts, that units annotations become quite desirable, so that you don't take (say) a length variable from one part of the program, in inches, and feed it into another part which expects meters.

#46 Yossi Kreinin on 04.13.14 at 1:24 pm

I think that we sort of agree on compile time unit checking being potentially useful in some cases and either hard/impossible or counter-productive in others. Basically checking is nice but you ought to be able to cast everything to double on various occasions.

#47 Mark on 04.16.14 at 10:04 pm

I just realized that the first person to comment on this article is also named Mark. So, just to clarify: Mark the F# defender is not the same as Mark the C++ defender. (I'm the second Mark.)

#48 Yossi Kreinin on 04.16.14 at 10:46 pm

OK :)

#49 Mark on 04.19.14 at 7:56 pm

Ok, here's my version 0.0. It does not handle units at all, just plain old doubles. I'm posting it here to make sure this is the kind of thing you want to convert into a unit-checked version. Is it general enough? Overkill? Missing pieces, extra pieces, and so on? It's a bit over a hundred lines in length, and just over 3KB in source code size, and at the very least solves the linear regression problem, but has a subroutine that can row-reduce an arbitrary matrix whose size is known at compile time. Also, the exercise of writing this has made me fear that you are correct: getting this to work well with units is very likely to get gnarly. After all, I already have 3 template matrix classes, and the code just works with doubles! But I claim it can be done!

#50 Mark on 04.19.14 at 8:00 pm

ok, posting system is not letting me post my code.

#51 Mark on 04.19.14 at 8:04 pm

never mind, I see it's awaiting moderation. But all the less than signs get messed up. So don't approve the last one, I'll fix it first.

#52 Mark on 04.19.14 at 8:08 pm

template<int cols>
class Matrix {
typedef double Row[cols];
Row*const mat;
const int r;
Matrix(const Matrix&); // need a copy constructor, but I'm cheating
int rows() const { return r; }
Matrix(int rows):mat(new Row[rows]), r(rows) {}
double& operator()(int row, int col) {
if(row < 0 || row >= r || col <0 || col >= cols) throw "bug!";
return mat[row][col];

template<int rows>
struct TransposedMatrix {
int cols() const { return mat.rows(); }
TransposedMatrix(Matrix<rows>& m) : mat(m) {}
double operator()(int row, int col) { return mat(col,row); }
Matrix<rows>& mat;

template<int n>
TransposedMatrix<n> transpose(Matrix<n>& m) { return m; }

template<int rows, int cols>
struct SmallMatrix {
double mat[rows][cols];
double& operator()(int row, int col) {
if(row < 0 || row >= rows || col <0 || col >= cols) throw "bug!";
return mat[row][col];
for(int i=0; i<rows; i++)
for(int j=0; j<cols; j++)
mat[i][j] = 0;

template<int rows, int cols>
SmallMatrix<rows,cols> operator*(TransposedMatrix<rows>& a, Matrix<cols>& b) {
if(a.cols() != b.rows()) throw "bug!";
return matmul<rows,cols>(a, b, a.cols());

template<int rows, int cols, int n>
SmallMatrix<rows,cols> operator*(SmallMatrix<rows, n>& a, SmallMatrix<n, cols>& b) {
return matmul<rows,cols>(a, b, n);

template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n){
SmallMatrix<rows, cols> c;
for(int i=0; i<rows; i++)
for(int j=0; j<cols; j++)
for(int k=0; k<n; k++)
c(i,j) += a(i,k) * b(k,j);
return c;

template<int rows, int cols> // Gaussian elemination alg simplified from Wikipedia, with different variable names
void row_reduce(SmallMatrix<rows, cols>& mat) {
int n = rows < cols ? rows : cols;
for(int i=0; i<n; i++) {
double scale = mat(i,i);
for(int j=i; j<cols; j++)
mat(i,j) /= scale;
for(int j=0; j<rows; j++) {
if(j==i) continue;
double factor = mat(j,i);
for(int k=i; k<cols; k++)
mat(j,k) -= mat(i,k) * factor;

template<int n>
SmallMatrix<n, n> inverse(SmallMatrix<n, n> mat) {
SmallMatrix<n, 2*n> temp;
for(int i=0; i<n; i++) // glue on the identity matrix
for(int j=0; j<n; j++)
temp(i,j) = mat(i,j),
temp(i,j+n) = i == j;
SmallMatrix<n,n> ans;
for(int i=0; i<n; i++)
for(int j=0; j<n; j++)
ans(i,j) = temp(i,j+n);
return ans;

struct Line { double slope, intercept; };

Line least_squares_fit(const double x[], const double y[], int n) {
Matrix<1> Y(n);
for(int i=0; i<n; i++) Y(i,0) = y[i];
Matrix<2> X(n);
for(int i=0; i<n; i++)
X(i,0) = x[i],
X(i,1) = 1;
TransposedMatrix<2> Xt = transpose(X);
SmallMatrix<2,1> ans = inverse(Xt*X)*(Xt*Y);
Line ret;
ret.slope = ans(0,0);
ret.intercept = ans(1,0);
return ret;

int main() {
double x[] = {1,2,3}, y[] = {7,8,9}; // input
Line line = least_squares_fit(x, y, 3);
printf("y=%gx+%gn", line.slope, line.intercept);

#53 Mark on 04.19.14 at 8:08 pm

Ok, I fixed it.

#54 Yossi Kreinin on 04.19.14 at 11:26 pm

Sorry for the comment system being what it is… I think maybe <pre> will help:

class IndentLevel0 {
   int indentLevel1() {
      return indentLevel2();

As to the code – I think it's Gaussian elimination rather than Gauss-Jordan, am I wrong? The latter picks a pivot row dynamically based on the element values.

#55 Yossi Kreinin on 04.19.14 at 11:27 pm

OK, pre does help. I think you still need to escape less-than and greater-than (even inside pre) to make it look right. Without escaping:

template class C;

With escaping:

template<class T> class C;
#56 Yossi Kreinin on 04.19.14 at 11:28 pm

…Yeah, so the pre tag with manual escaping for HTML entities is what works in this thing…

#57 Mark on 04.20.14 at 12:40 pm

You never got back to me about how to pick a row dynamically. If the goal is to make a point about type systems, any dynamic rule should work, so how about I pick the first non-zero row?

#58 Mark on 04.20.14 at 1:23 pm

Ok, now with dynamic pivots! Let's try an old-fashioned xmp tag!

#59 Mark on 04.20.14 at 3:11 pm

Ok, now with pre tags!

template<int cols>
class Matrix {
typedef double Row[cols];
Row*const mat;
const int r;
Matrix(const Matrix&); // need copy constructor, but I'm cheating
int rows() const { return r; }
Matrix(int rows):mat(new Row[rows]), r(rows) {}
double& operator()(int row, int col) {
if(row < 0 || row >= r || col <0 || col >= cols) throw "bug!";
return mat[row][col];

template<int rows>
struct TransposedMatrix {
int cols() const { return mat.rows(); }
TransposedMatrix(Matrix<rows>& m) : mat(m) {}
double operator()(int row, int col) { return mat(col,row); }
Matrix<rows>& mat;

template<int n>
TransposedMatrix<n> transpose(Matrix<n>& m) { return m; }

template<int rows, int cols>
struct SmallMatrix {
double mat[rows][cols];
double& operator()(int row, int col) {
if(row < 0 || row >= rows || col <0 || col >= cols) throw "bug!";
return mat[row][col];
for(int i=0; i<rows; i++)
for(int j=0; j<cols; j++)
mat[i][j] = 0;

template<int rows, int cols>
SmallMatrix<rows,cols> operator*(TransposedMatrix<rows>& a, Matrix<cols>& b) {
if(a.cols() != b.rows()) throw "bug!";
return matmul<rows,cols>(a, b, a.cols());

template<int rows, int cols, int n>
SmallMatrix<rows,cols> operator*(SmallMatrix<rows, n>& a, SmallMatrix<n, cols>& b) {
return matmul<rows,cols>(a, b, n);

template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n){
SmallMatrix<rows, cols> c;
for(int i=0; i<rows; i++)
for(int j=0; j<cols; j++)
for(int k=0; k<n; k++)
c(i,j) += a(i,k) * b(k,j);
return c;

template<int rows, int cols> // Gaussian elemination alg simplified from Wikipedia, with different variable names
void row_reduce(SmallMatrix<rows, cols>& mat) {
int n = rows < cols ? rows : cols;
for(int i=0; i<n; i++) {
int p = i;
while(p<n && mat(p,i)==0) p++;
for(int j=0; j<cols; j++){
double temp = mat(i,j);
mat(i,j) = mat(p,j);
mat(p,j) = temp;
double scale = mat(i,i);
for(int j=i; j<cols; j++)
mat(i,j) /= scale;
for(int j=0; j<rows; j++) {
if(j==i) continue;
double factor = mat(j,i);
for(int k=i; k<cols; k++)
mat(j,k) -= mat(i,k) * factor;

template<int n>
SmallMatrix<n, n> inverse(SmallMatrix<n, n> mat) {
SmallMatrix<n, 2*n> temp;
for(int i=0; i<n; i++) // glue on the identity matrix
for(int j=0; j<n; j++)
temp(i,j) = mat(i,j),
temp(i,j+n) = i == j;
SmallMatrix<n,n> ans;
for(int i=0; i<n; i++)
for(int j=0; j<n; j++)
ans(i,j) = temp(i,j+n);
return ans;

struct Line { double slope, intercept; };

Line least_squares_fit(const double x[], const double y[], int n) {
Matrix<1> Y(n);
for(int i=0; i<n; i++) Y(i,0) = y[i];
Matrix<2> X(n);
for(int i=0; i<n; i++)
X(i,0) = x[i],
X(i,1) = 1;
TransposedMatrix<2> Xt = transpose(X);
SmallMatrix<2,1> ans = inverse(Xt*X)*(Xt*Y);
Line ret;
ret.slope = ans(0,0);
ret.intercept = ans(1,0);
return ret;

int main() {
double x[] = {1,2,3}, y[] = {7,8,9}; // input
Line line = least_squares_fit(x, y, 3);
printf("y=%gx+%gn", line.slope, line.intercept);

#60 Mark on 04.20.14 at 3:12 pm

it's not showing up. odd

#61 Mark on 04.20.14 at 3:15 pm

the pre tag got swallowed. can you fix it, and remove the duplicates?

#62 Yossi Kreinin on 04.20.14 at 11:22 pm

I can't edit the comments, at least not through any WordPress interface that I see… perhaps with this much code, if you want it at some publicly readable place on the Internet the best thing is github or some such.

Regarding pivoting – I think it could be done as in

…though I hope to check what Numerical Recipes says as well.

#63 Mark on 04.21.14 at 11:10 am

Ok, let's see if I can figure out this pre tag:


Cross fingers!

#64 Mark on 04.21.14 at 11:18 am

OK, here's a github gist:

By the way, I'm not using the Wikipedia algorithm for pivot selection because it makes no sense if each row can have different units. But in any case it doesn't matter for our purpose. As long as the pivot is selected dynamically, we know that the type system can handle a different method.

#65 Mark on 04.21.14 at 11:19 am

The comments here drive me crazy sometimes. I never know if they went through!

#66 Yossi Kreinin on 04.21.14 at 11:23 am

Yeah, I approved the link to the gist. Still before type checking, right?

#67 Mark on 04.21.14 at 3:08 pm

Yeah, the stuff in the gist only works with doubles.

#68 Mark on 04.21.14 at 9:12 pm

I should have a type-checked version ready sometime next month. This is proving somewhat challenging, and I haven't even gotten to the hard part yet (the row reduction).

#69 Mark on 04.26.14 at 11:38 pm

I'm going to post little updates here every week or two if no one tells me to shut up. Progress so far: the memory usage should be only 40% of the unchecked version, because the X and Y matrices are no longer create copies of the original array. On an unrelated note: I'm trying to stay away from C++11, even though it would make some things here much easier. One feature I am making heavy use of is the ability to use two closing brackets in a row without a compiler error. If anyone doesn't like that, feel free to use search and replace, which should be easy as I'm not using any bit-shift operators. (And before you say cout: I'm using printf.) So, far I have the function signature for matrix multiplication working correctly, but the function body is not yet implemented. I should have multiplication fully functional by next week. We'll see.

#70 Yossi Kreinin on 04.27.14 at 8:22 am

Why not C++11? Even I upgraded to C++11…

#71 Mark on 04.28.14 at 5:21 am

That's true. You even praised it. I don't remember your exact words anymore, but I think it was something along the lines of "it sucks less".

Maybe I'll do both.

#72 Mark on 05.04.14 at 9:50 pm

I still claim this will eventually get done, but I think you win anyway. This is indeed getting really gnarly.

#73 Yossi Kreinin on 05.04.14 at 9:52 pm

OK :) FYI, I talked about this to a guy who's both a big math geek and a big type systems geek. We'll see what he comes up with :)

#74 Mark on 05.05.14 at 5:21 am


#75 Mark on 05.10.14 at 11:05 pm

Small bug: you wrote that X'*Y looks like this:

sum(xi*yi) sum(yi) //m^2 m

Shouldn't it actually be the transpose of that, like so?

sum(xi*yi) //m^2
sum(yi) //m

#76 Mark on 05.10.14 at 11:07 pm

Did my comment go through this time?

#77 Mark on 05.11.14 at 12:29 am

I have a problem. I've been going through the motions of implementing matrix multiplication with unit checking, and even recently congratulated myself when I realized that my latest compiler error was not because of a buggy multiplication algorithm, but rather because I was multiplying incompatible matrices. And finally, it was time to start working on the main problem: inverting a matrix. Specifically, inverting X'*X.

And then I realized something. I have no idea what units the inverse should have. I don't mean my code can't tell, I mean I personally can not tell. In fact, it's worse than that. I have constructed a "proof" that the matrix can't possibly have an inverse. I won't bother posting the proof here, because it just has to be wrong. But I can't figure out why.

#78 Yossi Kreinin on 05.11.14 at 8:03 am

You know, I kinda feel bad about this :) I wrote about something which ended up being a needless time sink for me many years ago. I didn't intend it to become a needless time sink for someone else…

That said – perhaps solving the equations without explicitly representing the inverse matrix would work better… Though the inverse sure exists (often) and would need to have a type…

#79 Mark on 05.11.14 at 10:52 am

No need to feel bad. I usually find MUCH worse time sinks without any help. :) This one makes me feel warm and fuzzy. ("Restoring Yossi's faith in static typing while advancing the state of the art in C++ template abuse!")

But if you claim the inverse exists, what units does it have? You have a pretty picture showing the units for X'*X. Can you make a similar picture for its inverse?

#80 Yossi Kreinin on 05.11.14 at 11:26 am

Erm… I haven't tried, but can't it be worked out, for a 2×2 case, from a formula computing every element?

#81 Mark on 05.11.14 at 11:33 am

Maybe, maybe not. When I plug in the units, I get nonsense. If you don't feel like finding the formula yourself, try this:

#82 Mark on 05.11.14 at 11:37 am

The comment system on this blog drives me nuts! Attempt three: here is a formula

#83 Mark on 05.11.14 at 11:48 am

Let X'X =

Then according to Wolfram Alpha, the inverse is

[d/(ad-bc), -b/(ad-bc)]
[-c/(ad-bc), a/(ad-bc)]

Now tell me the units :)

#84 Mark on 05.11.14 at 11:49 am

Oh, now I see they all went through.

#85 Yossi Kreinin on 05.11.14 at 12:12 pm

It's simple: if there's a link, comments are held for moderation. Why? So bloody spam bots don't get link juice from their bloody comments… And so that readers don't have to sift through that shite…

Well, ad-bc is m^2. So we get:

1/m^2 1/m
1/m n


#86 Mark on 05.11.14 at 1:38 pm

Very good. Except that multiplying a matrix by its inverse should yield the identity matrix, which is dimensionless. This seems to not happen here.

#87 Mark on 05.11.14 at 2:06 pm

Sorry, never mind I am wrong. I made an incorrect assumption. I assumed that the identity matrix must always be completely dimensionless. In general this is not true. The zeros of the identity matrix can have a dimension, as long as the ones do not. The units you gave are correct. Thank you for putting up with me.

#88 Jules on 05.12.14 at 2:30 am

Thinking about it as linear operators instead of matrices makes this much easier. A linear operator is simply a function from one vector space to another f : U ~> V. I'm using a ~> b here to denote linear operators. The vectors can have different types. If U is a position in space it will have type (meter,meter,meter). It can even have different types for different components, like (newton, seconds, meter).

Now the type of \ is obvious. It simply inverts a linear operator. So it has type (U ~> V) -> (V ~> U).

#89 Mark on 05.12.14 at 11:34 am

@Jules: I think what YK really wants to know is not the type of the "" operator. As far as I can tell, what he really doubts is the possibility of actually implementing this operator, with unit checking, in a static type system, using an algorithm of his choice. In particular, unless I misunderstood him entirely, he claims that Gauss-Jordan elimination will be near impossible to implement, because picking the pivot dynamically is not compatible with static typing.

#90 Yossi Kreinin on 05.12.14 at 9:56 pm

@Jules: sure, but what are V and U?.. Given all the options of how the units can be typed? And what will be the types of intermediate variables in a Gauss-Jordan elimination?

#91 Jules on 05.13.14 at 1:14 am

V and U are the types of vectors. I gave two examples (meter,meter,meter) and (newtons, seconds, meter). Is this unclear?

Gaussian elimination does primitive row operations, each of which affects the matrix type in a definite way:

1. Swap two rows, say rows 1 and 2. Given matrix M : (a,b,c) ~> (d,e,f), the new matrix has type (b,a,c) ~> (d,e,f).

2. Multiply a row by a scalar, say row 1. Given matrix M : (a,b,c) ~> (d,e,f), and scalar x : g, the new matrix has type (a*g,b,c) ~> (d,e,f).

3. Add one row to another, say row 1 to row 2. Given matrix M : (a,a,c) ~> (d,e,f), the new matrix has the same type (a,a,c) ~> (d,e,f). We have `a` twice since we can only add the first and second row in matrices where the first and the second row have the same type.

Of course, which row will be selected (and even how many rows there are) depends on run-time data. So the types will depend on run-time data. This means that you need dependent types to type check Gaussian elimination (though the final resulting type is not a dependent type).

One thing that is interesting to note, if you hadn't already noticed it, is that an n by m matrix does not have n*m units that we can pick freely, but only n+m. If you were to pick a different incompatible unit for each cell in the matrix, it wouldn't be possible to multiply any vector by it.

#92 Jules on 05.13.14 at 1:19 am

Sorry, I made a mistake. In all the preceding example the modified type should be the output type not the input type. So:

1. Swap two rows, say rows 1 and 2. Given matrix M : (a,b,c) ~> (d,e,f), the new matrix has type (a,b,c) ~> (e,d,f).

2. Multiply a row by a scalar, say row 1. Given matrix M : (a,b,c) ~> (d,e,f), and scalar x : g, the new matrix has type (a,b,c) ~> (d*g,e,f).

3. Add one row to another, say row 1 to row 2. Given matrix M : (a,b,c) ~> (d,d,f), the new matrix has the same type (a,b,c) ~> (d,d,f). We have `d` twice since we can only add the first and second row in matrices where the first and the second row have the same type.

#93 Yossi Kreinin on 05.13.14 at 2:55 am

OK, so row-swapping changes the matrix type – then how, in an actual programming language, would the code look like that handles a matrix of dimension N? Does N have to be known at compile time – and if so, do O(N) types emerge from compiling the code, with O(N) pieces of code generated each handling a set of types?

I'm not saying there's no meaningful type for any of the intermediate results in an abstract sense; I just ask how to practically write code such that these types are actually checked.

#94 Jules on 05.13.14 at 3:10 am

That's why I said you need dependent types. The type of the swapRows function would look something like this:

swapRows : (units : List Unit) -> i:int -> j:int -> (U ~> Vec units) -> (U ~> Vec (swap i j units))

Where Vec listOfUnits is a type constructor for a vector with units, and swap i j lst swaps the i-th and j-th elements of the list. This will probably read like voodoo if you don't already know a dependently typed language like Coq or Agda or Idris…

#95 Yossi Kreinin on 05.13.14 at 3:16 am

Does it work if carried to the end where you write the code and not just the prototype of swapRows? I can write the prototype of swapRows in C++ (though it won't be pretty and will read like something worse than voodoo…). It's the code calling it to implement Gaussian elimination that I'm not sure will come out practically usable.

#96 Yossi Kreinin on 05.13.14 at 3:18 am

You know what – since I don't know any of these languages – are i and j arbitrary runtime values or do they ought to be compile time constants? If they're runtime values that's plenty cool. And you're saying you can just call swapRows in a loop with dynamic values of i and j and get static unit checking? That is quite the bloody voodoo, and that I cannot do in C++, and that's interesting enough to delve deeper into this stuff.

#97 Jules on 05.13.14 at 3:29 am

I don't see any fundamental obstacle, but it's not a trivial exercise. First you would need to implement a unit system for tracking units of numbers. Then you would need to implement matrices, and implement gaussian elimination with extra proof terms to make sure the type checker understands that the units match up.

#98 Jules on 05.13.14 at 3:31 am

RE: second reply.

i and j are runtime values. That is indeed exactly what makes dependent types cool. At type checking time, i and j stand for *any* number that i,j could take on at runtime, and the type checker verifies that the types are ok for *any* i,j. At runtime i,j will take on their actual values.

#99 Yossi Kreinin on 05.13.14 at 4:23 am

It would be very interesting to see this at work – this particular example and more generally, how smooth linear algebra can be with unit checking mixed in given dependent types. As someone knowledgeable about these things, do you see it becoming mainstream the way plain type checking or parameterized types or OO-style polymorphism or compile-time/run-time duck typing polymorphism became mainstream? (I mean not in terms of what the chances are of people wanting this as much as what the chances are that if they feel they do want this, they can become fluent in the necessary techniques.)

So F# or Haskell can't do this – only Coq or Agda? What are some other examples where you need dependent types?

#100 Jules on 05.13.14 at 5:31 am

Yes I think it will eventually become mainstream. Dependent types are actually quite simple, although it is a big conceptual leap. One obstacle to dependent types becoming mainstream is that currently you need to prove all the things required by library functions. So if you have a function that accepts only prime numbers, you will have to prove that the number you're passing in is prime. This is often not easy and a lot of work that may not be worth it. So I think on of the things that needs to happen is a system for systematically inserting run-time checks for things that have no programmer written proof, instead of just giving a type error and refusing to run the program. Then it would become a kind of hybrid between contracts and types: if you don't write the proofs, things are dynamically checked as with contracts, if you write the proofs, things are verified at compile time.

The thing that dependent types fundamentally allow you to do is write functions where the output type depends on the value of the input. Take for instance the following function:

f : Bool -> Int
f c = if c then 3 else 5

This is not a dependent type since the output type Int does not depend on the input value c. However:

g : (c:Bool) -> if c then Int else String
g c = if c then 3 else "Hello World"

This is a dependent type because the output type depends on the value of the input. You see that we can write expressions where you'd usually have types: if b then Int else String. In effect in dependently typed languages the type sublanguage (with entities such as Int,String,Bool) is the same as the value sublanguage (with entities such as 3,"Hello",True). You can mix and match as you wish.

Since you know C++ we can compare it with templates:

- With templates, you have a very limited language to express computations. With dependent types you have the full run-time language available in the types.

- With templates, the values have to be known at compile time. With dependent types, the values can come from run-time.

If you want to know more, Certified Programming with Dependent Types is a tutorial about dependently typed programming in Coq.

#101 Yossi Kreinin on 05.13.14 at 5:52 am

You say doing the unit-checked Gauss elimination is non-trivial. How much more code than the "plain double" version, do you think? (If it's way way more then such libraries will remain to today's widespread numeric packages what CompCert is to LLVM or gcc – impressive feats finding few practical uses vs impressively correct though quite bug-ridden software in universal use.)

#102 Jules on 05.13.14 at 10:58 am

Well, first you would have to implement the unit system (doubles-with-units), but that would be a one time deal and you can use it for all kinds of unit checked work.

Then you'd need to implement the basic linear algebra operations with units (like multiplying a vector by a scalar).

Then you need to implement the basic operations used in the Gaussian elimination (swap rows, etc.).

Finally you'd implement the Gaussian elimination procedure with units.

It's hard to guess how much extra code that is. In the first point, you have 0 code to write for plain doubles but you do have to write code for doubles-with-units I think in each of the last 3 points you'd end up with about triple the amount of code compared to plain doubles. That is mainly because even *specifying* how an operation works on the units is almost the same work as actually implementing the operation. Take multiplying a vector by a scalar. You create a new vector where each element is the corresponding element in the old vector multiplied by the scalar. Now look at the units: the unit of each element in the new vector is the corresponding unit in the old vector, multiplied by the unit of the scalar.

I'm sure future developments will reduce that overhead by a lot, for instance by better type inference for dependent types and better automated proof search procedures. The field is just getting started in that respect. Another thing to consider is that once the Gaussian elimination procedure is written, clients can use that in a unit checked way without worrying at all about the internals.

#103 Yossi Kreinin on 05.13.14 at 12:30 pm

It's interesting to look into. I wonder if someone did it.

What do people generally use dependent types for that you can't do without dependent types? What are the typical use cases?

#104 Mark on 05.13.14 at 4:29 pm

It seems to be a popular opinion on this thread that this needs a dependent type system to work properly. From the descriptions I'm reading here, I am pretty convinced that a dependent type system would be VERY helpful. However, I don't think it is required. The C++ type system suffices, if you are willing to mangle the code badly enough, such as turning iteration into recursion and similar stuff.

#105 Mark on 05.13.14 at 7:31 pm

I'm learning some interesting factoids about matrix inversion via row-reduction that I never knew before. I used to think there was nothing to it:

Start with matrix M.
Glue on the identity I, yielding M|I
Row reduce, yielding I|M^-1

What surprised me a few days ago was the identity matrix after row-reduction is not dimensionless. What I learned less than one day ago is that the identity matrix you glue on to start the process is not dimensionless either. In both cases, the ones are dimensionless, but not all the zeros are.

#106 Yossi Kreinin on 05.14.14 at 10:19 am

Regarding dependent types – I think it's also about how many versions of the code are compiled per given set of constants and not just how the code looks like. Although how it looks like I think determines to a large extent whether anyone would actually do it in real life.

#107 Jules on 05.14.14 at 2:18 pm

@Mark: I don't know much about C++, but wouldn't the matrix size need to be statically determined and the whole algorithm need to be essentially unrolled at compile time with templates in order for the C++ compiler to check the units at compile time?

@Yossi Kreinin: Dependent types aren't really used for much yet. There is compcert (which you already knew about), there is F* from microsoft research, there is Idris, Agda, and others. A lot of the research is concerned with using dependent types as a new foundation for math, rather than with practical programming. Idris appears to be the most popular route to practical dependent types, but it's far from "production ready".

#108 Mark on 05.14.14 at 10:30 pm

@Jules: yes.

#109 lmm on 05.15.14 at 3:24 am

I'm interested in implementing this in Scala (which is a relatively mainstream language but does have dependent types), but isn't the problem inherently type-unsafe? I mean, suppose you're trying to solve something like:

x * 2 seconds + y seconds = 5 seconds
x meters – y meters = 3 meters

The only way we can add these to eliminate y is by multiplying the second equation by 1 second/meter, which seems like it defeats the type-safety point. Or am I missing something?

#110 Jules on 05.15.14 at 3:48 am

Why would that defeat the type safety? When you are solving

a*x + b*y = c
d*x + e*y = f

Then you subtract d/a times the first equation from the secnod. The units all work out that way, and you aren't artificially munging the units to be the same; it just falls out of the way Gaussian elimination works anyway.

#111 Yossi Kreinin on 05.15.14 at 5:51 am

If you implement this in Scala it'll be damn impressive :)

#112 Mark on 05.15.14 at 10:40 am

Would it be impressive in C++?

#113 Yossi Kreinin on 05.16.14 at 11:53 pm

It would be, but in a different way :) I meant "in Scala with dependent types, making the code look similar to the vanilla version that does no unit checking". In C++ you aren't aiming at that.

#114 Mark on 05.17.14 at 7:12 pm

Oh, that's what you meant. I agree, that would be super cool. That would upgrade from mere curiosity to something useful!

#115 Mark on 05.17.14 at 7:36 pm

If I'm not getting too annoying yet… you said earlier in this thread that I'd have a heart attack if I saw how far you tried to push this in C++. Do you recall what went wrong, or was it more like "nothing works"?

#116 Mark on 05.18.14 at 12:13 am

Even if I never get this working, it has at least forced me to properly understand C++ templates, from the annoying things like "when is typename required", to the fancy stuff like higher-order templates. Thank you for the blog post.

On a (mostly) unrelated note: I don't believe in Greenspun's tenth rule, but if anyone saw my code right now, it would certainly act as evidence in favor. Seriously, compile time map?? If I catch myself writing lambda, I think I'll just give up.

#117 Yossi Kreinin on 05.18.14 at 2:25 am

I'll gladly publish whatever it is you did write so far in a follow-up… Just to show what's involved…

Myself, I didn't originally try to solve this problem but a related one – I had fixed point (scaled integer) matrix elements and I wanted each element to have a different type because their different units meant they tended to have different ranges. It was a much easier problem, I think, but the code came out so ugly that I gave up; also I don't think it was that good numerically.

Solving linear equations with fixed integers has not become my favorite pastime to say the least.

#118 Mark on 05.18.14 at 10:42 am

I agree, that does sound similar in spirit. That sounds sufficiently similar that I suspect that if I get this working, most of the machinery should be reusable. Maybe even just replacing "meters ^ X" with "accurate to "accurate to X digits" or something. My code currently makes the assumption that you never add two numbers with different units, but that can be generalized if need be.

But yeah, floating point sounds relatively luxurious compared to fixed point arithmetic.

#119 Mark on 05.18.14 at 7:27 pm

come to think of it, that assumption can't be relaxed after all

#120 Levi on 05.23.14 at 9:15 am

This paper seems to go a long way towards what you're asking about, though I haven't thought about it deeply:

#121 Yossi Kreinin on 05.23.14 at 9:41 am

Certainly it seems very much related, but I don't think it points to the direction of the problem being solvable. If I understood correctly, it talks about linear operators applied to matrices, and Gauss elimination, SVD, eigenvectors computation etc. etc. are not linear operators. It also mentions about efficient runtime representations of type information, which implies that the type checking is not fully static.

#122 Mark on 05.25.14 at 10:58 am

Progress report: I finally have the infrastructure in place to begin writing code that inverts a matrix. My code is over 10KB in size, consisting largely of typedefs that would not have been needed had I listened to you and used C++11.

#123 Yossi Kreinin on 05.25.14 at 11:14 am

It's not too late :) If you keep it in C++98 and finally publish it, it'll be hard to make a point – any point – with it since inevitably a bunch of dorks will say it should have been C++11 and it would have been entirely different even if it wouldn't be *that* different :)

#124 Mark on 05.26.14 at 1:11 am

I don't worry about such dorks. If I can convince you, I'm happy. If not, then I will rewrite it :)

Incidentally, did the math and type system geek come up with anything?

#125 Yossi Kreinin on 05.28.14 at 9:08 am

Not yet…

#126 Mark on 06.02.14 at 10:14 pm

I believe I said I'd be done by the end of May. I'm not. However, I've made significant progress. In particular, I'm now in the second 90% stage:

#127 Mark on 06.09.14 at 9:49 am

Sorry been quiet for a while. Almost a week ago, I reached a milestone, but have since been too busy to report it, or work any further on this. (International travel, getting my slides ready, caught some germs, talks to attend…) Anyway, I'm gonna try to work on this for at least 15min a day while I'm here. The milestone in question was this: Gaussian elimination works. I'm going to freeze that file, and then get to work on Gauss-Jordan, so we can then compare how much harder it is.

The current version weighs in at just over 15KB, so the practicality of this in the chosen language (version) is not so high. In particular, if the point of type-checking is to catch bugs, this is counter-productive, because by having so much more code (almost a factor of 5), there will almost certainly be more bugs. In fact, not only is there more code, it is harder code to write, to the extent that I'm catching myself being tempted to cut corners because the particular application doesn't need it. In some cases I resist the temptation (yes, this should be a loop, even though it will only iterate once). In other cases I give in (row reduction on matrices with more rows than columns is not supported.) But in any case, you claimed it's impossible. Once we get possibility established, we can worry about practicality.

#128 Yossi Kreinin on 06.13.14 at 1:06 am

I don't think I explicitly claimed "impossible" – you can write a decimal calculator in sed and you can do similar things in C++'s type system at build time, it's just gnarly…

I think the closer it is to "practical" the more unusual it will be… if C++11 makes it that much closer I'd certainly go for that…

#129 Mark on 06.14.14 at 10:01 am

Actually C++14 makes it even better now that you can use "auto" for return types, instead of only local variables. Even so, I don't know if a factor of two is achievable. So far, dynamic pivoting is proving trickier than expected.

#130 Yossi Kreinin on 06.14.14 at 1:13 pm

I can't wait to publish a discussion of your code with a call to arms to do this in other languages, dependent types or not…

#131 Mark on 06.15.14 at 11:26 am

All right, but I fear that by the time I get my code to work, I will have become convinced that you were right all along. :)

#132 Mark on 06.23.14 at 9:54 pm

Sorry for long silence. I think dynamic pivot is not too hard anymore, but too busy to implement now, overdue stuff going on…

#133 Mark on 06.25.14 at 9:55 pm

Mission accomplished! It turns out that dynamic pivoting does not add much to the complexity or code size. Here are the file size break downs (note that all of these check the matrix size at compile time):

No unit checking, yes dynamic pivots: 3.19KB (There is a link to that code higher up in this thread).

Yes unit checking, no dynamic pivots: 14.6KB (I have no plans to post this publicly unless someone asks.) This is about 4.6 times larger than no unit checking. C++11 would make this a lot smaller, (and C++14 would be even better) but there is quite a bit of blatant redundancy left over even after applying the new features in the obvious ways.

Lastly, yes unit checking, yes dynamic pivots: 15.8KB. (I plan to post a link to this in a few minutes.) This is almost 5x larger than no unit checking, but only 8% larger than unit checking without dynamic pivots.

Thus, I conclude that my code supports Yossi's (can I call you that, or would you prefer something else? I've been carefully wording my posts for a while to avoid using your name,) conjecture that row reduction is hard for a type system like that of C++, but not the conjecture that this is because of dynamic pivots.

#134 Mark on 06.25.14 at 10:19 pm

Yet another attempt at posting a link:

#135 Mark on 06.25.14 at 10:20 pm

Did it work this time?

#136 Mark on 06.25.14 at 10:20 pm


#137 Mark on 06.26.14 at 12:24 pm

Actually if you're willing to #include something like boost::fusion, or implement some of it yourself, the code can become far more compact.

#138 Yossi Kreinin on 06.28.14 at 10:30 am

Well… tell me when you're think you're done/if you'd like me to publish a follow-up…

#139 Mark on 06.29.14 at 2:25 am

Oops, turns out my code doesn't compile with g++. Will try to fix. Aside from that, I guess I'm done, though eventually I'd like to upgrade the whole thing to C++11 or even C++14, because with C++98 this is just not practical.

#140 Mark on 06.29.14 at 9:20 am

Ok, I have no idea why this fails to compile on g++. Can somebody test on a different version of g++ on the off chance that it's some bug that has since been fixed?

#141 Mark on 06.30.14 at 8:39 am

This is completely off topic, but what happened to the link to your other blog on your main page? (I mostly ignored it, until I noticed now it was gone.)

#142 Yossi Kreinin on 06.30.14 at 9:54 pm

I merged this blog with the other one as a part of the forced upgrade to WP 3.9 (DreamHost will soon stop supporting the version of PHP needed to run WP 2.5 or whatever it was that I previously used.)

#143 Mark on 07.01.14 at 12:50 am

Oh, ok. On a more relevant note, I realized that you might consider my solution as cheating. Although I do use dynamic pivots, this is not traditional Guass-Jordan elimination. In particular, even before I added dynamic pivots, during the conversion to unit checking, I changed it to a two-pass algorithm. Now that it does dynamic pivots, it does all the row swapping at the end.

#144 Levi on 08.06.14 at 6:24 am

Sorry I dropped that last link and disappeared. But I ran across your blog again and was reminded of this, and I did a bit of poking around again. It turns out that the mathematician George Hart has done the analysis of how Linear Algebra works out when you assign dimension types to all values:

#145 Mark on 08.19.14 at 7:34 am

Levi: no doubt this is very interesting from a math perspective. Indeed, I made a couple of cute math discoveries in the process of implementing this thing. (No doubt George Hart and many others already knew those things long before me, but still, I was pleased.) However, math usually ignores implementation details. for instance, many math textbooks will give you an algorithm for multiplying two n by n matrices in n^3 time, but won't mention that there are faster ways. Here, the question is even vaguer, to the point that even most books on algorithms will not go there. Namely: can you implement this cleanly in some language so that you can discover at compile time whether you made some mistakes with units?

#146 Benjohn on 04.30.15 at 2:56 pm

The measurements are made in a coordinate system. Much like x,y coordinates only make sense once you define the basis they are coordinates in, your measurements only "mean" something once you know the basis they are measurements in. Somehow they must "carry" their basis about.

The basis could either be part of their type, or part of their value.

If the basis is part of the value, then there is a single type and any operation can choose the basis that the result is in – it could be the basis in use by any argument, or some special privileged "universal" basis, or a basis picked at run time to best hold the result (maybe for accuracy reasons).

If the basis is part of the type, then operations on values in different basis are still valid, but they must choose the type of the result. A couple of options are to choose the type (basis) of an arguments, or to always use a special privileged type (basis).

Thanks for getting me to think about this. I've thought about it in the past, and it seemed very complex. My suggestion above seems pretty simple though – I'm probably missing something!

#147 Benjohn on 04.30.15 at 3:13 pm

… :-) So, yes, I was at least a few things.

#148 doug ybarbo on 01.01.16 at 11:14 am

nice one. If you have continued to think about this problem more of the past 18 months then i hope you'll continue to write about it.

at least if i understand your Post correctly, i'm troubled by the same issues–eg, i code in Scala and it seems odd to me that no one is bothered by the fact that when it comes time to do something computationally intensive, gradient descent or whatever, we look through the large Scala collection library full of gorgeous, immutable data structures–and we find that none of them work for the sausage-making of iterative matrix computation, so i drag in Apache Commons Math to do the real work. A quote from an ArsTechnica article in may 2014: "When you have, for example, a huge array—one that barely fits in memory—you can’t really afford to make any copies of it or represent it as anything besides a big contiguous chunk of mutable memory. For really hard computational problems, it is often the case that the only known tractable algorithms make extensive, essential use of mutable state. In a sense, Clojure and Haskell are tackling some of the hardest problems in computer science while Julia is aimed at the hardest problems in computational science. Someday, we may know how to deal with both at the same time, but for now, I think we have to choose our battles."

#149 Yossi Kreinin on 01.01.16 at 12:07 pm

@doug: actually you're describing a different part of the issue – the extent to which code expressed in terms of immutable values can be optimized to allocate less memory and mutate it when needed at runtime. I was mostly talking about the extent to which information known about your inputs in compile time can be propagated through computations (here the consensus is that it's relatively easy but it's rooted in intuition gained when saying that this here's an int and this here's a double and this here's a list of Point, and I claim that it breaks down when you want to add more ambitious annotations like units of measurement; that, in essence, using plain "double" is a form of dynamic typing because the double could carry very different units, and that this dynamic typing is essential to getting things done because expressing, in compile time, what the actual units stored in a double really are is between very hard and impossible even in relatively simple examples.)

#150 alexh on 08.10.17 at 11:35 am

OK, people were still making comments 2 years after the original post, so I'm going to plunge in and see if anyone is still watching in 2017 :-)

Some of the solutions above are impressively ingenious, but they're based on a flawed assumption.

In your equation (x1 1)*(A B) = y1, if y1 and x1 are in metres, then the number 1 is also 1 metre: it is not a dimensionless constant! A and B are both dimensionless. So in the matrices X'*X and X'*Y, all entries have type metres^2. No need for mixed types within the same matrix at all. This is the basis of the elegant solution you're looking for.

#151 Tudor on 11.06.17 at 4:27 pm

Found this article again from an unrelated discussion, happy to see there is still some activity.

I've actually found a research paper that aims on solving this problem: "Type inference for array programming with dimensioned vector spaces" (P. R. Griffioen ). They also have a freely available PDF, not posting the link to avoid moderation.

Leave a Comment