http://msdn.microsoft.com/en-us/library/dd233243.aspx.
Consider F#. It allows compile time checking of units. I assume that
meets your definition of static?
OK; most static type systems allow compile time checking of units in
some form or other. I'm specifically asking what's the type signature of
\ or a similar function, and then what are the types of intermediate
variables, of a determinant function, etc. Because I dare suspect that
this case is not handled by any type system out there. Or if it is
handled I'd like to see how.
I understand your point, but I think you're trying to cram too much
into a type system. The way I'd approach this is to embed a symbolic
solver in the program and use that, e.g. Maxima.
I don't think a symbolic solution for a 10×10 system is a practical
approach. And I'm not trying to cram too much into a type system but
rather trying to check if my impression that static typing breaks down
very quickly is correct or not. The average discussion about typing
talks about collections and covariance/contravariance and Employee from
which Manager is derived and so on; I thought it'd be interesting to see
what happens with basic math, which everyone agrees makes sense and is
practically important.
I think what you're looking for is a "dependent type system", like
what ATS or Coq offers. I can't be sure, though, because I can't make
head or tail of the introductions I've found to these systems; on the
other hand, I'm pretty sure their type systems are Turing-complete and
therefore undecidable.
C++'s type system is Turing-complete and undecidable, I think. The
question is what the type of \ would look like, and whether an iterative
method like Gauss elimination could work.
Maybe what I should do is check whether Gauss elimination can work that
way at all, and if it can't, publish the fact under a trollish title like
"static typing can't do 5th grade math" to make the point a bit
stronger.
I mean I know there's a ton of complicated type systems out there,
what I don't know is what happens with this case.
Dependent type systems like those featured in Coq or Agda are type
systems which, in effect, allow types to depend on values: an extremely
simple example is a vector parameterized by its length. In this case, a
dependent type system is exactly what you want, because it would allow
you to express the type of a matrix whose values have a type which
depends on their location in the matrix.
What would it look like? Well, let's assume we have a type
LessThan (limit : Nat) which can express all natural numbers less than
n, for some n. For example, the possible values of the type LessThan 3
are 0, 1, and 2.
(This is straightforward to express in Coq and the like.) The type of a
matrix might then be
data Matrix : (height : Nat) -> (width : Nat) ->
(LessThan height -> LessThan width -> Type) -> Type
What this is saying is that a Matrix is a type parameterized by a width
and a height, both expressed as natural numbers, and a mapping from
pairs of numbers to types; specifically, pairs of numbers that are less
than the dimensions of the matrix itself.
We then create a type-level function prod which takes two numeric types
and returns the type of the product of their values, so e.g. prod meters
meters == metersSquared for some definition of those types. That means
the type of the operator in question will look something like
backslash : Matrix n n f -> Matrix n m g -> Matrix n m h
where h y x = prod (f x y) (g y x)
The end result will be a matrix whose values are typed according to
their position, where those types have been computed as a function of
the types in the two matrices given as arguments. I'm not going to say
that it's a particularly straightforward thing to do, but it's certainly
expressible in the static type systems of these languages.
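For the template-minded: the type-level prod has a rough C++ analogue
via decltype. A minimal sketch, assuming a hypothetical Quantity<M>
template (M being the meters exponent), not any real library:

#include <utility>
#include <type_traits>

// Hypothetical minimal unit type: M is the exponent of meters.
template<int M> struct Quantity { double value; };

template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}

// The analogue of the type-level prod: the type of the product of
// values of two given types.
template<class F, class G>
using Prod = decltype(std::declval<F>() * std::declval<G>());

using Meters = Quantity<1>;
static_assert(std::is_same<Prod<Meters, Meters>, Quantity<2> >::value,
              "prod meters meters == metersSquared");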
Incidentally, Kragen is half-right: if types can depend on values, and
values can never get returned because of, say, an infinite loop, then
type-checking is necessarily undecidable. Most dependently typed
languages get around this by making it impossible to write programs that
loop forever (e.g. by enforcing that a function can only recurse on
ever-decreasing subsets of its argument), making Coq and the like not
actually Turing-complete. That hasn't stopped useful software (e.g. a C
compiler) from being written in them.
Thanks; I think this barely scratches the surface of the issue. I
could define a matrix with element type depending on its position in
C++'s hideous type system easily enough. One question is how to write
the code of the function... How do I invert this bloody matrix?
The problem here is that all type systems I know only have the notion
of a scalar (and maybe typed lists for recursive functions). Even a
Vector is a scalar from the point of view of the C++ type system. It
would be interesting to have a type system that has tensors as built-in
concepts but we're not there yet and trying to build it on top of what
is already here doesn't seem very easy.
I think that it's worse than that: specifically, iterative methods
assume that the various equations are basically of the same type, as they
add/subtract them from each other, etc. So even if you can type every
element correctly elegantly enough the whole idea of units as types
breaks down.
I should look into this and republish it I guess...
In general, each row and each column of a matrix would need its own
type (its own units). So the type of a matrix would be described by two
arrays of types, one array for the rows and one array for the columns;
the type of each matrix entry would be the product of the type for that
row and the type for that column. (Or perhaps it would be more
convenient if it were the one divided by the other, so that in a
matrix-vector multiplication, one array of types would be the same as
the types of the input vector and the other array would be the same as
the types of the output vector.) Two-dimensional arrays could of course
be imagined where each entry had a completely different type, but such
arrays would not be matrices that could be multiplied by a vector or by
another matrix, since that would result in units mismatches. Of course
the simplest case would be for all the entries in the matrix to have the
same units, and that's likely also the commonest case. Your example
wants more than that, though.
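To make the two-arrays-of-types idea concrete, here is a hedged C++
sketch for a fixed 2x2 case, again with a hypothetical Quantity<M>
carrying the meters exponent; entry (i,j) gets the units of row i times
column j, and a matrix-vector product type-checks only when the vector's
units match:

template<int M> struct Quantity { double value; };

template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}
template<int A> // addition is only defined for matching units
Quantity<A> operator+(Quantity<A> x, Quantity<A> y) {
    Quantity<A> r = { x.value + y.value };
    return r;
}

// Entry (i,j) has the units of Ri * Cj, per the two-arrays scheme.
template<int R0, int R1, int C0, int C1>
struct Mat2 {
    Quantity<R0 + C0> a; Quantity<R0 + C1> b;
    Quantity<R1 + C0> c; Quantity<R1 + C1> d;
};

// Matrix-vector product: input units are the inverses of the column
// units, output units are the row units; a mismatch fails to compile.
template<int R0, int R1, int C0, int C1>
void matvec(const Mat2<R0, R1, C0, C1>& m,
            Quantity<-C0> x0, Quantity<-C1> x1,
            Quantity<R0>& y0, Quantity<R1>& y1) {
    y0 = m.a * x0 + m.b * x1;
    y1 = m.c * x0 + m.d * x1;
}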
In practice, the tool of choice for the problem you describe
(least-squares solution of overdetermined systems) is the singular value
decomposition, or the somewhat cheaper QR decomposition, since
explicitly forming the matrix X'*X and then inverting it loses about
half your precision. Which raises the question: in an SVD routine, what
should the types of the output matrices be? The routine takes a matrix
A, and outputs matrices U, S, and V, such that U and V are orthogonal, S
is diagonal, and A=U*S*V'. One way would be for U's rows to have the row
types of A, for V's rows to have the column types of A, and for the rest
to be unitless (not outside the unit type system, just having units
equal to one). I'm guessing that would work just fine, but any number of
other solutions would also be self-consistent, and it's conceivable
someone would need one of them.
Specifying that the type of a matrix consists of two arrays of types
is something for which syntax could get ugly fast. In the case of your
example, with a matrix whose columns had type 1, meters, meters^2,
meters^3, and so forth, there'd have to be some special notation to
indicate that the n'th column had type meters^n. With a fixed number of
columns, you could just write out the type array, but with a variable
number, you'd need a system where you could put in a formula for the
type. And in general that formula might be arbitrarily complicated. The
type checker would then be in the business of comparing two such
formulas for equality, and for that, it'd have to be able to solve the
halting problem, which of course can't be done. (Well, not by the
compiler, as in static type checking; it would be straightforward to
check at runtime.)
However, such code wouldn't be a good thing to write in the first
place. Polynomial approximation is fine for degree-two or degree-three
polynomials, but gets ill-conditioned fast as the degree increases. So
writing this code for arbitrary n isn't a good idea in the first place.
What does make sense in this scenario is to do the least-squares
approximation using orthogonal polynomials, such as Chebyshev or
Legendre polynomials. Analytically it's equivalent, but numerically it
actually works (rather than blowing up). In that case the entry in the
n'th column would not be x^n but rather a[0] + a[1]*x + a[2]*x^2 + ... +
a[n]*x^n, where the a[i]s are the coefficients of the polynomial. Now
each a[i] is going to need a distinct type, and those types can be
chosen such that the result is unitless. So instead of the matrix X
having complicated units, it can just be unitless.
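For instance, with x in meters, a[0] would be unitless, a[1] in 1/m,
a[2] in 1/m^2 and so on, so that each term a[i]*x^i, and hence the whole
entry, comes out unitless.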
But we can also do this in your original case! Just multiply the n'th
column by a quantity a[n] which numerically is equal to one, but whose
units are the inverse of that column's units. Then you get a unitless
matrix, and
the rest of the problem is easy. Of course that kind of guts the type
checking, raising the question of whether it's any better than simply
ignoring the issue of units inside this routine.
I'm afraid it isn't. There are two possibilities here. One is that we
specify a[n]'s units by saying that they are the inverse of whatever
that the n'th column's units are. That's about like ditching the type
checking: if we happen to screw up those units (say, making them inches
rather than meters), the type checking will still check okay.
The other way is to specify a[n]'s units by saying that they are
meters^(-n). That gives us type checking: inches will still fail (or
maybe get converted automagically into meters). But have we evaded the
horns of the halting problem? No, we have not. Again, if code is going
to be written for arbitrary n, the type checker has to verify that the
formula meters^(-n) multiplied by the n'th column's units (meters^n) is
equal to one. Of course in that case it is not hard to check, but again,
in general that formula might be arbitrarily complicated, so there is
no general way of checking it at compile time.
So I think we're left with three cases that can work. One is matrices
of arbitrary dimensions where every entry in the matrix has the same
units. Another is matrices of fixed dimensions where the rows and
columns have specified units. The third is runtime units checking (which
might be optimized away in simple cases such as this one).
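A hedged sketch of that third case, with the unit exponent carried as a
runtime value rather than as a type (RtQuantity is a made-up name):

#include <stdexcept>

// Runtime units checking: the meters exponent travels with the value,
// so rows and pivots chosen dynamically pose no problem; a mismatch
// throws at runtime instead of failing to compile.
struct RtQuantity {
    double value;
    int m; // meters exponent
};

inline RtQuantity operator*(RtQuantity a, RtQuantity b) {
    RtQuantity r = { a.value * b.value, a.m + b.m };
    return r;
}

inline RtQuantity operator+(RtQuantity a, RtQuantity b) {
    if (a.m != b.m) throw std::runtime_error("units mismatch");
    RtQuantity r = { a.value + b.value, a.m };
    return r;
}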
Just to make sure: you want
A, B = solve(X, Y)
and the problem is that while implementing "solve", you want the
language to properly check units. Yes?
If I understand what you want correctly, then this should indeed
happen "naturally". Take your any "flesh-and-blood" programming
language, even your very favorite C++ (I'm teasing). Implement Gaussian
elimination in the "natural" way. Things should Just Work (TM).
Re: the Mars Orbiter, did you intentionally pick the example that
Bjarne Stroustrup uses to plug C++ templates to be cheeky? :)
(skip ahead to 21:00)
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style
To expand on my previous comment: Gaussian elimination is a procedure
that respects units, just like Cramer's rule. Why would you expect
problems from one but not the other?
@Mark: "respects units" alright β it'll work with 2 equations each
defined by a point. It will not work with X'*X because different rows
have different units and you'll be adding rows. And you'd have a heart
attack if you saw how far down this path I once went in my favorite
C++.
@Andrew: actually I didn't know he uses this example to plug C++
templates, though I'm not surprised he does.
@Norman: I think the simplest point that I should have focused on,
and maybe will in a follow-up, is that Gaussian elimination violates unit
correctness if applied to X'*X, while it does work for sufficiently
small dimensions, and it's faster than SVD. (Even if it were so
numerically unstable as to have no practical uses at all, it doesn't
really refute the point about typing, because the method is algebraically
correct, and so we're entitled to the type system's support as much as in
any other mathematically sensible method, which should type-check just
fine if type systems live up to their promise.)
In Gaussian elimination, you're not purely and simply adding rows;
you're multiplying one row by a scale factor and adding it to another.
So the scale factor gets its own units, which makes the row addition
pass the units check.
That much is true and it did cross my mind; but... gosh-darn. I mean,
I see how I could make it work to get the compiler to shut up, sort of,
but say when you're doing a Gaussian elimination for X of the right
dimension (a simple equation system), then the scale factor is a plain
number, and if it's X'*X it has different units in each row. A generic
Gaussian elimination function starts looking a bit mind-numbing.
And... what about a Gauss-Jordan elimination with pivoting where you
pick the pivot dynamically? Does this finally force us into a territory
where you ought to keep track of types at run time?
Problem is, you're asking the wrong question.
Can you add meters to inches? Yes, you can! Just take a piece of rope
X meters long, tie it to another rope Y inches long, and you'll have an
answer.
Thing is, meters and inches aren't types, they are coefficients.
The problem rather stems from the fact that multiplying meters is not
typical math multiplication R*R->R but rather R*R->R^2 (square
things).
So the reason you can't add square meters to meters is because they
are totally different things.
Just like dividing X meters by Y seconds is not the same as doing X/Y,
while in practice it gives the right answer.
You can do it right in C++ by overloading math operators and manually
creating types and operators for needed values.
Seems like what you are asking for is automatic creation of these
types and operators for them.
Not sure if that exists.
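The "automatic creation" part is roughly what C++ templates already
give you: declare operator* once and the compiler instantiates the
result types on demand. A minimal sketch with a hypothetical Quantity<M>
(libraries along these lines do exist, e.g. Boost.Units):

template<int M> struct Quantity { double value; }; // M = meters exponent

template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}

int main() {
    Quantity<1> m = { 2.0 };       // meters
    Quantity<2> area = m * m;      // Quantity<2> is instantiated on demand
    Quantity<3> volume = area * m; // ...and so is Quantity<3>
    (void)volume;
}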
Erm... well, they are coefficients, and if you make them symbolic
coefficients and do symbolic math, things might work just fine. And yes,
they probably aren't types. But why is it that I'm the one asking the
wrong question, rather than whoever gives types as the answer giving the
wrong answer?..
And as to adding totally different things: doesn't Gauss-Jordan run
on X'*X add (scaled) rows containing "different things", ending up with
the right result?
"...doesn't Gauss-Jordan ran on X'*X add (scaled) rows containing
"different things" ending up with the right result?"
No! It would be a very strange algorithm if it did so. After scaling,
the rows contain the same things.
> And as to adding totally different things: doesn't Gauss-Jordan
run on X'*X add (scaled) rows containing "different things" ending up
with the right result?
I am not on good terms with math, but... is it? Not sure about
Gauss-Jordan, but Gauss elimination works OK:
The units of X'*X are (as you said):
a b // m^2 m
c d // m n
c'=0=Pa+c, so P=-c/a with units being m/m^2 = 1/m
Resolving units:
c'=Pa+c // 1/m * m^2 + m = m + m
d'=Pb+d // 1/m * m + n = n + n
so new matrix ends up being:
a b // m^2 m
0 d' // m n
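Rendering that check with a hypothetical Quantity<M> template (M = the
meters exponent, and taking d's unit as 1, as it is in the regression
X'*X) makes the compiler verify exactly this:

template<int M> struct Quantity { double value; };

template<int A, int B>
Quantity<A - B> operator/(Quantity<A> x, Quantity<B> y) {
    Quantity<A - B> r = { x.value / y.value };
    return r;
}
template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}
template<int A> // subtraction only for matching units
Quantity<A> operator-(Quantity<A> x, Quantity<A> y) {
    Quantity<A> r = { x.value - y.value };
    return r;
}

// One elimination step on the X'*X example: units are
//   a b   // m^2 m
//   c d   // m   1
// The factor c/a has units m/m^2 = 1/m, so factor*row0 lands in row1's
// units and the subtraction type-checks.
void eliminate_step(Quantity<2> a, Quantity<1> b,
                    Quantity<1> c, Quantity<0> d) {
    Quantity<-1> factor = c / a;       // 1/m
    Quantity<1>  c2 = c - factor * a;  // m - (1/m)*m^2: checks (and is zero)
    Quantity<0>  d2 = d - factor * b;  // 1 - (1/m)*m:   checks
    (void)c2; (void)d2;
}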
@Yossi K. Actually, it seems that I must admit you are at least
partially right. After thinking for a bit, I realized that static typing
and row reduction (as in Gaussian elimination) are not as compatible as
I thought. Informally, row reduction tries to turn a square matrix into
the identity matrix, or as close as it can get. The identity matrix of
course is dimensionless, but the matrix you start with is not. The most
natural way to implement row reduction in C++ (and most languages) is to
do it "in place" (that is, modify the original matrix). But if the type
system is checking units, that means that individual matrix are change
types as you go! Thus, static typing fails here. The basic problem is
that static typing has no issues with code such as:
a = b * c;
d = e / f;
It has big problems with code like this:
x *= y;
w /= z;
Because unless y and z are dimensionless, it requires changing the types
of x and w at runtime. However, all is not lost. It is impossible to
implement in-place row-reduction, but it can still be done by making a
fresh result matrix. If implemented carefully, this should not affect
the asymptotic efficiency, and only doubles the memory requirements.
Doubling the memory is usually considered a bad thing, but in your
example, the matrix is only four elements, so who cares?
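The x *= y point, sketched with the same kind of hypothetical
Quantity<M> unit type:

template<int M> struct Quantity { double value; }; // M = meters exponent

template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}

void demo() {
    Quantity<1> x = { 2 };
    Quantity<1> y = { 3 };
    // x *= y;  // impossible: x would have to become a Quantity<2>,
    //          // i.e. change its type at runtime
    Quantity<2> xy = x * y; // a fresh variable with the new type is fine
    (void)xy;
}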
@Mark: I keep claiming I'm wholly right :) Specifically:
the problem is that you pick a "pivot row" dynamically, depending on the
runtime values, and work on that. How do you do this with static typing,
in-place or not?
@Ivan: Gaussian elimination does work, if spelled out for the given
dimension/problem, and perhaps through some unholy contortion if
specified as a generic function with hideous static types. Gauss-Jordan,
however, picks a pivot row dynamically, which to me seems to preclude
static type system-based unit checking, or at least most sane approaches
to it.
> Gauss-Jordan however picks a pivot row dynamically
Ohh, I see now. The problem is not that you add incompatible values (you
probably aren't).
The problem is that swapping rows changes the signature of the matrix
in a way that depends on the runtime values of the matrix.
Looks like the problem of getting the nth element from a heterogeneous
collection all over again.
Can it be solved by computing all possible output types? Looks like
it should converge to a single result type in a Gauss-Jordan case.
@Yossi K. Ok, I think I finally begin to see what you are saying. You
start with a toy example (linear regression). Then you say that you can
solve it by inverting a matrix. And even though inverting this
particular matrix is doable (and easy) with a static type system,
inverting an arbitrary matrix is hard (or impossible). Did I summarize
your viewpoint correctly?
With Gaussian elimination, even if you don't do pivoting, your scale
factor, which in today's routines is just a single variable declared
once, will have to take on different units for each combination of two
rows: its units will have to be declared as row A's units divided by row
B's units. That's not inherently a big deal: instead of putting the
declaration at the head of the routine, you can put it in the inner loop,
so even though you're declaring it O(n^2) times, there is no waste;
the same storage can be used. The compiler can look at the declaration
and see that the result of multiplying (B's units) by (A's units / B's
units) is just (A's units).
Pivoting doesn't really change this. For one thing, you don't
actually have to swap rows; you can just keep track of which rows have
been used so far. (Or, for that matter, which rows and columns have been
used, if you're doing full pivoting.) I believe this is actually the
approach that is used in LINPACK and LAPACK, although I am not certain
of that.
The question is whether this sort of symbolic checking can be done in
sufficient generality without running into the halting problem.
@Mark: even this matrix or another smallish matrix, say 5×5, gets
gnarly if you do Gauss-Jordan, because you swap rows dynamically.
@Norman: I need to look closer at this row-swapping business to see
how the bookkeeping would look in a static type system.
Like Norman said, there's usually no need to actually swap rows. I'm
getting tempted to actually implement this, even if for no reason other
than to prove to you that it would NOT get gnarly.
Go ahead :)
Fine. Let's agree on what pieces it should include and how general it
should be, for it to count as valid evidence. Obviously, as a minimum,
it should be able to handle your linear regression with meters via row
reduction example. Anything else?
Also, we should agree on "not too gnarly". How about less than 100
lines of source, no fancy tricks allowed? And to make sure I don't use
too many long lines, less than 10KB total source code.
What it should do: a general-purpose Gauss-Jordan elimination routine
that one could run on:
* "just a bunch of numbers" β no fancy static types
* an equation set where every column has a different type (with no clear
relations between the types except that the right hand side can be
reached by multiplying the solution vector by the left hand side)
* an X'*X equation set where X is the previous matrix
Now maybe that makes no sense; if so let's discuss how the columns'
types should be related.
As to "gnarly" β let's start with the user being off the hook β being
able to call the function with all these arguments. As to the prettyness
of the code itself β it's in the eye of the beholder; let's say it
should be 2X longer than the plainly typed version, tops?
Ok, so basically, you want a procedure something like this:
answer = invert(question)
where invert is implemented using Gaussian elimination, along the lines
described in a textbook or Wikipedia article, and further, the answer
matrix is correctly typed relative to the question matrix. This should
not be more than twice the length of code that just works for doubles.
If I paraphrased you correctly, then I'm pretty sure I can meet
requirements 1 and 3, and hopefully a weak version of 2. The weakness is
that instead of the types being completely arbitrary and opaque, I
insist they are all derived from some common unit system (like say, all
types are some power of length times some power of time).
OK: not completely arbitrary types; but Gauss-Jordan with
pivoting.
OK. This will take at least a few days. I presume I can't hardcode the
number 2, right?
Wait a sec... you said you want to pick a pivot dynamically, instead
of just going through the rows one at a time? What strategy would you
use? I mean, you clearly want to avoid zero, but that still leaves lots of
choices. Or is it OK to pick any non-zero element?
I don't remember what Gauss-Jordan elimination does exactly; I could
look at the Numerical Recipes book at work next week... There's code in
there picking the pivot.
The larger the pivot, the better. But that's 'larger' as compared to
the rest of the row, not 'larger' in absolute terms. Taking a quick peek
at Numerical Recipes, they say that a good way to do this is to, before
starting the whole procedure, scale each row so that the largest element
has size one. (You then have to keep track of the scale factors, of
course.)
By the way, I figure you guys are talking about partial pivoting,
since that's the usual way it's done, but there's also full pivoting,
where you swap columns as well as rows, to get an even larger pivot. In
that case you can just pick the largest element in absolute terms, and
not do any pre-scaling.
Speaking of which, supposing you do this scaling of each row. What
should that do to the units? Say a row's units were meters, to start
with. Are they still meters? Well, from one point of view, sure: they
just refer to a scaled-up or scaled-down version of the problem. But
from another point of view, no. Say the scale factor just coincidentally
happened to be 0.0254; then these are actually inches, not meters. Or
from a third point of view, one can decide that the scaled matrix now
just has plain unitless numbers and that all the units reside in the
scale factor. That'd be the easiest way for Mark to implement this:
after the initial scaling, he can just do the usual Gauss-Jordan
elimination without units entering into the mechanics of it.
Isn't it a bit like deciding that the entire solver operates on plain
numbers and "trust me, it works out"?
It is, but it has the added benefit of being compliant with the
demand that everything be units-checked.
This is part of the deal, with mathematical abstraction: the math
works no matter what the numbers represent. When you're in the middle of
an abstract mathematical routine, why _should_ you be worrying about
units? Whether you're dealing with meters or with tangerines is of no
concern. Save that for parts of the code where units do matter. The fact
that you can easily forget about units in the routine, by stuffing them
away into a scale factor, is a feature, not a bug. This way the solver
is no harder to program than it would be under the "trust me, it works
out" scheme, but at the same time it can be part of a rigorously
units-checked program. For instance in the case you started with, the
fact that "meters" were the units of the X matrix wouldn't just be lost,
but would be filed away, so that if anyone who used the resulting
least-squares approximation tried to feed it data that were in inches
instead, it would yield an error.
I'm not sure I follow you; perhaps an example could help me. What I
hear (and I suspect that I mishear) is "the math works no matter what
the numbers represent, so just forget the units". But what if I have a
bug in the math? And why are the difficulties in getting the compiler to
type-check this kind of math (Gauss-Jordan), and having to forget about
unit checking there, not representative of a general problem: of compile
time checking being much more limited than it seems from simple
examples?
BTW I've been browsing the email archives at your site for the last
few days... Good stuff.
Oh, I agree that the difficulty with units checking here is
representative of the general problem of it not being easy to do this
stuff at compile time. For that matter, I neglected to mention a cost:
if you're doing Gauss-Jordan and filing away the units in a vector of
scale factors, that means you can't, in a static type system, use the
same storage for the matrix: you're changing the types, so you have to
allocate a different matrix, typed in different units. (One could
conceive of compilers that would optimize this allocation away, but
really, forget about it, unless it's part of a benchmark they're trying
to game.) For that matter, even if you do as we were discussing earlier,
and put units into the actual computation, so as not to have to allocate
another matrix, you still can't play the trick that a lot of Gaussian
elimination routines do of storing the two matrices that are the results
of the routine, one upper triangular and one lower triangular, in the same
storage as was used for the input matrix, because in general the two
output matrices will have different units from each other.
As for "what if I have a bug in the math", you'll debug it in the
usual way. For routines like Gauss-Jordan, the task of the routine is
well-defined, so it's straightforward to write thorough tests for it.
Sure, dimensional analysis can help in catching bugs, even those that
are not simply units errors; but on the other hand, the bug will be
more visible without a lot of units annotation cluttering the code.
In practical terms, rather than do any debugging, you'll probably use
some Gauss-Jordan routine that someone else has written and thoroughly
debugged, so you'll want to just annotate it for units, as in "compiler,
please treat this as a black box, for which if the input units are U,
the output units are such-and-such function of U".
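Such an annotation might amount to no more than a typed wrapper; a
hedged sketch with a hypothetical Quantity<M> and a made-up routine,
where the declared rule is "inputs in units U, output in units U":

template<int M> struct Quantity { double value; }; // M = meters exponent

// The thoroughly debugged, unit-oblivious routine (stand-in body here).
inline double average_raw(double a, double b) { return (a + b) / 2; }

// The black box annotation: strip units at the boundary, call the
// double routine, reattach units according to the declared rule.
template<int M>
Quantity<M> average(Quantity<M> a, Quantity<M> b) {
    Quantity<M> r = { average_raw(a.value, b.value) };
    return r;
}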
There are certainly a lot of places where units checking helps. I
just doubt that core math code is one of them. If you look in a textbook
explaining Gauss-Jordan, it's not going to mention units. They just
don't help at all in an intellectual understanding of the problem. On
the other hand, a textbook dealing with Maxwell's equations will
certainly mention units, as they are a big part of what is going on.
Still, if you're writing electromagnetics code from scratch, most of the
troubles with units can be avoided by imposing a strict rule that all
variables should be in a single self-consistent units system (such as
MKS); if you have that kind of rule, it can be a waste of time to add
annotations everywhere that say you're following the rule. It's when you
have a big sprawling mess of code, some parts of which use different
units from other parts, that units annotations become quite desirable,
so that you don't take (say) a length variable from one part of the
program, in inches, and feed it into another part which expects
meters.
I think that we sort of agree on compile time unit checking being
potentially useful in some cases and either hard/impossible or
counter-productive in others. Basically checking is nice but you ought
to be able to cast everything to double on various occasions.
I just realized that the first person to comment on this article is
also named Mark. So, just to clarify: Mark the F# defender is not the
same as Mark the C++ defender. (I'm the second Mark.)
OK :)
Ok, here's my version 0.0. It does not handle units at all, just
plain old doubles. I'm posting it here to make sure this is the kind of
thing you want to convert into a unit-checked version. Is it general
enough? Overkill? Missing pieces, extra pieces, and so on? It's a bit
over a hundred lines in length, and just over 3KB in source code size,
and at the very least solves the linear regression problem, but has a
subroutine that can row-reduce an arbitrary matrix whose size is known
at compile time. Also, the exercise of writing this has made me fear
that you are correct: getting this to work well with units is very
likely to get gnarly. After all, I already have 3 template matrix
classes, and the code just works with doubles! But I claim it can be
done!
ok, posting system is not letting me post my code.
never mind, I see it's awaiting moderation. But all the less than
signs get messed up. So don't approve the last one, I'll fix it
first.
template<int cols>
class Matrix {
    typedef double Row[cols];
    Row* const mat;
    const int r;
    Matrix(const Matrix&); // need a copy constructor, but I'm cheating
public:
    int rows() const { return r; }
    Matrix(int rows) : mat(new Row[rows]), r(rows) {}
    double& operator()(int row, int col) {
        if(row < 0 || row >= r || col < 0 || col >= cols) throw "bug!";
        return mat[row][col];
    }
};

template<int rows>
struct TransposedMatrix {
    int cols() const { return mat.rows(); }
    TransposedMatrix(Matrix<rows>& m) : mat(m) {}
    double operator()(int row, int col) { return mat(col,row); }
private:
    Matrix<rows>& mat;
};

template<int n>
TransposedMatrix<n> transpose(Matrix<n>& m) { return m; }

template<int rows, int cols>
struct SmallMatrix {
    double mat[rows][cols];
    double& operator()(int row, int col) {
        if(row < 0 || row >= rows || col < 0 || col >= cols) throw "bug!";
        return mat[row][col];
    }
    SmallMatrix() {
        for(int i=0; i<rows; i++)
            for(int j=0; j<cols; j++)
                mat[i][j] = 0;
    }
};

// forward declaration so the operator* overloads below can call it
template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n);

template<int rows, int cols>
SmallMatrix<rows,cols> operator*(TransposedMatrix<rows>& a, Matrix<cols>& b) {
    if(a.cols() != b.rows()) throw "bug!";
    return matmul<rows,cols>(a, b, a.cols());
}

// taken by value so that temporaries can bind to the arguments
template<int rows, int cols, int n>
SmallMatrix<rows,cols> operator*(SmallMatrix<rows, n> a, SmallMatrix<n, cols> b) {
    return matmul<rows,cols>(a, b, n);
}

template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n) {
    SmallMatrix<rows, cols> c;
    for(int i=0; i<rows; i++)
        for(int j=0; j<cols; j++)
            for(int k=0; k<n; k++)
                c(i,j) += a(i,k) * b(k,j);
    return c;
}

// Gaussian elimination, simplified from Wikipedia, with different variable names
template<int rows, int cols>
void row_reduce(SmallMatrix<rows, cols>& mat) {
    int n = rows < cols ? rows : cols;
    for(int i=0; i<n; i++) {
        double scale = mat(i,i);
        for(int j=i; j<cols; j++)
            mat(i,j) /= scale;
        for(int j=0; j<rows; j++) {
            if(j==i) continue;
            double factor = mat(j,i);
            for(int k=i; k<cols; k++)
                mat(j,k) -= mat(i,k) * factor;
        }
    }
}

template<int n>
SmallMatrix<n, n> inverse(SmallMatrix<n, n> mat) {
    SmallMatrix<n, 2*n> temp;
    for(int i=0; i<n; i++) // glue on the identity matrix
        for(int j=0; j<n; j++)
            temp(i,j) = mat(i,j),
            temp(i,j+n) = i == j;
    row_reduce(temp);
    SmallMatrix<n,n> ans;
    for(int i=0; i<n; i++)
        for(int j=0; j<n; j++)
            ans(i,j) = temp(i,j+n);
    return ans;
}

struct Line { double slope, intercept; };

Line least_squares_fit(const double x[], const double y[], int n) {
    Matrix<1> Y(n);
    for(int i=0; i<n; i++) Y(i,0) = y[i];
    Matrix<2> X(n);
    for(int i=0; i<n; i++)
        X(i,0) = x[i],
        X(i,1) = 1;
    TransposedMatrix<2> Xt = transpose(X);
    SmallMatrix<2,1> ans = inverse(Xt*X)*(Xt*Y);
    Line ret;
    ret.slope = ans(0,0);
    ret.intercept = ans(1,0);
    return ret;
}

#include <cstdio>
int main() {
    double x[] = {1,2,3}, y[] = {7,8,9}; // input
    Line line = least_squares_fit(x, y, 3);
    printf("y=%gx+%g\n", line.slope, line.intercept);
}
Ok, I fixed it.
Sorry for the comment system being what it is... I think maybe
<pre> will help:
class IndentLevel0 {
int indentLevel1() {
return indentLevel2();
}
};
As to the code: I think it's Gaussian elimination rather than
Gauss-Jordan, am I wrong? The latter picks a pivot row dynamically based
on the element values.
OK, pre does help. I think you still need to escape less-than and
greater-than (even inside pre) to make it look right. Without
escaping:
template class C;
With escaping:
template<class T> class C;
...Yeah, so the pre tag with manual escaping for HTML entities is
what works in this thing...
You never got back to me about how to pick a row dynamically. If the
goal is to make a point about type systems, any dynamic rule should
work, so how about I pick the first non-zero row?
Ok, now with dynamic pivots! Let's try an old-fashioned xmp tag!
Ok, now with pre tags!
template<int cols>
class Matrix {
    typedef double Row[cols];
    Row* const mat;
    const int r;
    Matrix(const Matrix&); // need copy constructor, but I'm cheating
public:
    int rows() const { return r; }
    Matrix(int rows) : mat(new Row[rows]), r(rows) {}
    double& operator()(int row, int col) {
        if(row < 0 || row >= r || col < 0 || col >= cols) throw "bug!";
        return mat[row][col];
    }
};

template<int rows>
struct TransposedMatrix {
    int cols() const { return mat.rows(); }
    TransposedMatrix(Matrix<rows>& m) : mat(m) {}
    double operator()(int row, int col) { return mat(col,row); }
private:
    Matrix<rows>& mat;
};

template<int n>
TransposedMatrix<n> transpose(Matrix<n>& m) { return m; }

template<int rows, int cols>
struct SmallMatrix {
    double mat[rows][cols];
    double& operator()(int row, int col) {
        if(row < 0 || row >= rows || col < 0 || col >= cols) throw "bug!";
        return mat[row][col];
    }
    SmallMatrix() {
        for(int i=0; i<rows; i++)
            for(int j=0; j<cols; j++)
                mat[i][j] = 0;
    }
};

// forward declaration so the operator* overloads below can call it
template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n);

template<int rows, int cols>
SmallMatrix<rows,cols> operator*(TransposedMatrix<rows>& a, Matrix<cols>& b) {
    if(a.cols() != b.rows()) throw "bug!";
    return matmul<rows,cols>(a, b, a.cols());
}

// taken by value so that temporaries can bind to the arguments
template<int rows, int cols, int n>
SmallMatrix<rows,cols> operator*(SmallMatrix<rows, n> a, SmallMatrix<n, cols> b) {
    return matmul<rows,cols>(a, b, n);
}

template<int rows, int cols, typename M1, typename M2>
SmallMatrix<rows,cols> matmul(M1& a, M2& b, int n) {
    SmallMatrix<rows, cols> c;
    for(int i=0; i<rows; i++)
        for(int j=0; j<cols; j++)
            for(int k=0; k<n; k++)
                c(i,j) += a(i,k) * b(k,j);
    return c;
}

// Gaussian elimination, simplified from Wikipedia, with different variable names
template<int rows, int cols>
void row_reduce(SmallMatrix<rows, cols>& mat) {
    int n = rows < cols ? rows : cols;
    for(int i=0; i<n; i++) {
        int p = i; // dynamic pivot: first row with a non-zero entry in column i
        while(p<n && mat(p,i)==0) p++;
        for(int j=0; j<cols; j++) { // swap rows i and p
            double temp = mat(i,j);
            mat(i,j) = mat(p,j);
            mat(p,j) = temp;
        }
        double scale = mat(i,i);
        for(int j=i; j<cols; j++)
            mat(i,j) /= scale;
        for(int j=0; j<rows; j++) {
            if(j==i) continue;
            double factor = mat(j,i);
            for(int k=i; k<cols; k++)
                mat(j,k) -= mat(i,k) * factor;
        }
    }
}

template<int n>
SmallMatrix<n, n> inverse(SmallMatrix<n, n> mat) {
    SmallMatrix<n, 2*n> temp;
    for(int i=0; i<n; i++) // glue on the identity matrix
        for(int j=0; j<n; j++)
            temp(i,j) = mat(i,j),
            temp(i,j+n) = i == j;
    row_reduce(temp);
    SmallMatrix<n,n> ans;
    for(int i=0; i<n; i++)
        for(int j=0; j<n; j++)
            ans(i,j) = temp(i,j+n);
    return ans;
}

struct Line { double slope, intercept; };

Line least_squares_fit(const double x[], const double y[], int n) {
    Matrix<1> Y(n);
    for(int i=0; i<n; i++) Y(i,0) = y[i];
    Matrix<2> X(n);
    for(int i=0; i<n; i++)
        X(i,0) = x[i],
        X(i,1) = 1;
    TransposedMatrix<2> Xt = transpose(X);
    SmallMatrix<2,1> ans = inverse(Xt*X)*(Xt*Y);
    Line ret;
    ret.slope = ans(0,0);
    ret.intercept = ans(1,0);
    return ret;
}

#include <cstdio>
int main() {
    double x[] = {1,2,3}, y[] = {7,8,9}; // input
    Line line = least_squares_fit(x, y, 3);
    printf("y=%gx+%g\n", line.slope, line.intercept);
}
it's not showing up. odd
the pre tag got swallowed. can you fix it, and remove the
duplicates?
I can't edit the comments, at least not through any WordPress
interface that I see... perhaps with this much code, if you want it at
some publicly readable place on the Internet the best thing is github or
some such.
Regarding pivoting: I think it could be done as in http://en.wikipedia.org/wiki/Gaussian_elimination#Pseudocode
...though I hope to check what Numerical Recipes says as well.
Ok, let's see if I can figure out this pre tag:
zero
spaces
tab
one
Cross fingers!
OK, here's a github gist:
https://gist.github.com/anonymous/11153213
By the way, I'm not using the Wikipedia algorithm for pivot selection
because it makes no sense if each row can have different units. But in
any case it doesn't matter for our purpose. As long as the pivot is
selected dynamically, we know that the type system can handle a
different method.
The comments here drive me crazy sometimes. I never know if they went
through!
Yeah, I approved the link to the gist. Still before type checking,
right?
Yeah, the stuff in the gist only works with doubles.
I should have a type-checked version ready sometime next month. This
is proving somewhat challenging, and I haven't even gotten to the hard
part yet (the row reduction).
I'm going to post little updates here every week or two if no one
tells me to shut up. Progress so far: the memory usage should be only
40% of the unchecked version, because the X and Y matrices no longer
create copies of the original array. On an unrelated note: I'm trying to
stay away from C++11, even though it would make some things here much
easier. One feature I am making heavy use of is the ability to use two
closing brackets in a row without a compiler error. If anyone doesn't
like that, feel free to use search and replace, which should be easy as
I'm not using any bit-shift operators. (And before you say cout: I'm
using printf.) So far, I have the function signature for matrix
multiplication working correctly, but the function body is not yet
implemented. I should have multiplication fully functional by next week.
We'll see.
Why not C++11? Even I upgraded to C++11...
That's true. You even praised it. I don't remember your exact words
anymore, but I think it was something along the lines of "it sucks
less".
Maybe I'll do both.
I still claim this will eventually get done, but I think you win
anyway. This is indeed getting really gnarly.
OK :) FYI, I talked about this to a guy who's both a big math geek
and a big type systems geek. We'll see what he comes up with :)
Awesome!
Small bug: you wrote that X'*Y looks like this:
sum(xi*yi) sum(yi) //m^2 m
Shouldn't it actually be the transpose of that, like so?
sum(xi*yi) //m^2
sum(yi) //m
Did my comment go through this time?
I have a problem. I've been going through the motions of implementing
matrix multiplication with unit checking, and even recently
congratulated myself when I realized that my latest compiler error was
not because of a buggy multiplication algorithm, but rather because I
was multiplying incompatible matrices. And finally, it was time to start
working on the main problem: inverting a matrix. Specifically, inverting
X'*X.
And then I realized something. I have no idea what units the inverse
should have. I don't mean my code can't tell, I mean I personally can
not tell. In fact, it's worse than that. I have constructed a "proof"
that the matrix can't possibly have an inverse. I won't bother posting
the proof here, because it just has to be wrong. But I can't figure out
why.
You know, I kinda feel bad about this :) I wrote about something
which ended up being a needless time sink for me many years ago. I
didn't intend it to become a needless time sink for someone else...
That said, perhaps solving the equations without explicitly
representing the inverse matrix would work better... Though the inverse
sure exists (often) and would need to have a type...
No need to feel bad. I usually find MUCH worse time sinks without any
help. :) This one makes me feel warm and fuzzy. ("Restoring Yossi's
faith in static typing while advancing the state of the art in C++
template abuse!")
But if you claim the inverse exists, what units does it have? You
have a pretty picture showing the units for X'*X. Can you make a similar
picture for its inverse?
Erm... I haven't tried, but can't it be worked out, for a 2×2 case,
from a formula computing every element?
Maybe, maybe not. When I plug in the units, I get nonsense. If you
don't feel like finding the formula yourself, try this:
http://www.wolframalpha.com/input/?i=inverse+of+%5Ba%2Cb%5D%2C%5Bd%2Cf%5D
The comment system on this blog drives me nuts! Attempt three: here
is a formula
http://www.wolframalpha.com/input/?i=inverse+of+%5Ba%2Cb%5D%2C%5Bd%2Cf%5D
Let X'X =
[a,b]
[c,d]
Then according to Wolfram Alpha, the inverse is
[d/(ad-bc), -b/(ad-bc)]
[-c/(ad-bc), a/(ad-bc)]
Now tell me the units :)
Oh, now I see they all went through.
It's simple: if there's a link, comments are held for moderation.
Why? So bloody spam bots don't get link juice from their bloody
comments... And so that readers don't have to sift through that
shite...
Well, ad-bc is m^2. So we get:
1/m^2 1/m
1/m n
Not?
Very good. Except that multiplying a matrix by its inverse should
yield the identity matrix, which is dimensionless. This seems to not
happen here.
Sorry, never mind I am wrong. I made an incorrect assumption. I
assumed that the identity matrix must always be completely
dimensionless. In general this is not true. The zeros of the identity
matrix can have a dimension, as long as the ones do not. The units you
gave are correct. Thank you for putting up with me.
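Spelling that out for the 2x2 case (reading the unit of the lower-right
entry of X'*X as 1, as it is in the regression example), the units of
X'*X times the units of its inverse are, entry by entry:
(1,1): m^2 * 1/m^2 + m * 1/m = 1 + 1
(1,2): m^2 * 1/m + m * 1 = m + m
(2,1): m * 1/m^2 + 1 * 1/m = 1/m + 1/m
(2,2): m * 1/m + 1 * 1 = 1 + 1
So the ones on the diagonal are dimensionless, while the zero at (1,2)
is in meters and the zero at (2,1) is in 1/m.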
Thinking about it as linear operators instead of matrices makes this
much easier. A linear operator is simply a function from one vector
space to another f : U ~> V. I'm using a ~> b here to denote
linear operators. The vectors can have different types. If U is a
position in space it will have type (meter,meter,meter). It can even
have different types for different components, like (newton, seconds,
meter).
Now the type of \ is obvious. It simply inverts a linear operator. So
it has type (U ~> V) -> (V ~> U).
@Jules: I think what YK really wants to know is not the type of the
"\" operator. As far as I can tell, what he really doubts is the
possibility of actually implementing this operator, with unit checking,
in a static type system, using an algorithm of his choice. In
particular, unless I misunderstood him entirely, he claims that
Gauss-Jordan elimination will be near impossible to implement, because
picking the pivot dynamically is not compatible with static typing.
@Jules: sure, but what are V and U?.. Given all the options of how
the units can be typed? And what will be the types of intermediate
variables in a Gauss-Jordan elimination?
V and U are the types of vectors. I gave two examples
(meter,meter,meter) and (newtons, seconds, meter). Is this unclear?
Gaussian elimination does primitive row operations, each of which
affects the matrix type in a definite way:
1. Swap two rows, say rows 1 and 2. Given matrix M : (a,b,c) ~>
(d,e,f), the new matrix has type (b,a,c) ~> (d,e,f).
2. Multiply a row by a scalar, say row 1. Given matrix M : (a,b,c)
~> (d,e,f), and scalar x : g, the new matrix has type (a*g,b,c) ~>
(d,e,f).
3. Add one row to another, say row 1 to row 2. Given matrix M :
(a,a,c) ~> (d,e,f), the new matrix has the same type (a,a,c) ~>
(d,e,f). We have `a` twice since we can only add the first and second
row in matrices where the first and the second row have the same
type.
Of course, which row will be selected (and even how many rows there
are) depends on run-time data. So the types will depend on run-time
data. This means that you need dependent types to type check Gaussian
elimination (though the final resulting type is not a dependent
type).
One thing that is interesting to note, if you hadn't already noticed
it, is that an n by m matrix does not have n*m units that we can pick
freely, but only n+m. If you were to pick a different incompatible unit
for each cell in the matrix, it wouldn't be possible to multiply any
vector by it.
Sorry, I made a mistake. In all the preceding example the modified
type should be the output type not the input type. So:
1. Swap two rows, say rows 1 and 2. Given matrix M : (a,b,c) ~>
(d,e,f), the new matrix has type (a,b,c) ~> (e,d,f).
2. Multiply a row by a scalar, say row 1. Given matrix M : (a,b,c)
~> (d,e,f), and scalar x : g, the new matrix has type (a,b,c) ~>
(d*g,e,f).
3. Add one row to another, say row 1 to row 2. Given matrix M :
(a,b,c) ~> (d,d,f), the new matrix has the same type (a,b,c) ~>
(d,d,f). We have `d` twice since we can only add the first and second
row in matrices where the first and the second row have the same
type.
OK, so row-swapping changes the matrix type; then how, in an actual
programming language, would the code that handles a matrix of dimension
N look? Does N have to be known at compile time, and if so, do O(N)
types emerge from compiling the code, with O(N) pieces of code
generated, each handling a set of types?
I'm not saying there's no meaningful type for any of the intermediate
results in an abstract sense; I just ask how to practically write code
such that these types are actually checked.
That's why I said you need dependent types. The type of the swapRows
function would look something like this:
swapRows : (units : List Unit) -> i:int -> j:int -> (U ~> Vec units) -> (U ~> Vec (swap i j units))
Where Vec listOfUnits is a type constructor for a vector with units,
and swap i j lst swaps the i-th and j-th elements of the list. This will
probably read like voodoo if you don't already know a dependently typed
language like Coq or Agda or Idris...
Does it work if carried to the end where you write the code and not
just the prototype of swapRows? I can write the prototype of swapRows in
C++ (though it won't be pretty and will read like something worse than
voodoo...). It's the code calling it to implement Gaussian elimination
that I'm not sure will come out practically usable.
You know what, since I don't know any of these languages: are i and
j arbitrary runtime values, or do they have to be compile time
constants? If they're runtime values, that's plenty cool. And you're
saying you can just call swapRows in a loop with dynamic values of i and
j and get static unit checking? That is quite the bloody
voodoo, and that I cannot do in C++, and that's interesting enough to
delve deeper into this stuff.
I don't see any fundamental obstacle, but it's not a trivial
exercise. First you would need to implement a unit system for tracking
units of numbers. Then you would need to implement matrices, and
implement Gaussian elimination with extra proof terms to make sure the
type checker understands that the units match up.
RE: second reply.
i and j are runtime values. That is indeed exactly what makes
dependent types cool. At type checking time, i and j stand for *any*
number that i,j could take on at runtime, and the type checker verifies
that the types are ok for *any* i,j. At runtime i,j will take on their
actual values.
It would be very interesting to see this at work, in this particular
example and, more generally, in how smooth linear algebra can be with
unit checking mixed in, given dependent types. As someone knowledgeable about
these things, do you see it becoming mainstream the way plain type
checking or parameterized types or OO-style polymorphism or
compile-time/run-time duck typing polymorphism became mainstream? (I
mean not in terms of what the chances are of people wanting this as much
as what the chances are that if they feel they do want this, they can
become fluent in the necessary techniques.)
So F# or Haskell can't do this β only Coq or Agda? What are some
other examples where you need dependent types?
Yes I think it will eventually become mainstream. Dependent types are
actually quite simple, although it is a big conceptual leap. One
obstacle to dependent types becoming mainstream is that currently you
need to prove all the things required by library functions. So if you
have a function that accepts only prime numbers, you will have to prove
that the number you're passing in is prime. This is often not easy and a
lot of work that may not be worth it. So I think one of the things that
needs to happen is a system for systematically inserting run-time checks
for things that have no programmer written proof, instead of just giving
a type error and refusing to run the program. Then it would become a
kind of hybrid between contracts and types: if you don't write the
proofs, things are dynamically checked as with contracts, if you write
the proofs, things are verified at compile time.
The thing that dependent types fundamentally allow you to do is write
functions where the output type depends on the value of the input. Take
for instance the following function:
f : Bool -> Int
f c = if c then 3 else 5
This is not a dependent type since the output type Int does not
depend on the input value c. However:
g : (c:Bool) -> if c then Int else String
g c = if c then 3 else "Hello World"
This is a dependent type because the output type depends on the value
of the input. You see that we can write expressions where you'd usually
have types: if b then Int else String. In effect in dependently typed
languages the type sublanguage (with entities such as Int,String,Bool)
is the same as the value sublanguage (with entities such as
3,"Hello",True). You can mix and match as you wish.
Since you know C++ we can compare it with templates:
- With templates, you have a very limited language to express
computations. With dependent types you have the full run-time language
available in the types.
- With templates, the values have to be known at compile time. With
dependent types, the values can come from run-time.
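To see the contrast from the C++ side, here is the closest template
analogue of the g above, a hedged sketch: the choosing value must be a
template parameter, fixed at compile time.

#include <string>
#include <type_traits>

// The output type depends on C, but C must be a compile-time constant.
template<bool C>
typename std::conditional<C, int, std::string>::type g();

template<> int g<true>() { return 3; }
template<> std::string g<false>() { return "Hello World"; }

// g<some_runtime_bool> is not expressible; that is exactly the gap that
// dependent types close.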
If you want to know more, Certified Programming with Dependent Types
is a tutorial about dependently typed programming in Coq. http://adam.chlipala.net/cpdt/
You say doing the unit-checked Gauss elimination is non-trivial. How
much more code than the "plain double" version, do you think? (If it's
way, way more, then such libraries will be to today's widespread
numeric packages what CompCert is to LLVM or gcc: impressively correct
feats finding few practical uses, vs quite bug-ridden software in
universal use.)
Well, first you would have to implement the unit system
(doubles-with-units), but that would be a one time deal and you can use
it for all kinds of unit checked work.
Then you'd need to implement the basic linear algebra operations with
units (like multiplying a vector by a scalar).
Then you need to implement the basic operations used in the Gaussian
elimination (swap rows, etc.).
Finally you'd implement the Gaussian elimination procedure with
units.
It's hard to guess how much extra code that is. In the first point,
you have 0 code to write for plain doubles, but you do have to write code
for doubles-with-units. I think in each of the last 3 points you'd end up
with about triple the amount of code compared to plain doubles. That is
mainly because even *specifying* how an operation works on the units is
almost the same work as actually implementing the operation. Take
multiplying a vector by a scalar. You create a new vector where each
element is the corresponding element in the old vector multiplied by the
scalar. Now look at the units: the unit of each element in the new
vector is the corresponding unit in the old vector, multiplied by the
unit of the scalar.
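In Quantity-style C++ (the same hypothetical template, meters exponent
as a parameter), that spec-equals-implementation point is visible in the
signature alone, at least for a homogeneous vector; per-element units
would need the dependent-type machinery discussed above:

#include <vector>
#include <cstddef>

template<int M> struct Quantity { double value; }; // M = meters exponent

template<int A, int B>
Quantity<A + B> operator*(Quantity<A> x, Quantity<B> y) {
    Quantity<A + B> r = { x.value * y.value };
    return r;
}

// The return type *is* the units specification: element units A times
// scalar units B.
template<int A, int B>
std::vector<Quantity<A + B> > scale(const std::vector<Quantity<A> >& v,
                                    Quantity<B> s) {
    std::vector<Quantity<A + B> > r;
    for (std::size_t i = 0; i < v.size(); ++i)
        r.push_back(v[i] * s);
    return r;
}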
I'm sure future developments will reduce that overhead by a lot, for
instance by better type inference for dependent types and better
automated proof search procedures. The field is just getting started in
that respect. Another thing to consider is that once the Gaussian
elimination procedure is written, clients can use that in a unit checked
way without worrying at all about the internals.
It's interesting to look into. I wonder if someone did it.
What do people generally use dependent types for that you can't do
without dependent types? What are the typical use cases?
It seems to be a popular opinion on this thread that this needs a
dependent type system to work properly. From the descriptions I'm
reading here, I am pretty convinced that a dependent type system would
be VERY helpful. However, I don't think it is required. The C++ type
system suffices, if you are willing to mangle the code badly enough,
such as turning iteration into recursion and similar stuff.
I'm learning some interesting factoids about matrix inversion via
row-reduction that I never knew before. I used to think there was
nothing to it:
Start with matrix M.
Glue on the identity I, yielding M|I
Row reduce, yielding I|M^-1
What surprised me a few days ago was that the identity matrix after
row-reduction is not dimensionless. What I learned less than one day ago
is that the identity matrix you glue on to start the process is not
dimensionless either. In both cases, the ones are dimensionless, but not
all the zeros are.
Regarding dependent types: I think it's also about how many versions
of the code are compiled per given set of constants, and not just how the
code looks. Although how it looks, I think, determines to a large extent
whether anyone would actually do it in real life.
@Mark: I don't know much about C++, but wouldn't the matrix size need
to be statically determined and the whole algorithm need to be
essentially unrolled at compile time with templates in order for the C++
compiler to check the units at compile time?
@Yossi Kreinin: Dependent types aren't really used for much yet.
There is compcert (which you already knew about), there is F* from
microsoft research, there is Idris, Agda, and others. A lot of the
research is concerned with using dependent types as a new foundation for
math, rather than with practical programming. Idris appears to be the
most popular route to practical dependent types, but it's far from
"production ready".
@Jules: yes.
I'm interested in implementing this in Scala (which is a relatively
mainstream language but does have dependent types), but isn't the
problem inherently type-unsafe? I mean, suppose you're trying to solve
something like:
x * 2 seconds + y seconds = 5 seconds
x meters - y meters = 3 meters
The only way we can add these to eliminate y is by multiplying the
second equation by 1 second/meter, which seems like it defeats the
type-safety point. Or am I missing something?
Why would that defeat the type safety? When you are solving
a*x + b*y = c
d*x + e*y = f
Then you subtract d/a times the first equation from the second. The
units all work out that way, and you aren't artificially munging the
units to be the same; it just falls out of the way Gaussian elimination
works anyway.
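Worked through for the example above: a = 2 s, b = 1 s, c = 5 s,
d = 1 m, e = -1 m, f = 3 m. Then d/a = (1 m)/(2 s) is in m/s, and (d/a)
times the first equation, whose terms are all in seconds, yields terms
in meters, matching the second equation; the subtraction checks.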
If you implement this in Scala it'll be damn impressive :)
Would it be impressive in C++?
It would be, but in a different way :) I meant "in Scala with
dependent types, making the code look similar to the vanilla version
that does no unit checking". In C++ you aren't aiming at that.
Oh, that's what you meant. I agree, that would be super cool. That
would upgrade from mere curiosity to something useful!
If I'm not getting too annoying yet... you said earlier in this
thread that I'd have a heart attack if I saw how far you tried to push
this in C++. Do you recall what went wrong, or was it more like "nothing
works"?
Even if I never get this working, it has at least forced me to
properly understand C++ templates, from the annoying things like "when
is typename required", to the fancy stuff like higher-order templates.
Thank you for the blog post.
On a (mostly) unrelated note: I don't believe in Greenspun's tenth
rule, but if anyone saw my code right now, it would certainly act as
evidence in favor. Seriously, compile time map?? If I catch myself
writing lambda, I think I'll just give up.
I'll gladly publish whatever it is you did write so far in a
follow-up... Just to show what's involved...
Myself, I didn't originally try to solve this problem but a related
one: I had fixed point (scaled integer) matrix elements and I wanted
each element to have a different type because their different units
meant they tended to have different ranges. It was a much easier
problem, I think, but the code came out so ugly that I gave up; also I
don't think it was that good numerically.
Solving linear equations with fixed-point integers has not become my favorite pastime, to say the least.
I agree, that does sound similar in spirit - sufficiently similar that I suspect that if I get this working, most of the machinery should be reusable. Maybe even just replacing "meters ^ X" with "accurate to X digits" or something. My code currently makes the assumption that you never add two numbers with different units, but that can be generalized if need be.
But yeah, floating point sounds relatively luxurious compared to
fixed point arithmetic.
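To make it concrete, here's a toy sketch of the fixed-point analogue (my guess at the shape of it, not your old code; Fixed is a name I just made up):

#include <stdint.h>

template <int Shift>  // value represented = raw / 2^Shift
struct Fixed {
    int32_t raw;
};

// Multiplying adds the shifts, just like multiplying units adds exponents.
template <int A, int B>
Fixed<A + B> operator*(Fixed<A> x, Fixed<B> y) {
    Fixed<A + B> r;
    r.raw = (int32_t)((int64_t)x.raw * y.raw);  // may overflow - it's a sketch
    return r;
}

// Addition is only defined for matching shifts - the same "never add
// two numbers with different units" assumption as above.
template <int A>
Fixed<A> operator+(Fixed<A> x, Fixed<A> y) {
    Fixed<A> r;
    r.raw = x.raw + y.raw;
    return r;
}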
Come to think of it, that assumption can't be relaxed after all.
This paper seems to go a long way towards what you're asking about,
though I haven't thought about it deeply: http://hal.inria.fr/hal-00923380
Certainly it seems very much related, but I don't think it points to
the direction of the problem being solvable. If I understood correctly,
it talks about linear operators applied to matrices, and Gauss
elimination, SVD, eigenvectors computation etc. etc. are not linear
operators. It also mentions efficient runtime representations of
type information, which implies that the type checking is not fully
static.
Progress report: I finally have the infrastructure in place to begin
writing code that inverts a matrix. My code is over 10KB in size,
consisting largely of typedefs that would not have been needed had I
listened to you and used C++11.
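To give the flavor (Prod here is a hypothetical stand-in for my unit arithmetic, not the real code):

template <class A, class B>
struct Prod { typedef double type; /* toy: real version computes units */ };

template <class A, class B>
struct Cpp98Style {
    // C++98: every intermediate type needs its own named typedef...
    typedef typename Prod<A, B>::type AB;
    typedef typename Prod<AB, A>::type ABA;  // ...and it snowballs from here
};

// C++11 alias templates would collapse most of this:
//   template <class A, class B> using prod_t = typename Prod<A, B>::type;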
It's not too late :) If you keep it in C++98 and finally publish it,
it'll be hard to make a point - any point - with it since inevitably a
bunch of dorks will say it should have been C++11 and it would have been
entirely different even if it wouldn't be *that* different :)
I don't worry about such dorks. If I can convince you, I'm happy. If
not, then I will rewrite it :)
Incidentally, did the math and type system geek come up with
anything?
Not yet...
I believe I said I'd be done by the end of May. I'm not. However,
I've made significant progress. In particular, I'm now in the second 90%
stage:
http://en.wikipedia.org/wiki/Ninety-ninety_rule.
Sorry been quiet for a while. Almost a week ago, I reached a
milestone, but have since been too busy to report it, or work any
further on this. (International travel, getting my slides ready, caught
some germs, talks to attend...) Anyway, I'm gonna try to work on this
for at least 15min a day while I'm here. The milestone in question was
this: Gaussian elimination works. I'm going to freeze that file, and
then get to work on Gauss-Jordan, so we can then compare how much harder
it is.
The current version weighs in at just over 15KB, so the practicality
of this in the chosen language (version) is not so high. In particular,
if the point of type-checking is to catch bugs, this is
counter-productive, because by having so much more code (almost a factor
of 5), there will almost certainly be more bugs. In fact, not only is
there more code, it is harder code to write, to the extent that I'm
catching myself being tempted to cut corners because the particular
application doesn't need it. In some cases I resist the temptation (yes,
this should be a loop, even though it will only iterate once). In other
cases I give in (row reduction on matrices with more rows than columns is not supported). But in any case, you claimed it's impossible. Once we
get possibility established, we can worry about practicality.
I don't think I explicitly claimed "impossible" - you can write a
decimal calculator in sed and you can do similar things in C++'s type
system at build time, it's just gnarly...
I think the closer it is to "practical" the more unusual it will
be... if C++11 makes it that much closer I'd certainly go for
that...
Actually C++14 makes it even better now that you can use "auto" for
return types, instead of only local variables. Even so, I don't know if
a factor of two is achievable. So far, dynamic pivoting is proving
trickier than expected.
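In miniature, the C++14 win I mean:

// C++11 still needs the return type spelled out with a trailing decltype;
// C++14 can deduce it from the return statement.
template <class A, class B>
auto mul11(A a, B b) -> decltype(a * b) { return a * b; }  // C++11

template <class A, class B>
auto mul14(A a, B b) { return a * b; }                     // C++14

With unit-typed elements the decltype expressions get much hairier than this, which is where the savings come from.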
I can't wait to publish a discussion of your code with a call to arms
to do this in other languages, dependent types or not...
All right, but I fear that by the time I get my code to work, I will
have become convinced that you were right all along. :)
Sorry for long silence. I think dynamic pivot is not too hard
anymore, but too busy to implement now, overdue stuff going on...
Mission accomplished! It turns out that dynamic pivoting does not add
much to the complexity or code size. Here are the file size breakdowns
(note that all of these check the matrix size at compile time):
No unit checking, yes dynamic pivots: 3.19KB (There is a link to that
code higher up in this thread).
Yes unit checking, no dynamic pivots: 14.6KB (I have no plans to post this publicly unless someone asks). This is about 4.6 times larger than no unit checking. C++11 would make this a lot smaller (and C++14 would be even better), but there is quite a bit of blatant redundancy left over
even after applying the new features in the obvious ways.
Lastly, yes unit checking, yes dynamic pivots: 15.8KB. (I plan to
post a link to this in a few minutes.) This is almost 5x larger than no
unit checking, but only 8% larger than unit checking without dynamic
pivots.
Thus, I conclude that my code supports Yossi's conjecture (can I call you that, or would you prefer something else? I've been carefully wording my posts for a while to avoid using your name) that row reduction is hard for a type system like that of C++, but not the conjecture that this is because of dynamic pivots.
Yet another attempt at posting a link:
https://gist.githubusercontent.com/anonymous/a3bed36d2000bd8c4d2e/raw/7c24b790902df0bffeb2afcc9519afa5e09d8879/units.cpp
Did it work this time?
YES!!
Actually if you're willing to #include something like boost::fusion,
or implement some of it yourself, the code can become far more
compact.
Well... tell me when you think you're done/if you'd like me to
publish a follow-up...
Oops, turns out my code doesn't compile with g++. Will try to fix.
Aside from that, I guess I'm done, though eventually I'd like to upgrade
the whole thing to C++11 or even C++14, because with C++98 this is just
not practical.
Ok, I have no idea why this fails to compile on g++. Can somebody
test on a different version of g++ on the off chance that it's some bug
that has since been fixed?
This is completely off topic, but what happened to the link to your
other blog on your main page? (I mostly ignored it, until I noticed now
it was gone.)
I merged this blog with the other one as a part of the forced upgrade
to WP 3.9 (DreamHost will soon stop supporting the version of PHP needed
to run WP 2.5 or whatever it was that I previously used.)
Oh, ok. On a more relevant note, I realized that you might consider
my solution as cheating. Although I do use dynamic pivots, this is not
traditional Gauss-Jordan elimination. In particular, even before I added
dynamic pivots, during the conversion to unit checking, I changed it to
a two-pass algorithm. Now that it does dynamic pivots, it does all the
row swapping at the end.
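For concreteness, here's the standard permutation-vector way to defer the swaps, sketched on plain doubles (my unit-checked code is far hairier and not necessarily identical to this):

#include <vector>
#include <cmath>
#include <cstddef>
#include <algorithm>

void forward_eliminate(std::vector<std::vector<double> >& a) {
    const std::size_t n = a.size();
    std::vector<std::size_t> perm(n);
    for (std::size_t i = 0; i < n; ++i) perm[i] = i;

    for (std::size_t col = 0; col < n; ++col) {
        // pick the largest pivot among the rows not yet used
        std::size_t best = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[perm[r]][col]) > std::fabs(a[perm[best]][col]))
                best = r;
        std::swap(perm[col], perm[best]);  // swap indices, not rows
        for (std::size_t r = col + 1; r < n; ++r) {
            double f = a[perm[r]][col] / a[perm[col]][col];
            for (std::size_t c = col; c < n; ++c)
                a[perm[r]][c] -= f * a[perm[col]][c];
        }
    }
    // all the physical row swapping happens here, at the end
    std::vector<std::vector<double> > out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = a[perm[i]];
    a.swap(out);
}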
Sorry I dropped that last link and disappeared. But I ran across your
blog again and was reminded of this, and I did a bit of poking around
again. It turns out that the mathematician George Hart has done the
analysis of how Linear Algebra works out when you assign dimension types
to all values: http://www.georgehart.com/research/multanal.html
Levi: no doubt this is very interesting from a math perspective.
Indeed, I made a couple of cute math discoveries in the process of
implementing this thing. (No doubt George Hart and many others already
knew those things long before me, but still, I was pleased.) However,
math usually ignores implementation details. For instance, many math
textbooks will give you an algorithm for multiplying two n by n matrices
in n^3 time, but won't mention that there are faster ways. Here, the
question is even vaguer, to the point that even most books on algorithms
will not go there. Namely: can you implement this cleanly in some
language so that you can discover at compile time whether you made some
mistakes with units?
The measurements are made in a coordinate system. Much like x,y
coordinates only make sense once you define the basis they are
coordinates in, your measurements only "mean" something once you know
the basis they are measurements in. Somehow they must "carry" their
basis about.
The basis could either be part of their type, or part of their
value.
If the basis is part of the value, then there is a single type and
any operation can choose the basis that the result is in - it could be
the basis in use by any argument, or some special privileged "universal"
basis, or a basis picked at run time to best hold the result (maybe for
accuracy reasons).
If the basis is part of the type, then operations on values in different bases are still valid, but they must choose the type of the result. A couple of options are to choose the type (basis) of one of the arguments, or to always use a special privileged type (basis).
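Roughly, in C++ (a hypothetical sketch, names mine):

// Option 1: basis as part of the value - one type, checked at run time.
enum Basis { WORLD, CAMERA };
struct DynVec {
    Basis basis;        // which coordinate system the numbers live in
    double x, y, z;
};

// Option 2: basis as part of the type - mixing bases won't compile
// unless an operation explicitly picks the result's basis.
struct World {};        // tag types naming bases
struct Camera {};
template <class B>
struct Vec { double x, y, z; };

template <class B>      // same-basis addition is fine
Vec<B> operator+(Vec<B> a, Vec<B> b) {
    Vec<B> r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

template <class To, class From>  // cross-basis work must name the result basis
Vec<To> rebase(const Vec<From>& v, const double m[3][3]);  // declared only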
Thanks for getting me to think about this. I've thought about it in
the past, and it seemed very complex. My suggestion above seems pretty
simple though - I'm probably missing something!
... :-) So, yes, I was at least missing a few things.
Nice one. If you have continued to think about this problem over the past 18 months, then I hope you'll continue to write about it.
At least if I understand your post correctly, I'm troubled by the same issues - e.g., I code in Scala and it seems odd to me that no one is bothered by the fact that when it comes time to do something computationally intensive, gradient descent or whatever, we look through the large Scala collection library full of gorgeous, immutable data structures - and we find that none of them work for the sausage-making of iterative matrix computation, so I drag in Apache Commons Math to do the real work. A quote from an Ars Technica article in May 2014: "When you have, for example, a huge array - one that barely fits in memory - you can't really afford to make any copies of it or represent it as anything besides a big contiguous chunk of mutable memory. For really hard computational problems, it is often the case that the only known tractable algorithms make extensive, essential use of mutable state. In a sense, Clojure and Haskell are tackling some of the hardest problems in computer science while Julia is aimed at the hardest problems in computational science. Someday, we may know how to deal with both at the same time, but for now, I think we have to choose our battles."
@doug: actually you're describing a different part of the issue - the extent to which code expressed in terms of immutable values can be optimized to allocate less memory and mutate it when needed at runtime. I was mostly talking about the extent to which information known about your inputs at compile time can be propagated through computations. (Here the consensus is that it's relatively easy, but that consensus is rooted in intuition gained from saying that this here's an int, this here's a double, and this here's a list of Point. I claim that it breaks down when you want to add more ambitious annotations like units of measurement; that, in essence, using plain "double" is a form of dynamic typing, because the double could carry very different units; and that this dynamic typing is essential to getting things done, because expressing at compile time what the actual units stored in a double really are is between very hard and impossible even in relatively simple examples.)
OK, people were still making comments 2 years after the original
post, so I'm going to plunge in and see if anyone is still watching in
2017 :-)
Some of the solutions above are impressively ingenious, but they're
based on a flawed assumption.
In your equation (x1 1)*(A B) = y1, if y1 and x1 are in metres, then
the number 1 is also 1 metre: it is not a dimensionless constant! A and
B are both dimensionless. So in the matrices X'*X and X'*Y, all entries
have type metres^2. No need for mixed types within the same matrix at
all. This is the basis of the elegant solution you're looking for.
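Spelling that out for a two-point fit, with the ones carrying metres:

$$ X = \begin{pmatrix} x_1 & 1\,\mathrm{m} \\ x_2 & 1\,\mathrm{m} \end{pmatrix}, \qquad X^T X = \begin{pmatrix} x_1^2 + x_2^2 & (x_1 + x_2)\,\mathrm{m} \\ (x_1 + x_2)\,\mathrm{m} & 2\,\mathrm{m}^2 \end{pmatrix}. $$

Since each $x_i$ is itself in metres, every entry of $X^T X$ (and likewise of $X^T y$) is in $\mathrm{m}^2$, so the normal equations are homogeneous in units and ordinary elimination needs no per-element types.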
Found this article again from an unrelated discussion, happy to see
there is still some activity.
I've actually found a research paper that aims at solving this problem: "Type inference for array programming with dimensioned vector spaces" (P. R. Griffioen). They also have a freely available PDF; not posting the link to avoid moderation.