Unlock Lisp sorcery in your data structure by implementing Clojure ISeq

People that has gone through The Little Schemer might not find this exciting. One of the things that I discovered while patching Clatrix is that implementing clojure.lang.ISeq interface in your custom data structure unlocks the magic of Lisp composition. By enabling primative operators such as first, next, more, cons, higher-level operations such as map and reduce would just work when operating on your data structure. I find it fascinating that a native Fortran matrix object (through jBLAS) can be made clojury with a few magic operations implemented.

However, getting a deftype implementation of Matrix correct took some effort as these operators are not as simple as they seem.

public interface ISeq extends IPersistentCollection {
    Object first();
    ISeq next();
    ISeq more();
    ISeq cons(Object o);
}

For example, say we have a matrix M like so.

=> (def M (matrix [[1 2 3] [4 5 6] [7 8 9]]))
=> M
A 3x3 matrix
-------------
1.00e+00  2.00e+00  3.00e+00 
4.00e+00  5.00e+00  6.00e+00 
7.00e+00  8.00e+00  9.00e+00

Reducing M across its maps is equivalent to a column-wise operation.

=> (reduce #(map + %1 %2) M)
(12.0 15.0 18.0)

Yet for a while this doesn't work because I wasn't careful on my implementation of first.

Consider the case of a 2x2 matrix. A 2x2 matrix is structurally equivalent to a nested vector. Calling first on these would yield:

=> (first [[1 2] [3 4]])
[1 2]
=> (first (matrix [[1 2] [3 4]]))
A 1x2 matrix
-------------
1.00e+00  2.00e+00

And for a 3x1 vector matrix, i.e. one-dimensional, it is equivalent to a regular vector.

=> (first [1 2 3])
1
=> (first (matrix [1 2 3]))
1.0

But here's a tricky bit. What happens during reduce as it keeps recurring next and first?

Let's step this through for (reduce #(map + %1 %2) M). %1 is basically the result so far and %2 is the first of the remaining collection to be processed.

iterationaccumulated (%1)first (%2)remaining
0nil[1 2 3][[1 2 3] [4 5 6] [7 8 9]]
1[1 2 3][4 5 6][[4 5 6] [7 8 9]]
2, bad[5 7 9]7[[7 8 9]]
2, good[5 7 9][7 8 9][[7 8 9]]

The problem arises in the second iteration. Calling (rest [[4 5 6] [7 8 9]]) returns [[7 8 9]]. However, (matrix [[7 8 9]]) is a row vector and (matrix [7 8 9]) is a column vector. Both are considered one dimensional. In either case, first of a vector should return the first element, which is a number. Thus at this iteration, reduce breaks because you can't map a sequence with a number, (map + [5 7 9] 7), to get an accumulated value.

What we want though, is for the second iteration to return [7 8 9] instead because the original matrix is not a vector. Luckily, this particular problem has been solved by my colleague Antonio Garrote when he did this in Java a year ago by keeping a predicate field signifying is this matrix supposed to be vector or not.

So there you have it. If you find yourself needing to implement deftype to build your own data structure in Clojure. Do consider implementing clojure.lang.ISeq to leverage high-level Clojure functions but be careful about those seemingly simplistic primitive operators.