Eureka moment on design patterns for functional programming

Understanding design patterns for object-oriented programming made my life easier as a Java programmer, so I have been looking for a comparable book for functional programming ever since my foray into this age-old paradigm. It seems I'm not the only one looking, either. But here's the thing: I think I've just had a revelation of sorts.

There is one and only one guideline to designing functional architectures -- keep it simple. Simple as in giving each function a single purpose. Simple as in working with the data directly rather than conjuring unnecessary intermediaries. Simple, as elaborated by Rich Hickey in his talk, Simple Made Easy. Much of this is already conveyed in Bloch's Effective Java, for example:

Item #5: Avoid creating unnecessary objects
Item #13: Minimize the accessibility of classes and members

As Majewski said in a Stack Overflow reply (update 2013: the question is no longer available on Stack Overflow),

The patterns movement started because object-oriented programming
was often turning into spaghetti objects ... Functional programming
is a restricted style of programming, so it didn't need to grow a set
of restricted conventions to limit the chaos.

As such, there is no design pattern book for functional programming. I didn't get that earlier this year, but something clicked recently. Over the past few months, I've been doing consulting and open-source projects, solving algorithmic problems with Clojure.

One of the problems I faced in a project this week was calculating the number of occurrences of each distinct element within a list. Say we have a list, coll = ("orange", "bottle", "coke", "bottle"). The output would be something like [("orange", "bottle", "coke") (1 2 1)].
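In Clojure data, writing the list as a vector for convenience:

(def coll ["orange" "bottle" "coke" "bottle"])
;; goal: "orange" appears once, "bottle" twice, "coke" once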

This is my first solution.

(defn eval-decompose
  "Pairs each distinct element of coll with its number of occurrences."
  [coll]
  (let [super-d  (distinct coll)
        freqs    (loop [ps  []           ;; accumulated counts
                        d   super-d]     ;; distinct elements still to count
                   (if (seq d)
                     (let [c  (count (filter (partial = (first d)) coll))]
                       (recur (conj ps c) (rest d)))
                     ps))]
    (map #(vector %1 %2) super-d freqs)))
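At the REPL, the example list gives each distinct element paired with its count:

(eval-decompose ["orange" "bottle" "coke" "bottle"])
;; => (["orange" 1] ["bottle" 2] ["coke" 1])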

The specs are not exactly as I described, but the concept remains. What I did was use a tail-recursive loop (it's supposed to be fast, isn't it?) to aggregate the counters into a vector of counts. Then I map each distinct fragment to its corresponding count to generate the final output collection. Sounds overly complicated, doesn't it?

This is the first warning sign of a bad functional design. For a collection of 30,000 items, this function took 11 minutes to compute on my notebook. It looks like a good place to exploit the parallel nature of the problem.
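For a rough sense of the cost, here is a timing sketch with a made-up collection of 30,000 fragments (many of them distinct, which is exactly the shape that hurts this version); actual numbers will of course differ from machine to machine:

(def big-coll (vec (repeatedly 30000 #(str "frag-" (rand-int 10000)))))

(time (dorun (eval-decompose big-coll)))
;; the work grows with (number of items) x (number of distinct items)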

Specifically, the counting of each fragment is independent of the other fragments, so there's no need for the program to wait for one fragment to finish before processing the next. I simplified the program to remove this implicit assumption of sequential processing. Here is the gist of the refactored code, where each function does only one job. Since the processing is modularised, I can parallelize the algorithm easily by using pmap instead of map on the last line, as shown below.

(defn match-count
  "Given a key, k, returns the number of occurrences of k in the collection, coll."
  [k coll]
  (let [match?  (fn [i x]
                  (if (= k x)   ;; closure over k
                    (inc i)
                    i))]
    (reduce match? 0 coll)))

(defn calc-counts
  "Returns a list of counts for the occurrences of each key of ks
  within the collection, coll."
  [ks coll]
  (pmap #(match-count % coll) ks))
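Used together on the earlier example, with the distinct elements as the keys:

(calc-counts ["orange" "bottle" "coke"] ["orange" "bottle" "coke" "bottle"])
;; => (1 2 1)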

I've split the first function into 3 functions (2 shown here; a sketch of the third is below). As Hickey said in his talk, simplifying can often produce more, not fewer, functions. Yet the program is not only easier to read, it also runs in less than a minute. An order of magnitude faster! There is still a lot for me to learn. I want to find more challenging projects to push my own limits. But rather than solving arbitrary problems, I prefer to tackle real-world challenges. So if you know of anyone who could benefit from collaborating with a functional developer to build robust and scalable software, please pass along my contact.
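The third function just stitches the pieces back together. Roughly, something like this (decompose is a placeholder name for this sketch, not necessarily the exact code in the project):

(defn decompose
  "Pairs each distinct element of coll with its count."
  [coll]
  (let [ks (distinct coll)]
    (map vector ks (calc-counts ks coll))))

(decompose ["orange" "bottle" "coke" "bottle"])
;; => (["orange" 1] ["bottle" 2] ["coke" 1])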

Follow-up: Kevin Lynagh showed me three better ways of doing this in a follow-up post – Algorithmic ownage. Humbled.