Algorithmic ownage

I felt good when I simplified one of my algorithms and made it 10 times faster. I felt so good that I even wrote an entire blog post about it, patting myself on the back. Then last week I got an email from Kevin of Keming Labs suggesting a few alternatives.

;; Three ways to count the number of occurrences of each item in a collection
;; ("orange" "bottle" "coke" "bottle") => [("bottle" "coke" "orange") (2 1 1)]
(let [c '("orange" "bottle" "coke" "bottle")]

  ;; #1: reduce, conj-ing a one-entry map per item onto the accumulator
  (let [counts (reduce #(conj %1 (hash-map %2 (inc (get %1 %2 0))))
                       {} c)]
    [(keys counts)
     (vals counts)])

  ;; #2: group-by identity, then count each group
  (let [l (for [[k v] (group-by identity c)]
            [k (count v)])]
    [(map first l)
     (map second l)])

  ;; #3: pair each item with 1, then merge-with +
  (let [counts (apply merge-with +
                      (map #(apply hash-map %)
                           (partition 2 (interleave c (repeat 1)))))]
    [(keys counts)
     (vals counts)]))
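Worth noting: since version 1.2, Clojure's core library ships `frequencies`, which does exactly this kind of count. A minimal sketch on the same sample data:

```clojure
;; clojure.core/frequencies returns a map from item to occurrence count
(let [c '("orange" "bottle" "coke" "bottle")]
  (let [counts (frequencies c)]
    [(keys counts)
     (vals counts)]))
```

Here `(frequencies c)` evaluates to a map equal to `{"orange" 1, "bottle" 2, "coke" 1}`, so the keys/vals destructuring matches the shape the snippets above produce.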

First of all, his solutions looked much cleaner than mine. Over the weekend I incorporated his three algorithms into my program and ran a few benchmarks; here are the averages of two runs on a dataset of 28,760 items.

  • My algorithm. Elapsed time: 68372.026532 msecs.
  • Kevin's solution #1. Elapsed time: 156.940976 msecs.
  • Kevin's solution #2. Elapsed time: 60.165483 msecs.
  • Kevin's solution #3. Elapsed time: 296.162042 msecs.
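Timings in this "Elapsed time: ... msecs" format are what clojure.core's `time` macro prints. A minimal sketch of taking such a measurement, with a randomly generated stand-in for the real dataset (the actual 28,760-item data isn't shown here):

```clojure
;; time prints the elapsed milliseconds to *out* and returns the value
(let [items (repeatedly 28760 #(rand-nth ["orange" "bottle" "coke"]))]
  (time (frequencies items)))
```

For more rigorous measurements (JIT warm-up, multiple samples, statistics), the Criterium library is the usual choice over a bare `time`.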

Total ownage. That's what I like about sharing my work: once in a blue moon, a reader drops by and generously shows me how to make a solution 1,000 times faster! Now the ball is in my court to understand what he has done and improve. Collaborating and learning: that's why I open source.

Update: I've done some more digging, and it seems one reason for the drastic performance improvement is the use of transients in the built-in functions. Lesson of the day: leverage the language's built-in optimizations by sticking with core data structures and functions as much as possible.
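For the curious, here is a minimal sketch of the same count written directly against a transient map; the `count-occurrences` name is mine, and this mirrors (but is not) the actual `frequencies` implementation. A transient lets the reduce mutate one map in place instead of allocating a new persistent map per step, then freezes it with `persistent!` before returning:

```clojure
;; Accumulate counts into a transient map (cheap in-place updates),
;; then convert back to an ordinary persistent map at the end.
(defn count-occurrences [coll]
  (persistent!
   (reduce (fn [m k] (assoc! m k (inc (get m k 0))))
           (transient {})
           coll)))

(count-occurrences '("orange" "bottle" "coke" "bottle"))
;; => a map equal to {"orange" 1, "bottle" 2, "coke" 1}
```

The transient never escapes the function, so callers still see a plain immutable map, which is what makes this optimization safe to hide inside core functions.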