Vector algorithm using tree composition

Sniffed this trick from the Incanter source. Here's a demo of using tree composition to calculate a 2-d Euclidean distance between two points.

(def x [1 2])
(def y [4 5])

;; Compose a tree of functions: apply the branch function to each leaf,
;; then combine the results with the root function.
(defn- tree-comp-each [root branch & leaves]
  (apply root (map branch leaves)))

;; Incanter supplies its own sqrt and pow; in plain Clojure the
;; java.lang.Math equivalents are used here instead.
(defn euclidean-distance [a b]
  {:pre [(= (count a) (count b))]}
  (Math/sqrt (apply tree-comp-each +
                    (fn [[x y]]
                      (Math/pow (- x y) 2))
                    (map vector a b))))

;; (euclidean-distance x y) => 4.242640687119285
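The same idea translates to other languages too. Here is a rough Java-streams sketch of my own (not from Incanter): the "branch" maps each pair of coordinates to a squared difference, and the "root" folds the results with a sum.

```java
import java.util.stream.IntStream;

public class TreeComp {
    static double euclideanDistance(double[] a, double[] b) {
        if (a.length != b.length)
            throw new IllegalArgumentException("unequal lengths");
        double sumOfSquares = IntStream.range(0, a.length)
                .mapToDouble(i -> Math.pow(a[i] - b[i], 2)) // branch
                .sum();                                     // root
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Same points as the Clojure demo: [1 2] and [4 5].
        System.out.println(euclideanDistance(new double[]{1, 2},
                                             new double[]{4, 5}));
        // prints 4.242640687119285 (the square root of 18)
    }
}
```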

I'd only had exposure to tree traversal from implementing searching and sorting algorithms. This trick definitely opened my eyes to the wonders of functional programming. It's more than just being able to pass around and manipulate functions; I need to think like a tree.

A brief comparison of double arrays for high-performance numerical computing in Java

Numerical calculation performance matters if you're crunching gigantic datasets or have limited processing power. I belong to the latter camp, as I intend to run my trading system on a free cloud server in the future. There is plenty of research pushing the boundary of high-performance numerical computing in Java (see refs. 1 and 2). Most of the latest and greatest is beyond the scope of my quantitative work; what I seek is performance with simplicity and efficiency. That is how I became aware of the Colt project by CERN, and its multi-threaded branch, Parallel Colt. Colt is a set of open-source Java libraries for scientific and technical computing problems characterized by large data sizes while demanding high performance and a low memory footprint. Just what I needed. Out of curiosity, I tested its implementation of double arrays and compared its performance to the standard Java libraries.

Method

A random sample of a million decimal values (e.g. prices) is generated and stored in four Java data structures. All four hold the same one million values in their own respective memory spaces.

  1. Primitive double array
  2. ArrayList of Double objects
  3. Colt
  4. Parallel Colt

Three tests were run.

  1. A simple diff function: a traversal of the million data points, calculating array[i] - array[i - shift] for all i from shift to 1,000,000, where shift is a random integer from 0 to 50. This type of calculation is commonplace in technical analysis indicators.
  2. A sort function, sorting the million data points in ascending order. This comes in handy for some statistical analysis. For the primitive array, I am using java.util.Arrays.sort(); otherwise I'm using the sort() method provided by the other interfaces.
  3. A binary search function on the sorted arrays. For the primitive array, I am using java.util.Arrays.binarySearch().
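For the primitive-array case, the three tests can be sketched roughly as below. This is a minimal illustration with a small array and Java 8 syntax for brevity, not the actual million-element benchmark harness (which ran on Java 6 and timed each step).

```java
import java.util.Arrays;
import java.util.Random;

public class ArrayTests {
    // Test 1: diff — out[i - shift] = a[i] - a[i - shift].
    static double[] diff(double[] a, int shift) {
        double[] out = new double[a.length - shift];
        for (int i = shift; i < a.length; i++)
            out[i - shift] = a[i] - a[i - shift];
        return out;
    }

    public static void main(String[] args) {
        // A (small) random sample of prices; the benchmark used 1,000,000.
        double[] prices = new Random(42).doubles(1000).toArray();

        double[] d = diff(prices, 5);                        // test 1: diff
        Arrays.sort(prices);                                 // test 2: sort
        int idx = Arrays.binarySearch(prices, prices[500]);  // test 3: search

        System.out.println(d.length); // 995
        System.out.println(idx);      // 500 (random doubles are almost surely distinct)
    }
}
```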

Each test is run 3 times with different random seeds and the results are averaged. The system runs Java SE build 1.6.0_22-b04 and Linux kernel 2.6.35-22 on an Intel Core 2 Duo 6300 @ 1.86 GHz.

Result

                   Diff      Sort        Search
  Primitive array  9.3 ms    354.7 ms    559.7 ms
  ArrayList        27 ms     1,316 ms    1,561.7 ms
  Colt             14 ms     270.3 ms    575.7 ms
  Parallel Colt    15 ms     273 ms      551.3 ms

Discussion

As this isn't a rigorous test, I wouldn't read too much into the exact numbers. The take-home message is that Colt's performance is at least within the same order of magnitude as a primitive array, whereas a standard ArrayList is an order of magnitude worse. Where Colt really shines, however, is in its methods for handling matrices. It offers plenty of convenient functions for matrix calculations while delivering good performance. To work with matrices in Java, you either use a third-party library such as Colt or manage a bunch of for loops, which isn't pretty.
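To illustrate the "bunch of for loops" alternative, here is what even a plain matrix multiplication looks like when hand-rolled in standard Java; this is the kind of boilerplate a matrix library like Colt wraps up (and optimizes) for you.

```java
import java.util.Arrays;

// Hand-rolled matrix multiplication with three nested loops, the kind
// of code a library like Colt spares you from writing and tuning.
public class MatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int p = 0; p < k; p++)      // loop order chosen for cache locality
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```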

References

[1] Moreira et al., "Java programming for high-performance numerical computing," IBM Systems Journal, Vol. 39, No. 1, pp. 21-56, 2000.
[2] Wendykier and Nagy, "Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing," ACM Transactions on Mathematical Software, Vol. 37, No. 3, September 2010.

How to backup your data on the cloud with ease and for free

Dr. Steenbarger's recent hard-drive failure reminded me of the importance of backing up my data. Everything I do with trading is on the computer, so I have many files that are vital to my trading, from web bookmarks to chart templates to source code. If I were to back up manually, I would need a checklist to make sure everything got copied frequently. Being the lazy and cheap person that I am, I don't like the idea of forcing myself to back up by hand. That is why my folders are synchronized automatically to an online file server using Mozy Remote Backup. I don't have to do a thing to know that all my important stuff is safe.

There are two reasons why I chose Mozy. First, it is one of the two biggest names in the business (the other being Carbonite). Second, it is the only one of the two to offer 2 GB of online backup space free for life (affiliate). The benefit of using Mozy is that it actively synchronizes my files to their cloud storage server. These are files that change often, so it makes sense to let Mozy monitor and back them up automatically so that I don't have to worry about them.

Still, I have another 50 GB+ of personal photos, videos, and documents that won't fit in the free 2 GB, which is why I only use Mozy for my frequently changing or critical files (mostly trading related). The rest of my 50 GB+ of data is mostly static content (e.g. photos and media files), so there is not much point in paying a monthly fee for it. I'd rather spend a few minutes a month making backup DVDs or uploading it to my Amazon S3 account. That way, I have all my data backed up and get to keep my $5 a month (the subscription price for unlimited backup storage) too.

I am not being paranoid. Another one of my computers died on me a few years back with no prior warning: its power supply suddenly blew up in smoke and sparks one day. The scene was quite spectacular, and more than a few components were damaged. I doubt I will see that kind of fireworks again, but who knows what else could go wrong with this aging computer (4 years and counting)? A drive failure would be very costly for me. I have invested a lot of time and energy in my trading work, and without a backup I could lose all of it. That is why I have a free, automated backup plan that doesn't require me to lift a finger.
