Numerical calculation performance matters if you're calculating gigantic datasets or have limited processing capability. I belong in the latter case as I intend to run my trading system on a free cloud server in the future. There are plenty of research and studies to push the boundary of high-performance numerical computing in Java (see ref. 1 and 2). Most of the latest and greatest is beyond the scope of my quantitative work. What I seek is performance with simplicity and efficiency. That is how I became aware of the Colt Project by CERN, and its multi-threaded branch. Colt is a set of open source Java libraries for scientific and technical computing problems characterized by large data sizes while demanding high performance and low memory footprint. Just what I needed. Out of curiosity, I tested its implementation of double arrays and compared its performance to standard Java libraries.
A random sample of a million decimal values (e.g. prices) are generated and stored in four Java data structures. All data structures have the same one million values in their own respective memory spaces.
- Primitive double array
- ArrayList of Double objects
- Parallel Colt
Three tests were ran.
- A simple diff function. A traversal of the million data points and calculating array[i] - array[i - shift] for all i = shift to 1,000,000. Where shift is a random integer from 0 to 50. This type of calculation is common place in technical analysis indicators.
- A sort function sorting the million data points in the array in ascending order. This come in handy for some statistical analysis. In the case of the primitive array, I am using java.util.Arrays.sort(). Otherwise I'm using the sort() method implicit in the other interfaces.
- A binary search function on the sorted arrays. For the primitive array, I am using java.util.Arrays.binarySearch().
Each test is ran 3 times with different random seeds and the results are averaged. The system runs Java SE build 1.6.0_22-b04 and Linux kernel 2.6.35-22 on an Intel Core 2 Duo 6300 @ 1.86 GHz.
For the diff, primitive array averaged 9.3 millisecond; ArrayList 27 ms.; Colt 14 ms.; Parallel Colt 15 ms. For sort, primitive 354.7 ms.; ArrayList 1,316 ms.; Colt 270.3 ms.; Parallel Colt 273 ms. For search, primitive 559.7 ms.; ArrayList 1,561.7 ms.; Colt 575.7 ms.; Parallel Colt 551.3 ms.
As this isn't a vigorous test, I wouldn't mind too much on the actual numbers. The take home message here is that the performance of Colt is at least in the same magnitude of a primitive array. Whereas a standard ArrayList object is a magnitude worse. However, where Colt really shines is in its methods for handling matrices. It offers plenty of convenient functions for matrix calculations while guaranteeing good performance. To work with matrices in Java, you either use third-party libraries such as Colt or manage a bunch of for loops. Which isn't pretty.
 Moreira, et al., "Java programming for high-performance numerical computing," IBM Systems Journal, Vol. 39, No. 1, pp. 21-56, 2000.  Wendykier and Nagy, "Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing," ACM Transactions on Mathematical Software, Vol. 37, No. 3, September 2010.