453 -

I found out some obvious-in-retrospect things this week when doing a little optimisation work. This is work in C#, but I’m sure it’s applicable to many languages. If you want to skip the details, but want the take-home lesson, here it is: if you want reasonable performance with large-ish data-sets (I was working of the order of 5 million elements), use primitive structures — i.e., arrays and hand-coding — rather than the convenience classes provided by your framework of choice. Now, on to the examples.

Firstly, if you are building up a large array — especially if you know a bound on the size — don’t use an ArrayList, use an array. I slashed a generation stage of my app in half, from 26 to 13 seconds, via this alone (after doing some study of the problem to figure out the maximum possible array size).

Tuning the behavior of array resizing is also useful. The behavior of most generic ArrayList implementations is to double the array size as they go. If you know, for example, that the solution space expands exponentially rather than linearly, then you can write a far more efficient implementation of array re-allocation.

Secondly, memory matters and more complicated structures are likely to be more wasteful than something you code yourself. By moving to my own C# array+code I saved about a gigabyte of memory. Granted, this was partly because of some slightly sloppy choices I made early on — when I thought we would be dealing with a few thousand items, so forgivable choices I would counter — I was duplicating memory all over the place. As I see it you have two choices when dealing with a case of resizing arrays:

More efficient speed: resize arrays by large amounts at one time, e.g., doubling when running out of space. Then when someone requests the contents of the array to iterate over, you will have an array that is larger than the number of items in it, so necessitating a new array of the correct size to be allocated, the new items copied in and returned. This is a good choice when the addition must be fast.
More efficient for space: re-allocate the array on addition by the number of items being added. When a copy is required, return the reference to the original array. This makes adds slow, gets fast and minimum memory usage. Of course, it will only work well in a non-threaded environment unless everyone is very careful.

There are obviously optimisations that can be made, but it’s a classic speed-vs-space trade-off. It’s a little more complicated because it’s more “fast-add, slow-get, more memory” vs “slow-add, fast-get, low memory” case, so it depends how you are working. In my case, the latter worked excellently because I have a pattern of few-additions vs many-gets. Again, my understanding of the problem at hand allowed me to easily surpass generic implementations.

Finally, as an aside more than anything, I realised there’s a super fast stack implementation if you know the maximum-bound of the stack; which you quite often will if you take some time to analyse the problem. It can be illustrated as follows:

[gratuitus line-break are thanks to my dodgy weblog software]

Yeah, it’s simple and obvious, but I hadn’t really thought about it before. Trick is that i always points to the position beyond the top of the stack. The ordering of ++ and — is important, and presumes a C-style implementation of precedence.

So silly little things, but ones which will really make a difference. I think this is one reason why C programs are fast: you have to know what is going on and, as such, tend to use the most efficient methods for the problem at hand. My take-home lesson is that, while convenience classes certainly have a place, they can be no substitute for some code designed specifically for the problem in mind when efficiency is needed.