August 2004


To finish this trilogy of benchmarks, I’ll give you the results for two faster algorithms (shell sort and merge sort) and tell you why C is not slower after all.

But first I have to add a small, but nevertheless important, reason for ArrayList’s slowness to my discussion in the last post: ArrayList uses a System.Object[] array for internal storage of its contents. This simply means that a lot of extra boxing/unboxing is going on, and thus ArrayLists impose a lot of overhead. I’ll come back to this later.
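To illustrate what that means in practice, here is a minimal sketch (my own illustration, not part of the benchmark code) of the boxing that happens whenever a value type goes into an ArrayList:

```csharp
using System;
using System.Collections;

public class BoxingDemo
{
    public static void Main()
    {
        ArrayList list = new ArrayList();
        int value = 42;

        // Storing an int in an ArrayList boxes it: the value is copied
        // into a new heap object, because the backing store is object[].
        list.Add(value);

        // Reading it back requires an explicit unboxing cast.
        int back = (int)list[0];

        Console.WriteLine(back);
        Console.WriteLine(list[0].GetType());
    }
}
```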

I did not even think about this, but when invoked without any flags, gcc uses -O0 by default, which translates to “just do the fastest compilation possible, don’t think about code performance and do not alter the code in any way.” This is very practical for debugging, but of course not for production use. A friend of mine pointed this out when we were discussing the results of my tests. Obviously mcs/mono do optimize the generated code. With -O3 I got totally different results.

Here is a table of all average execution times I got on my machine. Please note that they are just approximations.

                                      C (gcc)      C (gcc -O3)  Mono C# (Array)  Mono C# (ArrayList)
  Initialization                      0.018/0.100  0.018        0.075/0.23       0.12/1.13
  Array[List].Sort()                  -            -            1.5/32           0.89/20.5
  BubbleSort                          43           22           35               too long
  InsertionSort                       32           7.7          19               too long
  InsertionSort (optimized)           25           13.5         18.5             7m 24s
  BinaryInsertionSort                 23           7.0          17               too long
  BinaryInsertionSort (optimized)     6.2          6.5          7.1              7.8
  BinaryInsertionSort (no recursion)  6.5          6.4          6.0              7.8
  ShellSort (value set 1)             7.5          4.2          5.8              2m 20s
  ShellSort (value set 2)             0.09/4.6     0.056/3.3    0.15/5.5         1.6/61
  MergeSort                           0.054/1.091  0.045/0.917  0.132/1.66      0.78/21
  MergeSort (no recursion)            3.3          3.3          -                -

The “optimized” algorithms are the ones with memmove, Array.Copy(), etc. Where two values are given, the first one is for 100.000 items and the second one for 2.000.000 items. A discussion of gcc’s performance with optimizations turned on (especially in the case of the optimized insertion sort) is out of scope for this small series.
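For reference, the MergeSort rows above refer to a standard top-down merge sort. A minimal C# sketch of that shape (my own illustration, not fejj’s benchmarked code) looks like this:

```csharp
using System;

public class MergeSortSketch
{
    // Top-down merge sort with a single scratch buffer.
    static void Sort(int[] a, int[] tmp, int lo, int hi)
    {
        if (hi - lo < 2)
            return;

        int mid = (lo + hi) / 2;
        Sort(a, tmp, lo, mid);
        Sort(a, tmp, mid, hi);

        // Merge the two sorted halves into tmp...
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid)
            tmp[k++] = a[i++];
        while (j < hi)
            tmp[k++] = a[j++];

        // ...and copy the merged run back in one bulk operation.
        Array.Copy(tmp, lo, a, lo, hi - lo);
    }

    public static void Main()
    {
        int[] data = { 9, 4, 7, 1, 8, 2 };
        Sort(data, new int[data.Length], 0, data.Length);
        foreach (int x in data)
            Console.Write("{0} ", x);
        Console.WriteLine();
    }
}
```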

So there are only two questions left: Why is Array.Sort() so slow, and why is ArrayList.Sort() faster? The answer to the former: Array.Sort() uses the IComparable interface. This means boxing and function calls, i.e. a loss in efficiency. To get a feeling for the impact of (un)boxing, I tested merge sort (version 2) with an object[] array of 100.000 (boxed) integers. It was considerably (about 3 times) slower than the version with an (unboxed) integer array.
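The two cases look like this in a small sketch (an illustration of the difference, not the benchmark itself):

```csharp
using System;

public class BoxedSortSketch
{
    public static void Main()
    {
        // Boxed: Array.Sort has to go through each element's
        // IComparable implementation, unboxing on every comparison.
        object[] boxed = { 4, 1, 3, 2 };
        Array.Sort(boxed);

        // Unboxed: the ints can be compared directly.
        int[] plain = { 4, 1, 3, 2 };
        Array.Sort(plain);

        Console.WriteLine("{0} {1}", boxed[3], plain[3]);
    }
}
```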

The answer to the latter question is probably that the values are already boxed in an ArrayList, so there is only the overhead of the function calls.

To sum this up: With .NET one should really look out for (unintentional) boxing of value types. This might get (at least a bit) easier with the inclusion of generics in .NET 2.0. Until then, one has to watch out for this pitfall, especially when writing performance-critical code.
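To illustrate the point about generics, here is a sketch using the planned .NET 2.0 List&lt;T&gt; (not available in the Mono 1.0 setup benchmarked here, so this is a forward-looking assumption):

```csharp
using System;
using System.Collections.Generic;

public class GenericListSketch
{
    public static void Main()
    {
        // List<int> keeps its elements in an int[] internally, so
        // neither Add() nor the indexer involves any boxing.
        List<int> list = new List<int>();
        for (int i = 5; i > 0; i--)
            list.Add(i);

        // Sort() compares the ints directly as well.
        list.Sort();
        Console.WriteLine("{0} {1}", list[0], list[4]);
    }
}
```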

After I realized (as described in the first part of this series) that Mono was faster than I had imagined, I continued my benchmarks with the two versions of insertion sort and wondered if .NET could somehow counter the efficiency of memmove.

Insertion Sort

Again, explanations and C source code can be found in fejj’s original post. A description of the testing procedure is in my first article. The unoptimized C code finished after about 32 seconds. Converting the code to C# was again very straightforward and I won’t bore you with listings.

Mono arrays finished even better than last time: only 19 seconds! The ArrayList was again so slow that I aborted the test.

So, how could I optimize the C# code to “counter” C’s memmove? “Luckily”, the optimized C code takes 25s to sort the 100.000-item array, which is still slower than the unoptimized C# code! At first I thought that there was no way to do something like memmove in C#, as it is a very low-level operation and the .NET Framework is all about abstraction. I didn’t even bother looking through the docs. But I did take a closer look at the member list of the ArrayList class. Two interesting methods caught my attention: GetRange() and SetRange(). The former returns an ArrayList, but “does not create copies of the elements: the new System.Collections.ArrayList instance is a view window into the source list.” (from MonoDoc). The latter takes an ICollection and a start index and copies the values from the collection to itself. If the implementation was done well and recognized that the source and the target were the same, then it could speed things up quite a bit. And indeed, it did so. Insertion sort on ArrayLists was now down to 7 minutes and 24 seconds…
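A minimal sketch of the trick (my own illustration, not the benchmark code; it assumes, as observed above, that SetRange() handles an overlapping source range safely):

```csharp
using System;
using System.Collections;

public class SetRangeSketch
{
    // Shift the slice [index, index + count) one slot to the right, the
    // way insertion sort opens a gap. GetRange() returns a view into the
    // list (no element copies); SetRange() then copies the viewed
    // elements back starting at the shifted position.
    static void ShiftRight(ArrayList list, int index, int count)
    {
        ArrayList view = list.GetRange(index, count);
        list.SetRange(index + 1, view);
    }

    public static void Main()
    {
        ArrayList list = new ArrayList(new int[] { 1, 3, 5, 7, 9 });

        // Open a gap at position 1 (the last element is overwritten,
        // which is exactly what insertion sort wants).
        ShiftRight(list, 1, 3);
        list[1] = 2;

        foreach (object o in list)
            Console.Write("{0} ", o);
        Console.WriteLine();
    }
}
```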

After this success, I thought that I might take a closer look at System.Array and found the Copy(Array sourceArray, int sourceIndex, Array destinationArray, int destinationIndex, int length) method. From the docs: “[The function] Copies the specified number of elements from a source array starting at the specified source index to a destination array starting at the specified destination index. [...] If sourceArray and destinationArray are the same array, [Copy()] copies the source elements safely to their destination as if the copy were done through an intermediate array.” This looks promising, but in this case it only sped up execution by about 0.5 seconds. Not much, but this technique might prove to be of good use later.
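A small sketch of the same shift done with Array.Copy() on a plain array (my illustration, not the benchmark code):

```csharp
using System;

public class ArrayCopySketch
{
    public static void Main()
    {
        int[] a = { 1, 3, 5, 7, 9 };

        // Overlapping source and destination in the same array: Copy()
        // behaves like memmove and copies "as if through an intermediate
        // array", so the shifted values are not clobbered mid-copy.
        Array.Copy(a, 1, a, 2, 3);
        a[1] = 2;

        foreach (int x in a)
            Console.Write("{0} ", x);
        Console.WriteLine();
    }
}
```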

Binary Insertion Sort

fejj’s code (without memmove) takes 23s, C# arrays are again faster, but only by 6 seconds, and ArrayLists are again far too slow. Plugging in the optimizations, I got a bigger improvement than last time: 6.2s (C), 7.1s (C#), 7.8s (ArrayList). The world is normal again! C code is faster than C# code! This is also the first algorithm where ArrayLists can compete with ordinary arrays. Presumably get/set operations are very slow with ArrayLists, and the optimized binary insertion sort has very few of them compared to all the former algorithms. This is probably due to the fact that the System.Array indexer can be directly mapped to a few assembler instructions (like C array access), while an ArrayList access involves an additional method call, which executes quite a lot of machine instructions. And of course there is the second bounds check on the list and the internal version bumping.
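For completeness, here is a sketch of the optimized binary insertion sort for plain arrays (my reconstruction of the idea, not fejj’s exact code): binary search for the insertion point, then one bulk Array.Copy() instead of element-by-element shifting:

```csharp
using System;

public class BinaryInsertionSortSketch
{
    // Binary search for the position where key belongs in a[0..n).
    static int InsertIndex(int[] a, int n, int key)
    {
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] <= key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    public static void Sort(int[] a)
    {
        for (int i = 1; i < a.Length; i++) {
            int key = a[i];
            int pos = InsertIndex(a, i, key);

            // One bulk move instead of shifting element by element.
            Array.Copy(a, pos, a, pos + 1, i - pos);
            a[pos] = key;
        }
    }

    public static void Main()
    {
        int[] data = { 5, 2, 9, 1, 5, 6 };
        Sort(data);
        foreach (int x in data)
            Console.Write("{0} ", x);
        Console.WriteLine();
    }
}
```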

Eliminating the recursion only speeds up the C# array version (don’t ask me why), so that it runs in about 6.0s.

Today I wanted to install PHP 5 + Apache 2 on my brother’s laptop because he wanted to learn some (PHP) programming. While I was fighting with the PHP install process (the Windows PHP installer does not come with an Apache module?!), the Apache Start Menu entries were suddenly filled with myriads of executables like “Britney Spears Sexy archive.doc.exe” (to name one of the more harmless-sounding ones – I don’t want to risk becoming linked under the wrong keywords on Google…). A quick search showed that all folders which have something to do with up-/downloads had been spammed with these files. Simply deleting them did not help – after rebooting they appeared again.

At least I now knew that some program was being executed at Windows startup which created all these files – probably some kind of worm. The strange thing was that this particular box had been behind a masquerading server for quite some time, so nobody should have been able to connect directly to this computer from the net to exploit some Windows bug. And my brother had only downloaded the PHP and Apache installers and some MP3s lately and did not open any attachments (and he is using Mozilla Thunderbird, so no Outlook exploits either).

So I started googling for all currently running processes. For some harmless Windows built-in exes, Google linked me to subpages of http://www.neuber.com/taskmanager/process/. Following the links on the page, I found Security Task Manager. This app lists all currently running processes – just like the Windows Task Manager – and additionally displays information like possible threats, the full path of the executables, a program description (when available) and a list of all character strings in the file. Very handy indeed. It also has a quarantine function, which moves the executable to an isolated folder and removes all (hidden) autostart entries for the file (and does a backup, so everything can be restored if the process later proves to be harmless).

A file called FVProtect.exe was flagged as the most likely dangerous running program, and quarantining it proved that it really was the culprit. A quick internet search showed that it was the executable of a known worm called W32.Netsky.P@mm, and Symantec provides a handy removal tool, so all is now well again…

I just don’t know how this worm infected the computer in the first place. Maybe through the ads ICQ displays at startup?

This completely took away the time I had set aside for writing the next part of my Sorting in Mono series, so no post on insertion sort today…

Jeffrey Stedfast’s (fejj) series on sorting algorithms has spurred me to compare his results with equivalent implementations in C#/Mono. I expected Mono’s performance to be no match for the direct implementation in C. After all, with .NET one has the additional layer of the VM and no direct access to memory. But the actual results I got were quite astonishing.

What did I do?

I did two comparisons: one with a standard array and one with an ArrayList.

System.Array

I simply took fejj’s source code and changed "int a[100000]" to "int[] a = new int[100000]" – and by the way removed the unnecessary size_t n parameter of the sorting functions. .NET’s arrays are more intelligent than their C/C++ counterparts, which are mostly just a convenient shorthand for direct pointer arithmetic. System.Array objects are conscious of their own size and check that the programmer stays within their bounds.

Of course the initialization of the array is different from C, so I got the following skeleton for the tests:

using System;

public class Sort
{
    static void Main()
    {
        int[] array = new int[100000];

        Random random = new Random();
        for (int i = 0; i < 100000; i++) {
            array[i] = random.Next();
        }

        DoTheSorting(array);
    }
}
System.Collections.ArrayList

Additionally, I wanted to know how well the ArrayList scales.

using System;
using System.Collections;

public class Sort
{
    static void Main()
    {
        ArrayList list = new ArrayList(100000);

        Random random = new Random();
        for (int i = 0; i < 100000; i++) {
            list.Add(random.Next());
        }

        DoTheSorting(list);
    }
}
The system

Everything was tested on my AMD Athlon XP 2200+ system running Gentoo Linux, with gcc 3.3.3 and Mono 1.0. The tests were run while the whole desktop was running and timed using the time command. The numbers are just approximations but should be good enough to give an overall impression. I also do not know how good the two pseudo-random number generators are and whether they influenced the performance.

Doing nothing

The first thing I wanted to know was how long it takes just to start the program and initialize the array. I’ll just give you the figures for 100.000 and 2.000.000 entries (I used the latter for an additional comparison of the faster algorithms):

  • C: 0.018s / 0.100s
  • Mono/Array: 0.075s / 0.23s
  • Mono/ArrayList: 0.12s / 1.13s

Nothing unexpected here: ArrayLists are the slowest (and scale the worst) and Mono has some additional startup overhead compared to plain C.

Bubble sort

For C code and explanations see fejj’s original post on bubble sort. I only used the second (improved) version. The C implementation finished after 43 seconds. The C# versions needed minimal changes:

public static void BubbleSort(int[] a)
{
    int tmp, max, j;

    for (int i = a.Length - 1; i >= 0; i--) {
        for (max = 0, j = 1; j < i + 1; j++) {
            if (a[j] > a[max])
                max = j;
        }

        if (max < i) {
            tmp = a[max];
            a[max] = a[i];
            a[i] = tmp;
        }
    }
}

and

public static void BubbleSort(ArrayList a)
{
    int max, j;
    object tmp;

    for (int i = a.Count - 1; i >= 0; i--) {
        for (max = 0, j = 1; j < i + 1; j++) {
            if ((int)a[j] > (int)a[max])
                max = j;
        }

        if (max < i) {
            tmp = a[max];
            a[max] = a[i];
            a[i] = tmp;
        }
    }
}

Running the compiled code was an ambivalent experience. The array version finished sorting after 35 seconds. That is 8 seconds off the C version! I just could not believe my eyes, but even rerunning the program did not change the outcome. How can it be that the C# version with all these additional layers of complexity and abstraction (the VM, the more intelligent arrays…) is so much faster than simple, straightforward C code? Is the JIT just better at optimizing the code than gcc is? I just do not believe it and have no explanation for these results.

The ArrayList on the other hand performed so badly that I got impatient and killed the test after about 10 minutes. For 10.000 items the sorting takes about 2 seconds. As I figure, access to ArrayList items is pretty slow and gets slower with increased list size – and bubble sort has a lot of item accesses (~100 million for the 10.000-item sample I ran the profiler on).

Tomorrow I will show you comparisons for insertion sort and how one can speed up moving large portions of arrays in .NET.