Benchmarking PHP code is a problem. Most benchmarking methods are based on measuring real time: you get the current time using microtime(), run your code, then get another timestamp from microtime() and output the difference between the two. This code exists in a gazillion variants.
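For reference, a minimal sketch of that pattern in current PHP, where microtime(true) returns the Unix timestamp as a float in seconds. The workload() function is a hypothetical placeholder for "your code".

```php
<?php
// Hypothetical workload: stands in for the code you want to time.
function workload(): int
{
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
    return $sum;
}

// Take a wall-clock timestamp, run the code, take another one.
$start = microtime(true);
$result = workload();
$end = microtime(true);

// The difference is elapsed real (wall-clock) time in seconds.
$elapsed = $end - $start;
printf("Elapsed real time: %.6f seconds\n", $elapsed);
```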
Problem: the time you get this way is real time, not user time or system time. If the server is busy doing many things at once (and that's what multitasking systems are made for), timing results for the same code will vary with every call. To get results that you can use for comparisons, you will have to run your benchmark multiple times on an otherwise idle system and average the results.
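A rough sketch of the repeat-and-average approach: it times a callable several times with microtime(true) and reports the mean, plus the minimum and maximum to show how much the runs vary. The helper name time_runs(), the placeholder workload and the run count of 20 are illustrative assumptions, not part of any library.

```php
<?php
/**
 * Hypothetical helper: run $fn $runs times and collect the wall-clock
 * duration of each run in seconds.
 *
 * @return float[]
 */
function time_runs(callable $fn, int $runs): array
{
    $durations = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $durations[] = microtime(true) - $start;
    }
    return $durations;
}

$durations = time_runs(function () {
    // Placeholder workload: build, shuffle and sort an array.
    $a = range(1, 10000);
    shuffle($a);
    sort($a);
}, 20);

$mean = array_sum($durations) / count($durations);
printf("runs: %d, mean: %.6f s, min: %.6f s, max: %.6f s\n",
    count($durations), $mean, min($durations), max($durations));
```

The spread between min and max gives a quick feel for how noisy the machine is; on a busy server it can be large.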
While searching for better approaches, I stumbled upon the PHP function getrusage(). It is not available on Windows, and the returned array may differ between the other platforms supported by PHP, so you will have to check the values to see what you can expect:
<?php
$dat = getrusage();
print_r($dat);
?>
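Instead of eyeballing the print_r() output, a script can check at runtime whether getrusage() exists and which of the wanted keys it returns. This is a sketch; the list of wanted keys is the Linux set described in this post and may not match other platforms.

```php
<?php
// getrusage() may be missing on some platforms (historically Windows).
if (!function_exists('getrusage')) {
    fwrite(STDERR, "getrusage() is not available on this platform\n");
    exit(1);
}

$usage = getrusage();

// Keys we would like to use for user and system CPU time (Linux names).
$wanted = ['ru_utime.tv_sec', 'ru_utime.tv_usec', 'ru_stime.tv_sec', 'ru_stime.tv_usec'];

foreach ($wanted as $key) {
    $status = array_key_exists($key, $usage) ? 'present' : 'missing';
    echo "$key: $status\n";
}

// Any other keys this platform reports.
$other = array_diff(array_keys($usage), $wanted);
echo 'Other keys: ' . implode(', ', $other) . "\n";
```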
On x86 running Linux, the interesting array elements are:
ru_utime.tv_sec: user time, whole seconds
ru_utime.tv_usec: user time, microseconds
ru_stime.tv_sec: system time, whole seconds
ru_stime.tv_usec: system time, microseconds

Following that logic, the code to time a piece of code should look like this:
<?php
// Snapshot before: seconds and microseconds are concatenated as strings
$dat = getrusage();
$utime_before = $dat["ru_utime.tv_sec"].$dat["ru_utime.tv_usec"];
$stime_before = $dat["ru_stime.tv_sec"].$dat["ru_stime.tv_usec"];

/*
 * This is the place where your code goes
 */

// Snapshot after, built the same way
$dat = getrusage();
$utime_after = $dat["ru_utime.tv_sec"].$dat["ru_utime.tv_usec"];
$stime_after = $dat["ru_stime.tv_sec"].$dat["ru_stime.tv_usec"];

$utime_elapsed = ($utime_after - $utime_before);
$stime_elapsed = ($stime_after - $stime_before);

echo "Elapsed user time: $utime_elapsed µseconds\n";
echo "Elapsed system time: $stime_elapsed µseconds\n";
?>
This seems to give better results than just measuring elapsed real time, though the values still vary from run to run. Once I even got a negative value for both times; I have absolutely no clue how that can happen.
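A more robust way to combine the two fields is arithmetic rather than string concatenation: scale the whole seconds to microseconds and add the microsecond part. The helper rusage_usec() below is hypothetical, assumes the Linux key names listed above, and uses made-up sample values followed by a real before/after measurement.

```php
<?php
/**
 * Hypothetical helper: total CPU time of one kind ('utime' = user,
 * 'stime' = system) from a getrusage() array, as integer microseconds.
 * Assumes the Linux key names.
 */
function rusage_usec(array $usage, string $kind): int
{
    // Scale the seconds and add the microseconds instead of gluing strings.
    return $usage["ru_{$kind}.tv_sec"] * 1000000 + $usage["ru_{$kind}.tv_usec"];
}

// Illustrative values, not a real measurement.
$sample = [
    'ru_utime.tv_sec' => 1, 'ru_utime.tv_usec' => 5000,
    'ru_stime.tv_sec' => 0, 'ru_stime.tv_usec' => 42,
];
echo rusage_usec($sample, 'utime'), "\n"; // 1005000
echo rusage_usec($sample, 'stime'), "\n"; // 42

// Real use: one snapshot before and one after a placeholder workload.
$before = getrusage();
for ($i = 0, $s = 0; $i < 200000; $i++) {
    $s += $i % 7;
}
$after = getrusage();
echo 'Elapsed user time: ', rusage_usec($after, 'utime') - rusage_usec($before, 'utime'), " µs\n";
echo 'Elapsed system time: ', rusage_usec($after, 'stime') - rusage_usec($before, 'stime'), " µs\n";
```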
It is possible that the CGI version of PHP gives more consistent results than the Apache module. The fact that the CGI version is generally slower than the module doesn't matter if you want to compare two versions of a function or find the slowest part of a program.
There must be more. On the drupal-devel mailing list, Zbynek Winkler pointed me to a PEAR package called APD. The only description online: "APD is a full-featured profiler/debugger that is loaded as a zend_extension. It aims to be an analog of C's gprof or Perl's Devel::DProf." Sounds like a toy for a boring weekend, which I never seem to have. Nevertheless, I should find the time to try the package out; it seems to offer some interesting details.
Phew, and all I wanted was a lousy benchmark, something like the time command in Unix-like shells. And yes, I know that I put a lot of trust in PHP's automatic type casting.
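Something close to the shell's time command can be sketched by combining both approaches: microtime(true) for real time and getrusage() for user and system time. The function name time_like() and the md5-loop workload are hypothetical; the sketch assumes the Linux getrusage() key names and converts each snapshot to float seconds arithmetically rather than relying on string concatenation.

```php
<?php
/**
 * Hypothetical helper in the spirit of the shell's `time` command:
 * runs $fn once and returns real, user and system time in seconds.
 */
function time_like(callable $fn): array
{
    // Convert one getrusage() snapshot to float seconds for 'utime' or 'stime'.
    $cpu = function (array $u, string $kind): float {
        return $u["ru_{$kind}.tv_sec"] + $u["ru_{$kind}.tv_usec"] / 1e6;
    };

    $before = getrusage();
    $start = microtime(true);
    $fn();
    $real = microtime(true) - $start;
    $after = getrusage();

    return [
        'real' => $real,
        'user' => $cpu($after, 'utime') - $cpu($before, 'utime'),
        'sys'  => $cpu($after, 'stime') - $cpu($before, 'stime'),
    ];
}

$t = time_like(function () {
    // Placeholder CPU-bound workload.
    $h = '';
    for ($i = 0; $i < 20000; $i++) {
        $h = md5($h . $i);
    }
});

printf("real %.3fs\nuser %.3fs\nsys  %.3fs\n", $t['real'], $t['user'], $t['sys']);
```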
Comments
The reason you get negatives
The reason you get negative values is that usec is not zero-padded. You need to use sprintf() with a format like "%d.%06d", or something similar.
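The explanation can be checked with concrete numbers. The values below are made up so that the microseconds counter wraps into the next second between the two snapshots: gluing the strings together yields a negative difference, while padding the microseconds to six digits with sprintf('%d.%06d', ...) gives the expected result.

```php
<?php
// Illustrative values: the microseconds counter wraps into the next second.
$before = ['sec' => 0, 'usec' => 999000];
$after  = ['sec' => 1, 'usec' => 5000];

// Original approach: glue the parts together as strings.
// "0" . "999000" = "0999000" and "1" . "5000" = "15000".
$naive = ($after['sec'] . $after['usec']) - ($before['sec'] . $before['usec']);
echo "Naive difference: $naive\n"; // -984000

// Suggested fix: zero-pad the microseconds, giving "0.999000" and "1.005000".
$b = sprintf('%d.%06d', $before['sec'], $before['usec']);
$a = sprintf('%d.%06d', $after['sec'], $after['usec']);
$fixed = (float) $a - (float) $b;
printf("Padded difference: %.6f seconds\n", $fixed); // 0.006000
```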