Thursday, April 23, 2009

Efficient Code - PHP

I guess I'm a nut about execution speed. Well, not if it gets in the way of clarity, but I hate writing slow code.

But . . . always a 'But' . . . it's hard to know what is slow and what isn't. Seems like it should be easy, but I never know until I measure.

Here's an example:

The PHP manual page for preg_split() says that you shouldn't use it unless you need the flexibility of regular expressions. You should use explode() or str_split(), because they are simpler and therefore faster. I believed that.

Then I thought I'd like to see how much faster, so I wrote a few loops which split a comma-separated string with both preg_split() and explode() to see how much I was really losing.
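Before comparing speeds, it helps to know that these calls don't all return the same pieces for this string. Here is a minimal sketch using the same string as the benchmark. It shows that bare explode() keeps the spaces around each word. That makes array_map('trim', explode(...)) the closer match to a regex that swallows whitespace around the commas.

```php
<?php
// The same comma-separated string used in the benchmark.
$str = 'this, is , a , string, with , commas, in ,it';

// Regex split: the pattern consumes each comma and any whitespace around it.
$byRegex = preg_split('/\s*,\s*/', $str);

// Plain explode(): splits on the comma only, so the spaces stay attached.
$byExplode = explode(',', $str);

// explode() followed by trim() on every piece.
$byTrimmedExplode = array_map('trim', explode(',', $str));

var_dump($byRegex === $byTrimmedExplode); // bool(true)
var_dump($byRegex === $byExplode);        // bool(false)
echo json_encode($byExplode), "\n";       // ["this"," is "," a "," string"," with "," commas"," in ","it"]
```

So timing bare explode() against a whitespace-eating regex compares two different jobs. The trimmed version is the like-for-like one.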

Here are the average times for 10 passes of 1,000,000 splits each:
preg_split() - 3.51239209175 seconds
explode() - 4.33661820889 seconds

In other words, explode() took about 23% longer than preg_split().
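A percentage like this depends on which time you divide by. Here is the calculation from the two averages:

```latex
\frac{t_{\mathrm{explode}}}{t_{\mathrm{preg}}} = \frac{4.33661820889}{3.51239209175} \approx 1.235,
\qquad
\frac{t_{\mathrm{preg}}}{t_{\mathrm{explode}}} = \frac{3.51239209175}{4.33661820889} \approx 0.810
```

The 23% is measured against preg_split()'s time. Measured against explode()'s time, preg_split() needed about 19% less.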

I'm still not sure I believe it. Here's the code - so you can try it yourself.

And, please let me know if you see anything I did wrong.


<?php
// Time three ways of splitting a comma-separated string, $count splits per pass.
$str = 'this, is , a , string, with , commas, in ,it';
$count = 1000000;
$passes = 10;
// microtime(true) returns the current time in seconds as a float;
// only the differences between readings matter here.
$times = array(microtime(true));
$avg = array('preg' => array(), 'explode' => array(), 'map-explode' => array());

// Seconds between the latest timestamp and the one $offset entries before it.
function dt($offset = 1)
{
    global $times;
    $last = count($times) - 1;
    return $times[$last] - $times[$last - $offset];
} // end of dt()

for ($pass = 0; $pass < $passes; $pass++) {
    echo "Pass $pass\n";

    // Regex split on each comma plus the whitespace around it. A leading ^
    // would anchor the pattern to the start of the string, so it would never
    // match here and the string would come back in one piece.
    $times[] = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $v = preg_split('/\s*,\s*/', $str);
        unset($v);
    }
    $times[] = microtime(true);
    $avg['preg'][] = dt();
    echo "$count preg_split's(regx, str): " . dt() . "\n";

    // explode() on the comma, then trim() each piece: same pieces as the regex.
    $times[] = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $v = array_map('trim', explode(',', $str));
        unset($v);
    }
    $times[] = microtime(true);
    $avg['map-explode'][] = dt();
    echo "$count array_map(trim, explode(',', str)): " . dt() . "\n";

    // Bare explode(): the pieces keep their surrounding spaces.
    $times[] = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $v = explode(',', $str);
        unset($v);
    }
    $times[] = microtime(true);
    $avg['explode'][] = dt();
    echo "$count explode(',', str): " . dt() . "\n";
}

echo "Averages\n";
foreach ($avg as $key => $ar) {
    echo "$key average of $passes trials of $count splits: "
        . array_sum($ar) / count($ar) . "\n";
}
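In any benchmark, it's worth checking that each loop really does the work it's supposed to. For regex splits, a leading ^ is an easy trap: without the m modifier it matches only at the very start of the subject. Here is a quick sketch that counts the pieces preg_split() returns for the benchmark string. The helper count_pieces() is made up for illustration.

```php
<?php
// Hypothetical helper: how many pieces does preg_split() produce?
function count_pieces(string $pattern, string $subject): int
{
    return count(preg_split($pattern, $subject));
}

$str = 'this, is , a , string, with , commas, in ,it';

// Anchored: ^ pins the match to the start of the string, which here is "this",
// so the pattern never matches and nothing is split.
echo count_pieces('/^\s*,\s*/', $str), "\n"; // 1

// Unanchored: matches every comma plus the whitespace around it.
echo count_pieces('/\s*,\s*/', $str), "\n";  // 8
```

A loop that never splits anything will naturally look fast. Checking the output first keeps the timings honest.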

Tuesday, April 21, 2009

Documentation - It's Part of the Design Process

I was documenting some code I just wrote to create a packing list for an internet store we'll be opening soon. It's kind of a hairy problem to come up with a semi-optimal allocation of goods into boxes - but absolutely necessary to do right if you want to accurately estimate shipping and keep costs down.

It took me about a week to really get my head around the problem so that I could solve it. The actual solution now looks pretty obvious - but it won't in six months or a year.

So I document all my code - profusely, but compactly.

A Strange Thing happened while I was writing the doc: I found and corrected a whole bunch of bugs - coding bugs, 'bugs of omission', and design bugs. That's when I realized that I use the very act of documenting code as part of my design iterations.

So, I started thinking about what I do and why I do it the way I do - which is (as usual) a bit contrary to 'accepted best-practice'.

First of all, there are basically Three Types of documentation in code:
  • The Code Itself
  • Comments
  • Doc
The Code: My code is semi-self-documenting. I do the usual stuff: identifiers are semantically related to their use [function save_data() - etc]; white space is used deliberately; long lines are broken consistently, at binary operators; indentation is consistent.
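Here is a small, made-up illustration of those habits. The code uses names that say what things are, one long expression broken before each binary operator, and consistent indentation. The function names and the quarter-pound packing figure are hypothetical and are not taken from the packing-list code.

```php
<?php
// Hypothetical example: each name says what the value is, and the long
// return expression is broken before each + so the terms line up.
function shipping_weight_lbs(array $items, float $box_weight_lbs): float
{
    $items_weight_lbs = 0.0;
    foreach ($items as $item) {
        $items_weight_lbs += $item['unit_weight_lbs'] * $item['quantity'];
    }

    return $box_weight_lbs
        + $items_weight_lbs
        + packing_material_lbs(count($items));
}

// Hypothetical rule of thumb: a quarter pound of packing per line item.
function packing_material_lbs(int $item_count): float
{
    return 0.25 * $item_count;
}

echo shipping_weight_lbs(
    [
        ['unit_weight_lbs' => 1.5, 'quantity' => 2],
        ['unit_weight_lbs' => 0.5, 'quantity' => 4],
    ],
    1.0
), "\n"; // 6.5
```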

But you can do more.

Simple, direct, and consistent code does wonders for readability. Be consistent in within-loop logic [use if elseif elseif ..., or switch case case default, or if () { stuff; continue; } if () { stuff; continue; }]. I don't think the style you pick matters as much as consistency and simplicity.
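To make the loop-style point concrete, here is one hypothetical task written two ways: once as if/elseif/else and once as a run of if () { ...; continue; } guards. The task is tallying blank, comment, and code lines. Either form reads fine. What hurts is mixing them in the same file.

```php
<?php
// Style 1: if / elseif / else, exactly one branch per line.
function tally_elseif(array $lines): array
{
    $tally = ['blank' => 0, 'comment' => 0, 'code' => 0];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            $tally['blank']++;
        } elseif ($line[0] === '#') {
            $tally['comment']++;
        } else {
            $tally['code']++;
        }
    }
    return $tally;
}

// Style 2: guard clauses that handle a case and move on with continue.
function tally_continue(array $lines): array
{
    $tally = ['blank' => 0, 'comment' => 0, 'code' => 0];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') { $tally['blank']++; continue; }
        if ($line[0] === '#') { $tally['comment']++; continue; }
        $tally['code']++;
    }
    return $tally;
}

$lines = ['$a = 1;', '', '  # a comment', '$b = 2;'];
var_dump(tally_elseif($lines) === tally_continue($lines)); // bool(true)
```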

Small is usually good as well. We used to talk about 'locality of code', meaning that our code was written in chunks which weren't 'too long.' 'Not too long' usually really meant 'it fits in the window of my editor' so that you can stare at it and understand the whole thing. There's also 'too short', but the Ruby on Rails guys have cornered the market on that with their obsessive DRYness.

Anyway, the Code itself is always the final authority on what the program does - so it's the most authoritative documentation.

Do yourself a favor and make it as readable as possible - you might have to fix it later.

Comments: Helpful hints which make the Code more readable. Some (IMHO) IDIOT came up with the idea that Comments Must be Set Off with Big Boxes of Stars or something like that.

Why?

Well, I'll tell you.

We used to write programs on Paper and then punch them into Punch Cards and run them through a Card Reader and then get a Big Wide Printout on Greenbar paper and do our debugging at a desk. We'd page through this Paper with a pencil and write revisions to our programs. The paper was 14 inches wide [132 columns] and 11 inches long [60 lines at 6 lines per inch with 1/2 inch top and bottom margins].

In those days it made sense to Mark Off the comment blocks. It made them easy to spot - both for someone reading the code and for the Boss as he walked by your desk. [You'd get demerits for not commenting your code, so it was a good idea to make it easy for him to see].

Things have changed. You can easily flip around in an on-screen editor, but you can't very well write on it. It works a lot better if the information in the editor window is very dense. The way to do that is to write meaningful comments with as little excess white space as possible.

Now I use blank lines to separate logically disjoint chunks of code and precede them with comments - if appropriate - but I don't waste a lot of screen space blocking off comments.
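Here is a short, hypothetical PHP function laid out that way. Each logical chunk gets a one-line comment, chunks are separated by a single blank line, and there are no boxes of stars. The function normalize_skus() and its SKU rules are made up for the example.

```php
<?php
function normalize_skus(string $csv): array
{
    // split on commas, tolerating spaces around them and empty fields
    $skus = preg_split('/\s*,\s*/', trim($csv), -1, PREG_SPLIT_NO_EMPTY);

    // SKUs are treated as case-insensitive; store them upper-case
    $skus = array_map('strtoupper', $skus);

    // drop duplicates and renumber the keys
    return array_values(array_unique($skus));
}

print_r(normalize_skus(' ab-1, cd-2 ,AB-1,, ef-3 '));
```

The whole function fits in a few lines of the editor window, and every comment earns its space.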

I think we need a very high Signal to Noise ratio - and big blocks of comment markers are just noise.

I don't use PHPDoc because those guys seem to think that the Documentation and the Comments are More Important than the Code itself. I don't. I want as much Code on the screen as I can get, as long as it's laid out so that the logic is obvious and it's commented where needed.

The Doc: The Doc is a narrative description of what the code does, how it does it, how to use it, calling parameters, etc etc. Can't live without it.

I've experimented with all kinds of methods. I used to write UNIX style man pages - and they are still a really good format. But nowadays everything has to be in HTML, so I've changed.

There are a lot of Documentation Project things - like PHPDoc - but I don't use them.

I think they're too complex and they are usually specific to one language. Besides, they just aren't necessary for good, clear documentation.

Like many people, I work in several languages [one reason I'm bald] so I need a multi-lingual documentation system. I also want something which doesn't take up a lot of space in my program files and which is really simple. Here are my 'Must Haves':
  • Must allow the Doc to be embedded in the same file as the code
  • Must produce HTML
  • Must be flexible
  • Must be very Simple
I couldn't find anything which didn't take multiple days [or weeks] to learn, so I wrote something to pull Textile markup out of a simple text file, and that's what I use.
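Here is a rough idea of how little such a tool needs: a minimal PHP sketch that pulls embedded doc text out of a source file in any language. It is not the Python extractor mentioned at the end of this post. It assumes hypothetical marker lines `@doc` and `@end-doc` inside comments, and it strips only the comment leader (#, //, /*, *, --) from each line. Turning the resulting Textile into HTML would still need a Textile converter, which this sketch leaves out.

```php
<?php
// Collect the text between hypothetical "@doc" / "@end-doc" marker lines,
// whatever comment syntax the host language uses.
function extract_doc(string $source): string
{
    $out = [];
    $inDoc = false;
    foreach (preg_split('/\R/', $source) as $line) {
        // Drop leading whitespace, one comment leader, and one following space.
        $text = preg_replace('#^\s*(?:/\*+|\*+|//|\#|--)?\s?#', '', $line);
        if (trim($text) === '@doc') { $inDoc = true; continue; }
        if (trim($text) === '@end-doc') { $inDoc = false; continue; }
        if ($inDoc) {
            $out[] = rtrim($text);
        }
    }
    return implode("\n", $out);
}

// A Python file with its doc embedded in # comments.
$python = implode("\n", [
    '# @doc',
    '# h1. greet(name)',
    '#',
    '# Prints a friendly greeting for _name_.',
    '# @end-doc',
    'def greet(name):',
    '    print("Hello,", name)',
]);
echo extract_doc($python), "\n";
```

Because the markers sit inside ordinary comments, the same function works on C, Python, shell, or SQL files. That covers the multi-lingual "Must Have".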

Frankly, I'm smart enough and write well enough to describe what some code does. So are you. Just pretend you're writing for yourself about a year from now when you have to find a bug and fix it in record time - or you'll be fired.

What kind of Doc do you want then? Clear, correct, self-contained without a lot of references jumping around, etc etc. Write something which will save your butt when the crunch comes and you're fine.

Design?

So what does this have to do with Design?

As I write my Doc - part 3 of the Code Documentation - I find myself thinking about how the whole thing works and how understandable it is. It can get kind of contemplative.

Another thing which happens is that I find myself wondering why in the hell I wrote something and what it really does - so I go through the logic again.

There's a different perspective and attitude when reading code to understand and clearly document it than there is when writing it initially. At least there should be.

When I write code, I get involved in the microscopic detail, in solving the problem, in creating the logic and the loops.

When I read code to document it, I get involved in the overall strategy of the solution and the clarity of the logic. It's just different.


Anyway, that's how I think about documenting code nowadays.

Oh, if you're interested in my document extractor - I'm planning on dumping it on the download page of http://www.clove.com. It's written in Python, so it should run just about anyplace.