Friday, October 16, 2009

Missing Method in Python

Preamble - optional

Ruby has a whole boatload of methods which allow a programmer to overload and override all kinds of stuff. One of them is 'method_missing', which is called if you invoke 'foo.bar()' and 'bar' is not the name of a method in 'foo'.

The Rails folks use it to transform function names like: find_foo_by_date_and_color(date, color) into a parameterized SQL call, thus trading speed for apparent code clarity, adding confusion for humans trying to understand the code, and adding inches to their stature as cool programmers.

You probably get from this that I think the Rails people overuse things like method_missing.

You're right.

But that doesn't mean it isn't useful.

So to add inches to my cool programmer stature, here's a case I think is justified - and how to do it in Python, which doesn't directly support method_missing (at least they don't seem to admit it - as far as I was able to find).

The Problem

I'm writing a little local HTTP server for my YASiteKit CMS/web-site kit thing (http://yasitekit.org) and decided to make it HTTPS capable. This turned out to be just a few lines of code in Python 2.6, but it broke the server. It turns out that while SSLSocket objects support some 'file-like' methods, they don't support all of them - such as readline() - which are used in the bowels of the HTTP server library.

The 'obvious solution' is to wrap the SSLSocket in something which adds the necessary methods.

But I'm lazy - in a sense - and decided to try to do this using a Pythonic equivalent to Ruby's missing method machinery.

Python has a rich set of special object methods for defining operations and the like, but it doesn't directly support a missing method call. It does support customizing attribute access so that you can define dynamic attributes - that is attributes which don't actually exist, but have computed values or that you want to create and delete dynamically after compile time.

I was stuck on how to pass arguments to an arbitrary method of a wrapped object instance until I realized that method invocation in Python is a two step process:
  1. look up the attribute
  2. call it with arguments
So the solution is easy:
  1. wrap the object by passing it to the object constructor
  2. create a __getattr__() method which returns getattr(wrapped-instance, method-name)
  3. the Python interpreter then completes the call, with arguments, on the bound method.
If the method exists in wrapped-instance, then the returned value is the method, bound to the wrapped-instance. If not, then the getattr() call on wrapped-instance raises an AttributeError exception.

It doesn't interfere with the wrapper's methods because __getattr__() is only called if the attribute is not found.
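A quick way to convince yourself of the two-step process is to do the two steps by hand. This is only a sketch; Greeter and hello() are made-up names for illustration and have nothing to do with the server code:

```python
class Greeter(object):
    def hello(self, name):
        return 'hello, ' + name

g = Greeter()

# The usual spelling: lookup and call in one expression.
one_step = g.hello('world')

# The same thing split apart: first look up the attribute...
bound = getattr(g, 'hello')
# ...then call the bound method with the arguments.
two_step = bound('world')
```

Both spellings give the same result, and the looked-up object remembers which instance it is bound to. That is why a wrapper can hand back the wrapped object's method and let the interpreter finish the call.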

Besides avoiding writing a lot of method wrapping code, this has the advantage of telling me exactly which methods I have to implement - because when they are called on my wrapper, the wrapped object throws up.

Here's the sample code I wrote to verify the method:
class Floof(object):
    var = 'This is Floof'
    def __init__(self, var):
        self.var = var
    def func(self, var):
        print('\nThis is Floof.func called on self')
        print('Floof.var: ', Floof.var)
        print('Floof.self.var: ', self.var)
        print('Floof.func(var): ', var)

class Foo(object):
    var = 'This is Foo'
    def __init__(self, var, floof_instance):
        self.var = var
        self.floof_instance = floof_instance
    def __getattr__(self, name):
        # Only called when normal attribute lookup on the Foo instance fails.
        print('\nFoo.__getattr__(%s) called' % name)
        # If floof_instance lacks the attribute too, getattr() raises
        # AttributeError, so no explicit raise is needed after this line.
        return getattr(self.floof_instance, name)
    def foo_func(self, var):
        print('\nThis is Foo.func called on self')
        print('Foo.var: ', Foo.var)
        print('Foo.self.var: ', self.var)
        print('Foo.foo_func(var): ', var)
    def wrapped_func(self, var):
        print('\nI\'m wrapping Floof.func()')
        self.floof_instance.func(var)

floof = Floof('I\'m a Floof')
floof.func('calling floof.func() directly')
foo = Foo('I\'m a Foo', floof)
foo.foo_func('calling foo.foo_func() directly')
foo.func('calling floof.func() through Foo wrapper')
foo.wrapped_func('calling floof.func() via a wrapper method')
foo.flob('calling foo.flob() which does not exist')

Here's the output:

This is Floof.func called on self
('Floof.var: ', 'This is Floof')
('Floof.self.var: ', "I'm a Floof")
('Floof.func(var): ', 'calling floof.func() directly')

This is Foo.func called on self
('Foo.var: ', 'This is Foo')
('Foo.self.var: ', "I'm a Foo")
('Foo.foo_func(var): ', 'calling foo.foo_func() directly')

Foo.__getattr__(func) called

This is Floof.func called on self
('Floof.var: ', 'This is Floof')
('Floof.self.var: ', "I'm a Floof")
('Floof.func(var): ', 'calling floof.func() through Foo wrapper')

I'm wrapping Floof.func()

This is Floof.func called on self
('Floof.var: ', 'This is Floof')
('Floof.self.var: ', "I'm a Floof")
('Floof.func(var): ', 'calling floof.func() via a wrapper method')

Foo.__getattr__(flob) called

AttributeError: 'Floof' object has no attribute 'flob'

function __getattr__ in missingmethod.py at line 52
return getattr(self.floof_instance, name)


See - It works!
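Applied to the readline() problem, the same trick might look something like the sketch below. FakeSocket is a hypothetical stand-in I made up so the example runs without a network connection; it is not the real SSLSocket, and the one-byte-at-a-time readline() is deliberately naive.

```python
class FakeSocket(object):
    """Hypothetical stand-in for a socket: has recv() and close(), no readline()."""
    def __init__(self, data):
        self._data = data
        self.closed = False

    def recv(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk

    def close(self):
        self.closed = True


class ReadlineWrapper(object):
    """Adds readline(); everything else is delegated to the wrapped object."""
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Reached only when the wrapper itself has no such attribute.
        return getattr(self._wrapped, name)

    def readline(self):
        # Naive: read one byte at a time until a newline or end of data.
        line = b''
        while not line.endswith(b'\n'):
            ch = self._wrapped.recv(1)
            if not ch:
                break
            line += ch
        return line


sock = ReadlineWrapper(FakeSocket(b'GET / HTTP/1.0\r\nHost: x\r\n'))
first = sock.readline()   # provided by the wrapper
rest = sock.recv(100)     # delegated to FakeSocket.recv()
sock.close()              # delegated to FakeSocket.close()
```

Only readline() had to be written; recv(), close() and the closed flag all come through __getattr__().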

Tuesday, July 14, 2009

A Story

Herman

by Mike Howard, (c) 2009

How it Started

The day the Lemmings ran over the cliff, Herman forgot and went to the park.

When it got dark, he went home – but there was nobody there. That’s when he remembered about the Big Cliff Run.

They were all dead now.

That’s the way it is with Lemmings. He remembered.

Then he cried.

The next morning, when he got up, Herman remembered.

He couldn’t understand what had happened. All his life he’d been taught how to be a good Lemming. Maybe he wasn’t a Lemming?

He felt himself. Felt like a Lemming.

He looked in the mirror. Looked like a Lemming.

He was hungry – just like always – and just like a Lemming.

Then how come he’d forgotten the Big Cliff Run? Wasn’t that supposed to be Instinct? Wasn’t it built into his Jeans? (or was it Genes?) Wasn’t it Inescapable and the Source of All Sorrow?

How could he just forget something that was as much a part of him as his nose?

Something was wrong.

So Herman set off to see Marvin – the big, bulky Lemming On the Hill. Marvin knew everything. Marvin knew all the Old Stories from the Old Times by the Old Lemmings who Knew Everything. Marvin was a Great and Good Lemming. The Greatest and Goodest.

Marvin was so Good that he didn’t go on the Big Cliff Run. It made him ever so miserable not to Go – to hold himself back – but he did it for the Good of Every Lemming!

He had to. He couldn’t let Lemmingness be Lost! Someone had to sacrifice for the Good of All Lemmings. Someone had to tell the Stories. Someone had to make sure All the Lemmings made the Big Cliff Run.

It was a Heavy Burden, but Marvin kept the Faith and Did His Duty.

Marvin

Marvin wasn’t happy to see Herman. In fact he frowned and said something Herman hadn’t ever heard before (which was good, because then Herman didn’t understand what it meant and didn’t know how angry Marvin was).

“What happened?” Marvin asked.

Herman blushed under his fur. “I just forgot,” he said.

“!!!!!,” Marvin exploded!!!!

Herman stuttered, “I just woke up and it was so nice and sunny and so I thought ‘I’ll go to the park’ and I didn’t think anything else and so I went and it was nice and the birds were out and singing and the grass was soft and . . .” It all came out in a rush.

Marvin glowered.

“Don’t you remember your lessons?” he asked. “Didn’t I tell you to repeat the Lemming Mantra every morning and every night? Didn’t I?”

“Yes,” said Herman.

“Well? . . .”

“I just forgot. It was such a nice day, . . . and I just forgot,” Herman said lamely.

“Hmmph!!!” said Marvin.

As Herman trudged down the hill he saw Lonny, Marvin’s apprentice and student dragging a big bag of Cheetos up the hill. Marvin was too fat (oops! too important) to go get his own food, so everyone brought him what he needed. Lonny and Marvin were great Lemmings to hold themselves back and stay alive year after year, thought Herman.

Sighing, Herman walked slowly home.

“I’ll remember Next Year,” he thought as he sadly trudged along.

Interlude

Over the next year, the few surviving Lemmings rebuilt. They had new babies who quickly grew up and had more. Marvin and Lonny were busy teaching and preaching. Herman attended all the lessons. He worked very hard at fixing himself so he’d be a Good Lemming. He repeated the stories over and over again. He repeated the Lemming Mantra extra times every day.

But Herman loved going to the park and watching the clouds. Every day it was different. Every minute even.

He marked Big Cliff Day on his calendar and as it approached, he worked extra hard at thinking Lemming Thoughts. He Felt the Sorrow. He Felt the Doom. He was a Lemming. He knew what had happened last year – he just hadn’t taken it seriously enough. Hadn’t filled his mind with Lemming Thoughts. He hadn’t repeated Over and Over again what a Lemming did and how a Lemming Thought and how Inevitable everything was.

Lemmings can’t Change! It’s how they’re Built. It’s Innate. It’s Inevitable. It’s who we Are. It’s Lemmingness!!!!

The night before the Big Cliff Run, Herman worked extra hard at remembering. He laid out his special shoes and everything. Then he went to bed.

The next morning was a beautiful day. Herman woke up earlier than usual. Earlier than everybody else, in fact. Since he was early, he decided to go to the Park One Last Time. So he did.

It was so nice. The clouds were making beautiful swirls. It was almost hypnotic. Lovely. Beautiful. Peaceful. Happy.

Herman woke up suddenly.

He remembered – that he’d forgotten again.

He went home.

Everyone was gone and he was alone again.

This time, Herman sat down and shook his head.

“Something isn’t right,” he thought. He started to go to Marvin, but then stopped. He had an idea – well maybe just the start of one.

Could Marvin be wrong? No!

The Idea

Herman thought and thought and thought and thought and then fell asleep. Thinking is tough work and he wasn’t used to it. All his life Herman hadn’t actually ‘thought’. He’d been a Good Lemming and he’d only repeated what he was told. He’d been told Over and Over what to repeat, so he did it. He repeated the Lemming Mantra. He repeated the Song of Sorrows. He repeated the Five Reasons for Melancholy. He kept the Record of His Life in a little book – where he wrote down Every Bad Thing. He read the book from cover to cover every week (it took longer and longer the more he wrote into it – in fact it was getting to where he didn’t have time to do much else).

Herman did his duty and met with the Other Lemmings. They all took turns telling about the Bad Things, reminding each other how terrible life was. Over and over. And when one of them saw something nice and beautiful, they made doubly sure that everyone took a turn telling what was bad about the Nice Thing and how disappointing it would be. They were very, very good at protecting each other against the Unbearable Pain of False Hope.

But as the year wore on, Herman noticed how much work it was to remember to feel bad. He was tired all the time. He didn’t go to the Park as much.

After a while, he found himself just sitting in his chair looking out the window. Not repeating the Mantra. Not reading the Bad Things book. Just sitting and looking.

That’s when he started really thinking.

What if . . . what if . . . what If . . . WHAT IF LIFE REALLY WASN’T SO BAD?

He used to like going to the park. It was fun. It was fun because he wasn’t Thinking the Bad Thoughts. He wasn’t Remembering the Sorrow.

He had the Thought!

“Maybe thinking the Sorrow Makes the Sorrow.”

And then the Discovery.

“Maybe if I don’t think it all the time, it will Go Away.”

Interlude (again)

Herman didn’t know what to do. He’d never had a thought on his own before. He worried and worried that it was Wrong. Marvin Thought and Marvin Taught. Lonny will be the new Thinker and Teacher – once he’s ready. “I’m not Smart enough,” thought Herman.

But the Thought wouldn’t go away. It kept coming back.

It was strange because Herman didn’t have to work at thinking this thought. It just kept coming back. Not always the same and not very clear. It was more like a feeling that said “It doesn’t have to be this way” and “That’s a nice thing, let’s think about it for a while”.

It wasn’t so much the Things he was Thinking about, but how the Thoughts felt. Like the nice fluffy clouds and the warm sun. Nice and peaceful. Calm and serene. Not at all strong and hurting – like he was supposed to think.

Herman finally decided he needed to see Marvin.

So he went.

Marvin was even more displeased. He barked and preached at Herman for over two whole hours. He wouldn’t answer any of Herman’s questions - in fact he just ignored them. Then he told Herman to Go Home, Believe, and Behave.

As Herman trudged home, he saw Lonny dragging another bag of Cheetos up the hill to Marvin. Lonny was tired, so he was happy to stop and talk. Herman told him everything he’d tried to tell Marvin. He finished up with “I just don’t understand. I just don’t understand. I just,” at which point Lonny interrupted him.

“Look, Herman,” he said, “I like you, even though you’ve got some screwy ideas and I wish I could help you. Marvin’s getting kind of old and he just doesn’t have the patience he used to have. But I don’t have the time right now – but if you’d like to borrow my Book, I’ll let you have it for a bit. I’ve got to have it back on Tuesday, but maybe it will help.”

A book? Herman had never heard of the Book before. He knew how to read and write – like all good Lemmings, but he didn’t know there was a Book. He thought Marvin had learned the Stories from his Teacher and his Teacher from his Teacher and like that.

Lonny was holding out the Book, so Herman took it, mumbling something to cover his embarrassment, and scuttled home.

The Book

When Herman got home, he opened the Book and started to read. It made his head hurt because it didn’t make sense.

“I’ll never understand this by Tuesday,” he thought, so he decided to make his own copy. He got to work, copying the Book.

After that, Herman spent every day reading his copy of the Book. He kept looking for what Marvin had Taught him, but it never really seemed to fit.

He read “Our Thoughts make our World, so Choose your Thoughts Carefully” and he remembered how Marvin had droned over and over what thoughts to think. But it just didn’t feel right because the book said, “The Thoughts you hold Dear will Be your Life – so Think Thoughts of Peace and Joy”.

He read and read and thought and thought and became more and more confused. He thought of going to the Great Marvin, but that frightened him. Then he thought of Lonny. Maybe he could explain it.

So, the next day he waited for Lonny to come by on his daily quest for Cheetos for the Great Marvin.

“Excuse me, Lonny, I have some questions about the Book.”

Lonny stopped and looked puzzled – or maybe was shocked and stopped so he wouldn’t fall down.

“What Book?” he asked.

“The one you lent me and I copied,” Herman replied.

“The one you copied?” asked Lonny. “That’s impossible. There is only one Book. There can’t be two – it’s Forbidden.”

Startled, Herman looked confused. “But I had to copy it so I could get it back to you on Tuesday. I couldn’t understand it that fast.”

Lonny looked worried. “Please don’t tell anyone.” (meaning Marvin) “Please don’t. Tell me you won’t.”

“Sure,” said Herman. “But will you help me?”

“With what?”

“Well, I know what I was taught and I know what the Book says and they’re not the same,” Herman began.

“Oh yes they Are!!!” Lonny said. “You just don’t Understand. That’s why there’s only One Book. You have to be Taught How to Read It.”

“Will you teach me?” Herman asked.

“I can’t. You’re not the Student – I am,” Lonny said. “I can’t Teach until I’m the Master and then I can Only Teach the Student. That’s how it is.”

“Well, the Book says that the Master taught his students,” Herman began.

“That’s Right,” said Lonny, “and that’s why there can only be One Student and No One can Teach except the Master.”

“But, it says in the Book that the Master taught lots of students - not all of them learned well, of course, but he taught everyone. I don’t understand.”

“That’s right, you don’t. You see, the Master could only really Teach the Students who learned, not the other students. And those other students went around teaching others Wrong Things.”

Herman scratched behind his left ear. “Hmm. So now the Master doesn’t let anybody else Teach or read the Book because they might get it Wrong?”

“That’s Right,” Lonny replied.

“And Marvin knows How to Read the Book because . . .” Herman trailed off, confused.

Lonny sighed and sat down. “Look,” he said, “Marvin’s Master Taught him and he had a Master who Taught Him and so on – all the way back to the Master who Wrote the Book. That’s how Marvin knows how to Read it and How I’m Learning too.” Lonny looked compassionately on Herman. “You’ll just never understand.”

“Will you help me?”

“I’m sorry, but I can’t,” Lonny answered.

“Well, how about just one question then,” Herman asked.

“OK. Sure. But, just one.”

“Marvin teaches us to remember the Bad things, to Write them Down in our book of bad things and to read it every week. And anytime someone thinks that Bad won’t happen, it’s our Duty to remind them of All the Bad things. The Book says that what we Think is our World. Doesn’t that make our World Bad instead of Good?”

Lonny sighed. “See, you just don’t understand. What’s the worst thing that can happen to you?”

“I really don’t know,” Herman said.

“Think about this: suppose you thought you would get a great big Strawberry Shortcake for dinner tomorrow. You’d spend the whole day and night just thinking about how great that would be. You’d plan the whole next day how good it would taste and how comfy and fat your belly would be. Can you imagine that?”

“Oh boy, yeah. I can almost taste it now.” Herman started to drool.

“Ok,” Lonny continued, “now think about how disappointed you’ll be when you don’t get it or if it’s filled with ants or the Strawberries are rotten. Think about that.”

“Yech! That’s terrible,” Herman responded. His fur actually started to wilt.

“So you see,” Lonny lectured, “it’s much, much better to never get your hopes up because that makes it so much farther to fall down. You see, thinking Bad Thoughts protects us from disappointments like that – and even worse.”

“That’s why thinking Bad thoughts is Really Good,” Lonny finished. “I hope that helps. Do you see why someone has to Teach you How to Read the Book? And why it’s not Good for You?”

Herman just mumbled.

“Well, I hope that helps you – but I have to go about my duties,” Lonny said, standing up and starting to walk away. Then he stopped, “and remember, please don’t tell anybody.”

Herman started. “Oh, sorry. Yeah, thanks. No, I won’t tell anyone. Bye.”

So, Lonny went off to do his important tasks and Herman sat and sat some more. And thought and thought some more. What Lonny said made some sense, but something just didn’t seem right.

Herman went on like this for a long time. Reading the Book and thinking about what he had been Taught and what Lonny had said. His fur got dull.

One night, very very late, Herman snapped. “I’ll never make sense of it,” he said out loud.

From that night on, Herman quit trying to be a good Lemming. He stopped doing what he’d been Taught. He forgot about the Book. He’d decided to just live the best he could.

Herman no longer wrote in the book of Bad Things. He no longer recited the Lemming Mantra. He began to forget the teachings. He didn’t read the Book – in fact, he forgot about it.

Each day Herman went to the Park. He looked at the trees and the grass. Slowly, the Bad Thoughts didn’t come so often. They didn’t Stay so Long. He didn’t feel so bad. The Great Sorrow lifted.

The Realization

One day Herman saw the Book. Wrinkling his brow, he picked it up and opened to a page at random. It made sense. It was simple. Why hadn’t he understood it before? You really are what you Think. And you can Choose what you Think.

Lonny was Wrong. Marvin was Wrong. Thinking Bad Thoughts didn’t Protect you, it made a Bad World. If the Strawberries are rotten, then just forget about it: there will be fresh ones some day.

The Answer is to Forget about the Bad and Think about the Good!

Herman couldn’t wait to tell everyone.

He ran outside.

The End

Well, I won’t tell about how everyone laughed at him. How no one believed. How Marvin was even More than Displeased (luckily he was too fat to move or he might have squashed Herman flat). Or Lonny’s look of pity.

Nobody believed him. After all, they were Lemmings and he was only a lemming – not a Real Lemming At All. He didn’t understand the Burden that came with Lemmingness – the Great Sorrow – the Inevitability of the Great and Senseless Death, crushed at the bottom of the Big Cliff. He didn’t understand.

On Big Cliff Day, Herman walked out to the side of the road and shouted to them as they ran by: “You don’t have to go! Just go to the Park! Just Forget the Bad Stuff! You have a Choice!”.

The ones who heard him clapped their front paws over their ears so they couldn’t hear – which made it very hard to run with their noses in the dirt. But they didn’t care. They wrapped their misery around themselves proudly and – bleeding and with broken teeth – plowed on toward the cliff.

All the while, Marvin sat on his hill – praising their Lemmingness, while he ate his (final) bag of Cheetos. He would not live to see another Big Cliff Day. It was now Lonny’s turn to be the Master.

It slowly dawned on Herman that they would defend their misery with their last dying breath rather than change their minds. “They think they are Lemmings,” he said, softly, under his breath, “They think that if they change their minds about something, they won’t be themselves.”

So, Herman walked to the Park and watched the day go by.

He was sad.

Then he realized that, while he now knew, he couldn’t help them. They didn’t want a better life. They wanted to feel secure in the life they already had – so they only listen to Lemmings who will tell them that they are right. Lemmings who confirm their misery. He was just a lemming.

But that’s OK.

So, he decided – right there – that if anybody asked, he would tell them, but that he wouldn’t bother anybody any more. (the thought of those Lemmings pushing their noses in the dirt was just too much to bear, even though it was kind of funny)

The End

Why XHTML is Wrong and HTML 5 is Right

I'll admit it: I fell for the XHTML hype and I've tried to convert myself to writing XHTML rather than HTML. The argument sounded good, but . . . I was wrong.

Then about a month ago I read almost all of the HTML 4.01 spec - not the digestions that you find in books like Everything You Ever Wanted to Know to Be an HTML Expert!!!!! in 5 minutes a Day!!!. I read the real spec. I actually spent about 1/4 of my time reading the DTD.

[I thought I needed to write an HTML parser in PHP - but that was just another programming hallucination which I solved in a much simpler, but vastly more limited and restrictive way]

Anyway, while doing that I realized that writing an HTML parser is Hard. Whereas writing an XML parser is EASY.

And that - in a succinct nutshell - is the only argument in favor of XHTML.

And, along with everybody else, I blindly followed that Pundit-centric reasoning.

But let's think about it: How many times do you Write an HTML Parser? For most of us, it's Zero.

OK, how many times do you have to read and understand HTML? For most people it's still Zero, but for a Lot of Us it's 'All the f****ing Time!'.

So - should (X?)HTML be Human Parse-able or Machine Parse-able?

That is: Should (X?)HTML be Easy for Us to Read or Easy to Write simple Parsing Programs for?

If you don't know the answer to that one, ... I was about to write something insulting, but that's not fair. Let's look at some history:

1. RPN [that's Reverse Polish Notation]. It's easy to parse, but hard to read. It has a long history, but even the original HP calculator [showing my age], which was a commercially successful RPN calculator (and, as far as I know, the only commercially successful one), had to succumb and convert to algebraic notation.

2. The FORTH family of languages. Apparently this never really dies. FORTH is a stack-based language which is really easy to write a parser for, but almost impossible to understand. Why? All the operations either push something on the stack or pop some stuff off, transform it and then push it back. Us Humans are NOT Stack Machines, so we can't track the state of the Stack.

3. Postscript - Same problem as FORTH, but it's successful as an intermediate language to describe a rendered document. Nobody actually programs in Postscript - we all use tools which output to Postscript and then feed that to a Postscript interpreter. [anybody ever try to read that stuff?]

4. Etc
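To make the 'easy to parse' half of that concrete, here's a minimal sketch of an RPN evaluator in Python. It is my own toy, not taken from any calculator or FORTH implementation, and eval_rpn and OPS are names I made up. The whole 'parser' is a split on whitespace and a stack:

```python
import operator

# The four binary operators this toy understands.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def eval_rpn(expr):
    """Evaluate a whitespace-separated RPN expression and return a float."""
    stack = []
    for tok in expr.split():
        if tok in OPS:
            # Operands come off in reverse order: right first, then left.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError('malformed expression: %r' % expr)
    return stack[0]

simple = eval_rpn('3 4 + 2 *')            # (3 + 4) * 2
longer = eval_rpn('5 1 2 + 4 * + 3 -')    # 5 + (1 + 2) * 4 - 3
```

Try working out the second expression in your head without the comment next to it, and you have the 'hard to read' half too.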

Point Made?

But, you say, HTML Sucks!!!

Right!!!

Let's get behind HTML 5 and Fix it. XHTML Sucks too.

Friday, May 1, 2009

PHP, Language Design, and Confusion

References

When I wrote some iterator code which returns a reference to an internal array in an object I discovered an interestingly easy to introduce bug. Without getting into the details, I left out a '&' and clobbered my underlying data structure.

Here's the offending loop:

for ($row=&$obj->firstRow();$row;$row=$obj->nextRow()) {
    // other stuff
}

where firstRow() and nextRow() are iterator methods defined on the object which do exactly what they sound like. [oh yeah, they both are defined as returning references]

Spot the bug?

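I left off the '&' between the '=' and the '$' in the update part of the for loop. Because $row is still a reference to the first row, the plain assignment writes the result of $obj->nextRow() into that row instead of re-pointing $row. In other words, nextRow() overwrites the firstRow.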

Strangely enough, if I wrote this in C I wouldn't have made the error. The equivalent C code would be:

for (ptr = obj->firstRow(); ptr; *ptr = *obj->nextRow()) {
    ptr->whatever;
}

Notice that I have to explicitly dereference the pointer ptr in order to clobber the initial element.

So this covers PHP and the Confusion.

Here's where the Language Design comes in:

C makes a syntactic distinction between accessing a pointer and dereferencing it.

PHP doesn't. If a variable contains a reference then assignment dereferences it implicitly and silently.
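For anyone who doesn't write PHP or C, the difference between re-pointing a variable and writing through it can be sketched in Python. This is only an analogy: Python has no PHP-style references, and the rows list here is made-up data. What it shows is two operations that are spelled differently, so you can tell them apart on sight:

```python
rows = [['a', 1], ['b', 2], ['c', 3]]

# Rebinding: 'row' just names a different list afterwards; rows is untouched.
row = rows[0]
row = rows[1]
after_rebind = [r[:] for r in rows]

# Writing through: slice assignment replaces the contents of the list
# that 'row' currently names, which clobbers the first row.
row = rows[0]
row[:] = rows[1]
after_write = [r[:] for r in rows]
```

The second form is the accident in my loop, except that in PHP it looks exactly like the first.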

Consequently, you can't really understand a chunk of PHP without reading all the definitions of the variables which precede the chunk you're looking at. This is a bad thing and it violates that tried, true and mostly forgotten maxim of 'code locality'.

This also leads to a lot of bugs - and 'Bogus' bug reports - involving references. [just check out bugs.php.net - 655 Bogus bugs relating to 'reference']

I think PHP would be a lot easier to understand if there were a syntactic difference between assigning to a variable and assigning to the referent of a variable.

Practically speaking, I don't think it will happen because of 'backward compatibility' and 'NIH'.

What I'd like to see is a dereference prefix which is accepted in PHP 5.x and becomes mandatory in PHP 6. There should also be a warning option which generates a warning if a variable containing a reference is assigned a value and the variable is not prefixed.

Here's my first (and only) choice:
&$foo

This is currently used to denote a reference when used on the right side of an assignment. It should be a simple matter to extend this to dereferencing on the left side and in expressions.

Thursday, April 23, 2009

Efficient Code - PHP

I guess I'm a nut about execution speed. Well, not if it gets in the way of clarity, but I hate writing slow code.

But . . . always a 'But' . . . it's hard to know what is slow and what isn't. Seems like it should be easy, but I never know until I measure.

Here's an example:

The PHP manual page for preg_split() says that you shouldn't use it unless you need the flexibility of regular expressions. You should use explode() or str_split(), because they are simpler and therefore faster. I believed that.

Then I thought I'd like to see how much faster, so I wrote a couple of loops which split a comma separated string with both preg_split() and explode() to see how much I was really losing.

Here are the average times of 10 passes through 1,000,000 splits using each:
preg_split() - 3.51239209175 seconds
explode() - 4.33661820889 seconds

preg_split() is about 23% Faster than explode().

I'm still not sure I believe it. Here's the code - so you can try it yourself.

And, please let me know if you see anything I did wrong.


// Timing uses xdebug_time_index(), which requires the Xdebug extension;
// microtime(true) is a portable alternative.
$str = 'this, is , a , string, with , commas, in ,it';
$count = 1000000;
$passes = 10;
$times = array(xdebug_time_index());
$avg = array('preg' => array(), 'explode' => array(), 'map-explode' => array());

// Seconds elapsed between the last two recorded timestamps.
function dt($offset = 1)
{
    global $times;
    $last = count($times) - 1;
    return $times[$last] - $times[$last - $offset];
} // end of dt()

for ($pass=0;$pass<$passes;$pass++) {
    echo "Pass $pass\n";

    // NOTE: the pattern as first posted was '/^\s*,\s*/'. The leading '^'
    // anchors the match to the start of the string, so it never matched and
    // preg_split() returned the whole string unsplit. The timings quoted
    // above appear to come from that anchored pattern.
    $times[] = xdebug_time_index();
    for ($i=0;$i<$count;$i++) {
        $v = preg_split('/\s*,\s*/', $str);
        unset($v);
    }
    $times[] = xdebug_time_index();
    $avg['preg'][] = dt();
    echo "$count preg_split's(regx, str): " . dt() . "\n";

    // explode() plus trim() produces the same result as the regex split.
    $times[] = xdebug_time_index();
    for ($i=0;$i<$count;$i++) {
        $v = array_map('trim', explode(',', $str));
        unset($v);
    }
    $times[] = xdebug_time_index();
    $avg['map-explode'][] = dt();
    echo "$count array_map(trim, explode(',', str)): " . dt() ."\n";

    // Bare explode(): splits on commas but leaves the surrounding spaces.
    $times[] = xdebug_time_index();
    for ($i=0;$i<$count;$i++) {
        $v = explode(',', $str);
        unset($v);
    }
    $times[] = xdebug_time_index();
    $avg['explode'][] = dt();
    echo "$count explode(',', str): " . dt() ."\n";
}

echo "Averages\n";
foreach ($avg as $key => $ar) {
    echo "$key average of $passes trials of $count splits: " . array_sum($ar) / count($ar) . "\n";
}
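A timing comparison only means something if the candidates do the same work, so it pays to check their results before reading the clock. Here's a rough sketch of that check in Python, with Python's re module standing in for PCRE (treating the two as equivalent for these patterns is my assumption). It also shows what happens to a pattern anchored with '^', such as '^\s*,\s*': on this string it never matches, so nothing gets split.

```python
import re
import timeit

s = 'this, is , a , string, with , commas, in ,it'

# '^' pins the match to the start of the string; s starts with 't', so no split.
anchored = re.split(r'^\s*,\s*', s)

# Without the anchor the regex splits on every comma and eats the spaces.
unanchored = re.split(r'\s*,\s*', s)

# The plain-string equivalent: split on commas, then trim each piece.
exploded = [part.strip() for part in s.split(',')]

# Only now is it worth timing anything (numbers will vary by machine).
t_regex = timeit.timeit(lambda: re.split(r'\s*,\s*', s), number=10000)
t_plain = timeit.timeit(lambda: [p.strip() for p in s.split(',')], number=10000)
```

I'm not quoting timings for this one; the point is the equality check that comes before the stopwatch.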

Tuesday, April 21, 2009

Documentation - It's Part of the Design Process

I was documenting some code I just wrote to create a packing list for an internet store we'll be opening soon. It's kind of a hairy problem to come up with a semi-optimal allocation of goods into boxes - but absolutely necessary to do right if you want to accurately estimate shipping and keep costs down.

It took me about a week to really get my head around the problem so that I could solve it. The actual solution now looks pretty obvious - but it won't in six months or a year.

So I document all my code - profusely, but compactly.

A Strange Thing happened while I was writing the doc: I found and corrected a whole bunch of bugs - coding bugs, 'bugs of omission', and design bugs. That's when I realized that I use the very act of documenting code as part of my design iterations.

So, I started thinking about what I do and why I do it the way I do - which is (as usual) a bit contrary to 'accepted best-practice'.

First of all, there are basically Three Types of documentation in code:
  • The Code Itself
  • Comments
  • Doc
The Code: My code is semi-self documenting. I do the usual stuff: identifiers are semantically related to their use [function save_data() - etc]; use of white space; consistent breaking of long lines and breaks at binary operators; consistent indentation.

But you can do more.

Simple, direct, and consistent code does wonders for readability. Consistency in within-loop logic [use if elseif elseif ... or switch case case default or if () { stuff; continue;} if () { stuff; continue;}]. I don't think it matters what style you use as much as consistency and simplicity.
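As a small illustration of what I mean by consistent within-loop logic, here's a sketch in Python that uses the 'test, do stuff, continue' shape for every case. The function and bucket names are invented for the example:

```python
def classify(values):
    """Sort numbers into buckets; every case has the same shape."""
    buckets = {'negative': [], 'zero': [], 'positive': []}
    for v in values:
        if v < 0:
            buckets['negative'].append(v)
            continue
        if v == 0:
            buckets['zero'].append(v)
            continue
        # Anything that falls through is positive.
        buckets['positive'].append(v)
    return buckets

result = classify([3, -1, 0, 7, -5])
```

An if/elif/else chain would do just as well. What hurts is mixing the two shapes in the same loop.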

Small is usually good as well. We used to talk about 'locality of code' meaning that our code was written in chunks which weren't 'too long.' 'Too long' usually really meant 'it fits in the window of my editor' so that you can stare at it and understand the whole thing. There's also 'too short', but the Ruby on Rails guys have the market on that with their obsessive DRYness.

Anyway, the Code itself is always the final authority on what the program does - so it's the most authoritative documentation.

Do yourself a favor and make it as readable as possible - you might have to fix it later.

Comments: Helpful hints which make the Code more readable. Some (IMHO) IDIOT came up with the idea that Comments Must be Set Off with Big Boxes of Stars or something like that.

Why?

Well, I'll tell you.

We used to write programs on Paper and then punch them into Punch Cards and run them through a Card Reader and then get a Big Wide Printout on Greenbar paper and do our debugging at a desk. We'd page through this Paper with a pencil and write revisions to our programs. The paper was 14 inches wide [132 columns] and 11 inches long [60 lines at 6 lines per inch with 1/2 inch top and bottom margins].

In those days it made sense to Mark Off the comment blocks. It made them easy to spot - both for someone reading the code and for the Boss as he walked by your desk. [You'd get demerits for not commenting your code, so it was a good idea to make it easy for him to see].

Things have changed. You can easily flip around in an on-screen editor, but you can't very well write on it. It works a lot better if the information in the editor window is very dense. The way to do that is to write meaningful comments with as little excess white space as possible.

Now I use blank lines to separate logically disjoint chunks of code and precede them with comments - if appropriate - but I don't waste a lot of screen space blocking off comments.

I think we need a very high Signal to Noise ratio - and big blocks of comment markers are just noise.

I don't use PHPDoc because those guys seem to think that the Documentation and the Comments are More Important than the Code itself. I don't. I want as much Code on the screen as I can get if it is blocked out so that the logic is obvious and commented as needed.

The Doc: The Doc is narrative description of what the code does, how it does it, how to use it, calling parameters, etc etc. Can't live without it.

I've experimented with all kinds of methods. I used to write UNIX style man pages - and they are still a really good format. But nowadays, everything has to be in HTML, so I've changed.

There are a lot of Documentation Project things - like PHPDoc - but I don't use them.

I think they're too complex and they are usually specific to one language. Besides, they just aren't necessary for good, clear documentation.

Like many people, I work in several languages [one reason I'm bald] so I need a multi-lingual documentation system. I also want something which doesn't take up a lot of space in my program files and which is really simple. Here are my 'Must Haves':
  • Must allow the Doc to be embedded in the same file with the code
  • Must produce HTML
  • Must be flexible
  • Must be very Simple
I couldn't find anything which didn't take multiple days [or weeks] to learn, so I wrote up something to pull Textile markup out of a simple text file and that's what I use.
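
A minimal sketch of that kind of extractor, not the actual tool: it assumes the Doc sits between a '#doc-start' line and an '#end-doc' line, the markers used in the RequestCleaner listing elsewhere on this blog. The function name extract_doc is hypothetical, and turning the extracted Textile into HTML is left to whatever Textile converter you have on hand.

```python
def extract_doc(source_text, start="#doc-start", end="#end-doc"):
    """Return the text found between start and end marker lines."""
    doc_lines = []
    inside = False
    for line in source_text.splitlines():
        marker = line.strip()
        if marker == start:
            inside = True
            continue
        if marker == end:
            inside = False
            continue
        if inside:
            doc_lines.append(line)
    return "\n".join(doc_lines)


sample = "<?php\n/*\n#doc-start\nh1. Title\n\nBody text\n#end-doc\n*/\necho 1;\n"
print(extract_doc(sample))
```

Because it only looks for marker lines, the same extractor works on PHP, Python, or anything else that lets you put a block of text inside a comment.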

Frankly, I'm smart enough and write well enough to describe what some code does. So are you. Just pretend you're writing for yourself about a year from now when you have to find a bug and fix it in record time - or you'll be fired.

What kind of Doc do you want then? Clear, correct, self-contained without a lot of references jumping around, etc etc. Write something which will save your butt when the crunch comes and you're fine.

Design?

So what does this have to do with Design?

As I write my Doc - part 3 of the Code Documentation - I find myself thinking about how the whole thing works and how understandable it is. It can get kind of contemplative.

Another thing which happens is that I find myself wondering why in the hell I wrote something and what it really does - so I go through the logic again.

There's a different perspective and attitude when reading code to understand and clearly document it than there is when writing it initially. At least there should be.

When I write code, I get involved in the microscopic detail, in solving the problem, in creating the logic and the loops.

When I read code to document it, I get involved in the overall strategy of the solution and the clarity of the logic. It's just different.


Anyway, that's how I think about Documenting code nowadays.

Oh, if you're interested in my document extractor - I'm planning on dumping it on the download page of http://www.clove.com. It's written in Python, so it should run about anyplace.

Tuesday, March 3, 2009

How can you screw up XML? Part II

XML is overly wordy. It has lots of required text.

I think the theory is - if there really is one - that longer words make the computer-code more easily understood. That was certainly true when we were using FORTRAN with 6 character variable names and Basic with 2 characters - but there's a limit. That limit is probably around 32 characters - 1/3 of a screen line - at which point you need some white space and punctuation.

XML doesn't provide for white space, just punctuation. As (semi-)intelligent creatures, we need white space to clump symbols into recognizable things. It seems to be how we parse. Computers don't distinguish, so . . . They can read XML, but we can't. [maybe it's really a plot by the machines and W3 are a bunch of Cylons?]


So maybe the extra stuff is supposed to make XML more reliable? To get there, we have to talk about two slightly different subjects: Bandwidth and Information Theory.

Bandwidth measures how fast we can transmit messages. The bigger the bandwidth, the faster the electronic wiggles are and so the more Bits we can represent in a given chunk of time - say a second. Big Bandwidth is Good.

Effective Bandwidth is the fraction of the real Bandwidth you get to use for your stuff - the content you want to see, transmit, or use - read streaming video from hulu.com. The more non-Content characters in the Protocol [read XML] used to encode your 'stuff', the lower your Effective Bandwidth. [You pay for Real Bandwidth, but you Get Effective Bandwidth - it's kind of like sales tax or Net Income after Income Tax]

Information Theory studies how to transmit Information in the presence of Noise. It turns out that you can always get your message across accurately - most of the time - if you use a fancy enough code. A Code takes a simple message and adds a lot of extra bits which allow the receiver to tell if the message was messed up [received a 0 where there should have been a 1] and to reconstruct it. When you have more noise, you have to add more reconstruction bits. That makes the message longer. So Information Theory says - if you want reliable communication, you have to allocate some of your Bandwidth to these reconstruction bits - called Redundancy - so that your Effective Bandwidth is lower than your actual Bandwidth.
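
To make the trade-off concrete, here is a sketch of about the simplest Code there is: send every bit three times and let the receiver take a majority vote. Real codes are far more efficient; the function names and the sample message are just for illustration.

```python
def encode(bits, copies=3):
    """Repeat every bit `copies` times (a repetition code)."""
    return [b for b in bits for _ in range(copies)]


def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    groups = [received[i:i + copies] for i in range(0, len(received), copies)]
    return [1 if sum(g) * 2 > len(g) else 0 for g in groups]


message = [1, 0, 1, 1]
sent = encode(message)

noisy = list(sent)
noisy[1] ^= 1   # noise flips one bit in the first group
noisy[9] ^= 1   # and one bit in the last group

recovered = decode(noisy)
effective_fraction = len(message) / len(sent)   # share of sent bits that are message
```

One flipped bit per group is repaired, but only a third of what goes over the wire is message. That third is the Effective Bandwidth; the rest is Redundancy.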

Now let's apply this to XML.

XML adds extra stuff to create a rigid structure for your message. This has Nothing to do with Information Theory because XML is transferred over TCP - which is a Lossless Protocol - meaning that, if the message gets there At All, it's guaranteed to be OK. There's No Noise on TCP. [The TCP protocol has already eaten up the Bandwidth required by Information Theory to get your 'stuff' to you]

The XML extra stuff is there so computer programs can easily parse the message and use its Content once it gets there.

XML's 'extra stuff' shares the Same Bandwidth as the Content inside the XML message.

I ask you: what's more important: the Content or the 'extra stuff'?

Personally, I think the 'extra stuff' should be as small and efficient as possible so we can use as much of the Bandwidth as possible for Content.

W3 must think that the 'extra stuff' is more important than content because they make their protocols as bulky as they can.

Don't believe me? Go to www.w3.org and read some of their specs - try name spaces or the RDF spec or just about anything. All Structure with minimal space for Content.

Why do we put up with this?

How can you screw up XML?

XML is about the simplest thing around. How can you screw it up?

I'm writing some metadata-extraction-from-image code and started looking at Adobe's XMP.

XMP is written using the W3's RDF spec - [almost wrote RDF framework, but that would then be Resource Description Framework framework (RDF2), which might not be the same thing].

RDF defines a 'Framework' for writing machine parseable statements of the form:

Tim has a bike.

RDF calls:

  • Tim the subject

  • has the predicate

  • bike the object
It makes you write URI's for both the subject and predicate - and have translation syntax between real words and the URI's.
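
Stripped of the XML, a statement like that is just three values. A sketch in Python, with made-up URIs standing in for 'Tim' and 'has' (they are assumptions for the example, not anything from the RDF spec):

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical URIs for the subject and predicate; the object is a plain literal.
statement = Triple(
    subject="http://www.example.org/people/tim",
    predicate="http://www.example.org/terms/has",
    object="bike",
)


def describe(triple):
    """Turn a triple back into words using the last segment of each URI."""
    def short(term):
        return term.rstrip("/").rsplit("/", 1)[-1]
    return "{} {} {}".format(
        short(triple.subject), short(triple.predicate), short(triple.object))


print(describe(statement))
```

The describe() helper is the 'translation syntax' in miniature: it maps the long URIs back to the short words a person would use.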

Here's an example from the RDF Primer. It says: 'http://www.example.org was created on August 16, 1999'
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>
The RDF is about 5.9 times as LONG as the English Language Sentence. That means about 83%
of the text is overhead. Or, to put it another way, a BANDWIDTH UTILIZATION of about 17%.
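
The arithmetic is easy to check. This sketch counts characters in the sentence and in the RDF exactly as typed here, without indentation, so the figures come out slightly different from the ones quoted; they shift with whitespace and line endings.

```python
sentence = "http://www.example.org was created on August 16, 1999"

rdf = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:exterms="http://www.example.org/terms/">
<rdf:Description rdf:about="http://www.example.org/index.html">
<exterms:creation-date>August 16, 1999</exterms:creation-date>
</rdf:Description>
</rdf:RDF>"""

ratio = len(rdf) / len(sentence)         # how many times as long the RDF is
utilization = len(sentence) / len(rdf)   # share of characters carrying the message
overhead = 1 - utilization               # share spent on structure

print("ratio %.1f, utilization %.0f%%, overhead %.0f%%"
      % (ratio, utilization * 100, overhead * 100))
```

Utilization and overhead always add up to 100%, which is why a message several times longer than its content leaves so little room for the content itself.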

For What Gain?

Nothing. And it takes them 6 LONG RFC style documents to define this mess. That's 6 Long, Boring documents with much repetition and pedantic phrasing with many MUST's and SHALL's and MAY's.

But it can be parsed by a machine - if you can understand the spec well enough to write the code.

Why take something so simple and make it so incomprehensibly complex?

But - Believe it or Not - I digress.

XMP is written using RDF [why re-invent the wheel when you can use somebody else's debacle and make it worse].

I'm not going to get into XMP - but at first glance it looks like they're using attributes for data, XML elements for data, and RDF nested structures for data - with NO obvious logic as to when and where these choices are made. To further mess it up, everything uses XML name spaces - which my XML parser translates back to URI's [which point to nothing, but are long and look cool], like a 'good parser should' - which obfuscates the already obfuscated and bulks out the fluff to content ratio admirably.

Yech!!!!!!!

Here's how I think XML was intended to encapsulate a website's creation date:
<site-info site_name="www.example.com">
  <creation-date>August 16, 1999</creation-date>
</site-info>
It's machine parse-able. It's (almost) human readable. It only wastes 50% of the bandwidth - as opposed to about 83% using RDF.

How about JSON - where the entire spec fits on one web page:
{
  "site_info": {
    "site": "www.example.org",
    "creation_date": "August 16, 1999"
  }
}
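
Both of the simple versions are a couple of lines to read with a standard library. A sketch using Python's xml.etree.ElementTree and json modules on the two snippets above; the variable names are arbitrary.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<site-info site_name="www.example.com">
<creation-date>August 16, 1999</creation-date>
</site-info>"""

json_text = """{
  "site_info": {
    "site": "www.example.org",
    "creation_date": "August 16, 1999"
  }
}"""

# Plain XML: the site is an attribute, the date is a child element's text.
root = ET.fromstring(xml_text)
xml_site = root.get("site_name")
xml_date = root.findtext("creation-date")

# JSON: nested dictionaries, nothing else to learn.
data = json.loads(json_text)
json_site = data["site_info"]["site"]
json_date = data["site_info"]["creation_date"]

print(xml_site, xml_date)
print(json_site, json_date)
```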

Tuesday, February 17, 2009

PHP Input Cleaning

I've been buried in my be-all, end-all, does-everything CMS / e-commerce site.
I thought I'd come up for err (or air) and here's what happened.

Below is an object which does lazy input cleaning of GET and POST data.

You use it by creating a new object [surprise].

You access cleaned data as attributes of an object instance.

All the cleaned data is cached in a class variable, so you can either pass around
a global or create a bunch of instances and everything will work - within the same
instance of the PHP program, of course. Once the Object
is required, it acts kind of like a Singleton Pattern.

You can restrict the query sources to GET, POST or REQUEST and control how it
reacts to undefined query parameters.
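
For comparison, the same lazy-cleaning idea can be sketched in Python with __getattr__ and a class-level cache. This is a hypothetical analogue, not a port of the PHP below: the sources are plain dicts standing in for GET and POST data, html.escape() stands in for htmlspecialchars(), and undefined parameters simply return None.

```python
import html


class RequestCleaner:
    """Hypothetical analogue: cleans parameters lazily, on first attribute access."""
    _cache = {}     # class-level, so every instance shares the cleaned values
    _sources = []   # dicts standing in for GET / POST data

    def __init__(self, *sources):
        # Only the first configured instance sets the sources; later ones reuse them.
        if not RequestCleaner._sources:
            RequestCleaner._sources = list(sources)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for query parameter names.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in RequestCleaner._cache:
            return RequestCleaner._cache[name]
        for source in RequestCleaner._sources:
            if name in source:
                cleaned = html.escape(source[name])
                RequestCleaner._cache[name] = cleaned
                return cleaned
        return None  # undefined parameter


post = {"comment": "<b>hi</b> & bye"}
rc = RequestCleaner(post)
other = RequestCleaner()   # a second instance sees the same sources and cache
print(rc.comment)
print(other.comment)
```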

The documentation is in a comment at the top in Textile.

In some ways, I think this is kind of neat, but on the other hand it's
really depressing how much effort goes into something with such a simple
objective. Seems like there should be an easier way.

Cheers,


<?php
/*
#doc-start
h1. RequestCleaner

Created by on 2009-02-14.
Copyright (c) 2009 Clove Technologies, Inc. All rights reserved.

h3. Usage

Create a new request cleaner:

$rc = new RequestCleaner(sources, use_modes, error_mode) - where:

* sources is a comma separated string OR an array with one or more of: GET, POST, REQUEST
* use_modes is the bitwise OR of
** RequestCleaner::USE_HTMLENTITIES - process each parameter with _htmlentities()_
** RequestCleaner::USE_HTMLSPECIALCHARS - process each parameter with _htmlspecialchars()_
** RequestCleaner::USE_NL2BR - process each parameter with _nl2br()_
* error_mode - determines how the beast responds to errors - such as non-existing query parameters.
Use one of:
** RequestCleaner::RETURN_NULL - returns NULL
** RequestCleaner::RETURN_FALSE - returns FALSE - can be distinguished from NULL by '===' and '!=='
** RequestCleaner::THROW_EXCEPTION - throws an exception

For example:

$rc = new RequestCleaner('POST', RequestCleaner::USE_HTMLENTITIES | RequestCleaner::USE_NL2BR,
RequestCleaner::THROW_EXCEPTION);


To get a 'cleaned' query parameter, use '$rc->parm', where _parm_ is the parameter name.
For example, if 'foo' is a POST parameter, then '$rc->foo' will return the value of 'foo'
after running the 'cleaning' routines on it.

h3. Fine points

All Cleaned data is cached in a RequestCleaner Class Variable, as are the sources, use and error
modes. This means that a top level PHP program can set up the method of cleaning and
allowed sources and all included code which uses _any_ instance of a RequestCleaner
will use the same methods and cache.

Consequently, you can either create a RequestCleaner instance at top level OR in any included or
required file which needs to access query parameters. It doesn't matter.

Attempts to access undefined attributes generate an error - as specified by _error_mode_.

Query parameters which return arrays - as in <input type=... name="foo[]" ...> -
are turned into arrays of cleaned strings which can be processed using normal loops
and array_...() functions

#end-doc
*/

/**
* RequestCleaner(sources, use_mode, error_mode) - where sources is an array or a comma separated
* string naming the Super Globals to search for attribute values.
* Legal names are: POST, GET, and REQUEST
*/
class RequestCleaner
{
  const ERROR_MODE_MASK = 3;
  const RETURN_NULL = 1;
  const RETURN_FALSE = 2;
  const THROW_EXCEPTION = 3;
  const NORMAL = RequestCleaner::RETURN_NULL;

  const CLEANER_MODE_MASK = 0x07; // IMPORTANT: Change This as you ADD USE_modes
  const USE_HTMLENTITIES = 1;
  const USE_HTMLSPECIALCHARS = 2;
  const USE_NL2BR = 4;
  private static $use_modes = array(
    RequestCleaner::USE_HTMLENTITIES => 'htmlentities()',
    RequestCleaner::USE_HTMLSPECIALCHARS => 'htmlspecialchars()',
    RequestCleaner::USE_NL2BR => 'nl2br()',
  );
  private static $error_modes = array(
    RequestCleaner::RETURN_NULL => 'return NULL',
    RequestCleaner::RETURN_FALSE => 'return FALSE',
    RequestCleaner::THROW_EXCEPTION => 'throw Exception'
  );
  private static $sources = NULL;
  private static $source_names = array();
  private static $error_mode = NULL;
  private static $use_mode = NULL;
  private static $cache = array();
  function __construct($sources = array('GET', 'POST'), $use_mode = RequestCleaner::USE_HTMLENTITIES,
      $error_mode = RequestCleaner::RETURN_NULL)
  {
    // Configuration is class-wide: only the first instance sets it up
    if (!RequestCleaner::$sources) {
      if ($sources) {
        RequestCleaner::$sources = array();
        if (is_string($sources)) {
          $sources = preg_split("/,\s*/", trim($sources));
        }
        foreach ($sources as $src) {
          RequestCleaner::$source_names[] = $src;
          switch ($src) {
            case 'POST':
              RequestCleaner::$sources[] = $_POST;
              break;
            case 'GET':
              RequestCleaner::$sources[] = $_GET;
              break;
            case 'REQUEST':
              RequestCleaner::$sources[] = $_REQUEST;
              break;
            default:
              throw new Exception("RequestCleaner::__construct(): Illegal Source: $src");
          }
        }
      } else {
        RequestCleaner::$sources = array($_POST, $_GET);
      }
      if (RequestCleaner::CLEANER_MODE_MASK & $use_mode) {
        RequestCleaner::$use_mode = RequestCleaner::CLEANER_MODE_MASK & $use_mode;
      }
      if (RequestCleaner::ERROR_MODE_MASK & $error_mode) {
        RequestCleaner::$error_mode = RequestCleaner::ERROR_MODE_MASK & $error_mode;
      }
    }
  }

  // static because it is called as RequestCleaner::stringCleaner() and as an array_map() callback
  private static function stringCleaner($x)
  {
    if (RequestCleaner::$use_mode & RequestCleaner::USE_HTMLSPECIALCHARS) {
      $x = htmlspecialchars($x);
    }
    if (RequestCleaner::$use_mode & RequestCleaner::USE_HTMLENTITIES) {
      $x = htmlentities($x);
    }
    if (RequestCleaner::$use_mode & RequestCleaner::USE_NL2BR) {
      $x = nl2br($x);
    }
    return $x;
  } // end of stringCleaner()

  private function useModesToString()
  {
    $ar = array();
    foreach (array(RequestCleaner::USE_HTMLENTITIES, RequestCleaner::USE_NL2BR,
        RequestCleaner::USE_HTMLSPECIALCHARS) as $mode) {
      if (RequestCleaner::$use_mode & $mode) {
        $ar[] = RequestCleaner::$use_modes[$mode];
      }
    }
    return implode(',', $ar);
  } // end of useModesToString()

  public function __toString()
  {
    return "RequestCleaner: examining " . implode(', ', RequestCleaner::$source_names)
      . " Using " . $this->useModesToString()
      . " / Exit Mode: " . RequestCleaner::$error_modes[RequestCleaner::$error_mode];
  } // end of __toString()

  public function __get($name)
  {
    // Serve from the shared cache if this parameter was already cleaned
    if (array_key_exists($name, RequestCleaner::$cache)) {
      return RequestCleaner::$cache[$name];
    }
    // Otherwise clean the first matching value and cache it
    foreach (RequestCleaner::$sources as $source) {
      if (array_key_exists($name, $source)) {
        $val = $source[$name];
        if (is_string($val)) {
          return (RequestCleaner::$cache[$name] = RequestCleaner::stringCleaner($val));
        } elseif (is_array($val)) {
          return (RequestCleaner::$cache[$name] = array_map(array('RequestCleaner', 'stringCleaner'), $val));
        }
      }
    }
    // Not found anywhere: respond according to the error mode
    switch (RequestCleaner::$error_mode) {
      case RequestCleaner::NORMAL: return NULL;
      case RequestCleaner::RETURN_FALSE: return FALSE;
      case RequestCleaner::THROW_EXCEPTION:
        throw new Exception("RequestCleaner::__get($name): Value Not Defined");
      default:
        throw new Exception("RequestCleaner::__get($name): ERROR: Value Not Defined / Illegal Error Mode");
    }
    return NULL;
  } // end of __get()

  public function __set($name, $value)
  {
    switch (RequestCleaner::$error_mode) {
      case RequestCleaner::NORMAL: return NULL;
      case RequestCleaner::RETURN_FALSE: return FALSE;
      case RequestCleaner::THROW_EXCEPTION:
        throw new Exception("RequestCleaner::__set($name, $value): Setting Attributes Not Allowed");
      default:
        throw new Exception("RequestCleaner::__set($name, $value): Setting Attributes Not Allowed / Illegal Error Mode");
    }
  } // end of __set()

  public function __unset($name)
  {
    switch (RequestCleaner::$error_mode) {
      case RequestCleaner::NORMAL: return NULL;
      case RequestCleaner::RETURN_FALSE: return FALSE;
      case RequestCleaner::THROW_EXCEPTION:
        throw new Exception("RequestCleaner::__unset($name): Unsetting Attributes Not Allowed");
      default:
        throw new Exception("RequestCleaner::__unset($name): Unsetting Attributes Not Allowed / Illegal Error Mode");
    }
  } // end of __unset()

  public function __isset($name)
  {
    if (array_key_exists($name, RequestCleaner::$cache)) {
      return TRUE;
    }
    foreach (RequestCleaner::$sources as $source) {
      if (array_key_exists($name, $source)) {
        return TRUE;
      }
    }
    return FALSE;
  } // end of __isset()
}

// end class definitions

?>


Why did I post this dramatic exposition of Programming Prowess?

No particular reason, just felt like it.