The War on Terror, for Unix geeks

Via Tim Bray: The War on Terror, as viewed from the Bourne shell.

Posted in General | Comments Off on The War on Terror, for Unix geeks

A platform conversation

A living room. Windows user 1 sits at a desk, wrestling with the computer. Windows user 2, Mac user and Mac user’s girlfriend lounge nearby.

Windows user 1: (groans and makes a pained expression)
Mac user (to Windows user 2): Look at him. That’s the expression I associate with Windows.
Mac user’s girlfriend: That’s the attitude I associate with Mac users.

Posted in General | Comments Off on A platform conversation

Passing arguments from a shell script to a command

For future reference: In a shell script, if I have to pass an unknown number of command line arguments to another command, and some of the arguments might contain spaces, here’s how to do it:

#!/bin/sh
othercmd --foo --bar "$@"

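The magic “$@” invocation in double quotes does the trick: it expands to exactly one word per original argument. The more natural “$*” doesn’t work because it doesn’t keep arguments with spaces intact: unquoted, it gets split again on whitespace, and in double quotes it glues all arguments together into a single word.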
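To see the difference, here is a minimal sketch. The function names (count_args, forward_at, forward_star, forward_unquoted) are made up for illustration; count_args stands in for the other command and just reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for "othercmd":
# it prints the number of arguments it was called with.
count_args() {
    echo "$#"
}

# Hypothetical wrappers, one per forwarding style.
forward_at() {
    count_args "$@"    # one word per original argument
}

forward_star() {
    count_args "$*"    # all arguments joined into a single word
}

forward_unquoted() {
    count_args $*      # arguments split again on whitespace
}

forward_at "one two" three        # prints 2
forward_star "one two" three      # prints 1
forward_unquoted "one two" three  # prints 3
```

Only the quoted “$@” form hands over the same two arguments that came in.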

(Thanks go to Paolo who actually figured it out.)

Posted in General | 1 Comment

On PHP and compatibility

This is a rant about PHP. PHP is one of those languages that grew out of nothing, by accretion rather than design. It shows in many places, among them PHP’s OOP features.

The thing is, whenever you do something with a variable (assign it to another variable, pass it into a function, return it from a function), a copy is created. This is fine for normal variables, but a huge problem for objects, because you almost always want to pass around references to the original object, not a copy. You can override PHP’s default behaviour by putting “&” characters in various places. So when you assign an object to another variable, you do:

$a = new Foo();
$b = &$a;

When you pass objects into functions or methods, you do:

function foo(&$a) {
    …
}

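And when you return objects from methods, you put the “&” on the function declaration and on the assignment of the result:

function &getFoo() { return $this->foo; }
$foo = &$bar->getFoo();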

This is annoying, and dangerous, because you get strange and hard-to-find bugs when you forget the “&” in one place. But you get used to it, and with a bit of discipline, everything works pretty well. Actually, most of PHP’s shortcomings can be overcome with discipline and experience. And then, it’s quite an enjoyable language, because it gives you so much bang for your brain cycle.

This rant is not about PHP’s OOP features. It’s about a recent change in PHP 4.4. When one of my hosting providers silently upgraded from PHP 4.3.something to PHP 4.4.0 the other day, some of my scripts started to break. Well, not ‘break’ in the strictest sense, but there were dozens of lines of obscure warning messages on top of every web page generated by the scripts.

What happened was that something about this construct became illegal:

function &createFoo() {
    return new Foo();
}

You’re apparently supposed to fiddle with the ampersands, or put them somewhere else, I’m not quite sure. Anyway, any code that does something similar breaks with the upgrade. And that’s pretty much every nontrivial PHP application in the world.

Now there are some reasons why I shouldn’t be angry with the PHP guys about this:

  • The warnings, once noticed, were easily fixed
  • The change is supposed to fix a nasty memory leakage problem in prior PHP versions
  • It probably wouldn’t have happened if I were on PHP 5 already, which solves most of the OO issues
  • Part of the blame lies with the webhoster for doing the upgrade without researching the issues and warning their customers first
  • Part of the blame lies with me because I’ve enabled all warning messages even on a production system

But still. It’s not encouraging when your programming language silently changes semantics in a nontrivial way with a minor version upgrade, especially in an ecosystem where the developer often has no control over the version he has to use. This leaves a very bad taste with me.

Posted in General | Comments Off on On PHP and compatibility

Test post with RapidMetaBlog

And then there’s RapidMetaBlog, which has more features than DashBlog, but is not as slick. It also wastes lots of screen space, which is my main gripe with certain widgets.

Posted in General | Comments Off on Test post with RapidMetaBlog

Test post with DashBlog

I’m just playing with some Dashboard widgets. If this shows up in my blog, then posting to WordPress via the Blogger API with DashBlog works. Heh.

Posted in General | Comments Off on Test post with DashBlog

John Cowan on the history of the DNS

John Cowan tells the history of the Domain Name System. Interesting – it never really occurred to me that there was an age before DNS, and that the internet could never have become what it is today without it.

Posted in General | Comments Off on John Cowan on the history of the DNS

A dream fragment

I’m in an old study. There’s a closed book on the desk, an old, massive tome, richly adorned and plated with golden ornaments. A fine work of craftsmanship. Its pages had been left blank by its ancient creator, and were covered with writing only recently. I know it’s the diary of German rock star Herbert Grönemeyer.

I open the book and turn the heavy, yellowish pages. They are densely covered with handwriting. Different shades of red and violet colored pen, small rounded letters, carefully executed. It looks like a teenage girl’s handwriting.

Interestingly, everything is written in HTML. The short paragraphs are wrapped in <p> tags. Attributes for font size, width and margins are neatly spelled out.

This strikes me as a perfectly sensible thing to do. Grönemeyer was aware that he might want to publish his writing on the web at a later time.

(I sleep on and remember only this bit of the dream.)

Writing this down, I can’t help but wonder if the markup validated.

Posted in General | Comments Off on A dream fragment

Bristol photos

Anja took some photos when she came over to Bristol to visit me last weekend.

Posted in General | Comments Off on Bristol photos

The semantic web and languages that are not English

I hadn’t come across this recent addition to HP Labs’ list of technical
reports before: An Introduction to the Semantic Web, Considerations for
building multilingual Semantic Web sites and applications by Jeremy
Carroll.

If you read my blog, then you probably want to skip the “Introduction to
the SW” part. The rest of the report is a highly focused look at the
issues involved in building semantic web applications in any language
that is not English: Unicode, language tags on literals and embedded
XML, URIs vs. rdfs:labels, and IRIs.

The paper also mentions a feature introduced to RDQL in Jena 2.2:
the langeq operator. It is used to filter literals based on
their language tags, which is useful if your RDF data contains literals in multiple languages.

langeq can deal with subtags: asking for labels in German (tag de)
will also give you labels in German as used in Switzerland and as
written using the spelling reform beginning in the year 1996 C.E.
(tag de-CH-1996).

Example usage:

SELECT ?resource, ?label
WHERE (?resource, rdfs:label, ?label)
AND (?label langeq 'it')
USING rdfs FOR <http://www.w3.org/2000/01/rdf-schema#>

The current SPARQL working draft doesn’t have a
facility like this, but thanks to SPARQL’s powerful expression system,
you can emulate the same thing:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE {
    ?resource rdfs:label ?label
    FILTER REGEX(LANG(?label), '^it(-|$)', 'i')
}

What’s going on here? LANG(?label) gives the label’s language
tag. It is matched against the regular expression ^it(-|$),
which matches the string it and any string that starts with
it-. The 'i' modifier to the REGEX function
makes the match case-insensitive, as required by RFC 3066 and its
replacement-in-progress, draft-phillips-langtags.
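The pattern can be tried outside of any SPARQL engine. The following is only a rough sketch using grep's extended regular expressions, which should behave the same as the SPARQL REGEX call for a simple pattern like this one; matches_lang is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# matches_lang is a hypothetical helper.
# $1: the language tag to test, $2: the primary subtag to look for.
# Succeeds if the tag equals the subtag or starts with "<subtag>-",
# ignoring case (grep -i plays the role of the 'i' modifier).
matches_lang() {
    printf '%s\n' "$1" | grep -Eiq "^$2(-|\$)"
}

for tag in it IT it-CH ita italian en; do
    if matches_lang "$tag" it; then
        echo "$tag: match"
    else
        echo "$tag: no match"
    fi
done
```

The tags it, IT and it-CH match; ita, italian and en don't, because the pattern insists on either a hyphen or the end of the string right after the primary subtag.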

Jeremy also points out yet another ugly wart of RDF(/XML): Language
tagging is inconsistent for XML literals. To tag a plain literal, you
put an xml:lang attribute on its property element. If you do
the same for an XML literal, the language tag will be ignored. Instead,
you have to put the attribute onto some element within the XML literal.

IRIs (Internationalized Resource Identifiers) are yet another
interesting addition to the semantic web acronym soup. The next time
someone you know tries to understand the differences between URLs, URIs
and URNs, just mention that “they soon will all be replaced by IRIs
anyway.” This is a great way to keep sane people away from our line of
work.

Back to Jeremy’s report. I enjoyed this quote:

If you are asked to help with production of a
multilingual Semantic Web application you will be asking tool developers
for new features, you will be pushing at the boundaries, and finding
problems in the specifications – budget accordingly

Very true. But the same applies to unilingual semantic web applications.

Posted in General, Semantic Web | Comments Off on The semantic web and languages that are not English