The only thing you have to do to get whatever you want, is to be willing to do whatever it takes!
Both a truism and a profound wisdom. Paraphrased from here, and the one-sentence summary of this book.
A paragraph from this lengthy post by Philip Eby:
I read an interesting book many years ago that summarized what psychologists knew about people who successfully undertook major changes like quitting smoking or changing their eating habits. As it turns out, the key success criterion in the phase just before taking action was that people had to consider the negative impacts of the change they were contemplating, and decide they were okay with what they were giving up by changing. Before that phase, it was important to know about the benefits of the change, but for changes to stick, it seems we have to understand what the downsides are, and then do it anyway.
Actually, having read this now, I find it quite obvious. You want to go from A to B. You are really motivated to do it. But you just can’t. You don’t have enough willpower to overcome … to overcome what?
The things that keep you at A are all the good things you have to leave behind, and the hardships you will have to endure on the way. You will not get to B until you accept this. If you really want to change something about yourself, but can’t overcome your inertia, then it’s because you haven’t really accepted the negative consequences of the change.
It’s obvious, but I find that I actually focus much more on the positive effects that I hope the change will have. This provides motivation, but doesn’t reduce inertia. It’s only one half of the equation.
The hard part: Accepting something negative is a decision. And making decisions — I’m talking about real decisions, not this small automatic kind of decision we make all the time — is Hard.
(This post is not related to current events; I just happened to come across Philip’s article today. The big change I want to see in my life is — still — to conquer procrastination.)
The Pragmatic Programmers famously recommended learning one new programming language every year. Last year I learned AppleScript (page in German). My plan to learn Objective-C this spring didn’t work out for lack of time, but now I’ve come closer to fulfilling my 2005 obligation: I wrote my first useful Ruby program.
Email archival woes: The background is this. When I left HP Labs two weeks ago, I wanted to take the email folders from my HP-provided Linux box with me. Stupidly, I just copied the Evolution folder, without thinking about how to import them into my email client, Apple’s Mail.app. The right thing would have been to export the folders in mbox format because Mail can import mbox, but cannot import the maildir format used by Evolution.
Luckily, maildir to mbox conversion didn’t look too hard. Both store the emails in plain text. Maildir stores each message in its own file (or multiple files for MIME-multipart messages). Mbox stores an entire folder in one file.
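To make the two formats concrete, here is a minimal Ruby sketch of such a conversion (an illustrative sketch under simplifying assumptions, not the script discussed in this post — it ignores MIME structure and maildir info flags; the method name mboxify is made up):

```ruby
# Turn one maildir message into one mbox record: a "From " separator
# line, the message body with lines starting in "From " escaped
# ("From-munging"), and a trailing blank line.
def mboxify(message)
  out = "From maildir2mbox #{Time.now.asctime}\n"
  message.each_line do |line|
    line = ">" + line if line.start_with?("From ")
    out << line
  end
  out << "\n" unless out.end_with?("\n")
  out << "\n"
end

if __FILE__ == $0
  # Maildir keeps one message per file under cur/ and new/.
  Dir.glob("{cur,new}/*").sort.each do |path|
    print mboxify(File.read(path)) if File.file?(path)
  end
end
```

The "From " escaping is the one subtlety: in mbox, a line beginning with "From " marks the start of a new message, so such lines inside a body must be prefixed with ">".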
Shellscripting: So I set out to do the conversion. I started with a bit of cat and grep and sed on the terminal command line. That looked as if it could work. When the commands didn’t fit on one line any more, I moved them into a bash script. Converting simple plain text emails took about five lines. After half a screen, HTML emails and simple attachments worked. I believe this is actually the most complex shellscript I’ve ever written. Mendel Cooper’s Advanced Bash-Scripting Guide was an excellent companion.
But some emails, e.g. those with other emails attached inside, turned out to be quite complex and require some form of recursion. This is where I decided to switch tools; when you start to need recursion, bash is quite definitely not the right tool for the job. Normally I would do something like this in PHP (I have a Perl aversion), but having been impressed by Ruby recently, I decided to give it a try.
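To illustrate where the recursion comes from (a hypothetical sketch, not the actual script): a MIME message is a tree, and a message/rfc822 part contains a complete nested message whose parts must be handled by re-entering the same logic one level down.

```ruby
# Hypothetical illustration of the recursive structure of MIME messages.
# Each part may itself contain parts (multipart/*, message/rfc822),
# so walking the tree is naturally recursive.
Part = Struct.new(:content_type, :children)

def list_parts(part, depth = 0)
  lines = ["#{'  ' * depth}#{part.content_type}"]
  (part.children || []).each { |c| lines.concat(list_parts(c, depth + 1)) }
  lines
end
```

For example, a multipart/mixed message with an attached email yields a text/plain part at depth one and the attached message's own parts at depth two.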
First steps with Ruby: Starting out with Ruby is simple. I had already read some Martin Fowler posts and I went through Vincent Foley’s excellent Ruby on Rails tutorial a couple of days ago, so I already knew basic syntax (calling and defining methods, conditionals and loops) and some nice block-and-closure-fu. Together with the Pragmatic Programmers’ online version of Programming Ruby, the online class library reference, and some 1337 G00g13 ski11z, that’s enough to get going.
65 lines later, all emails were converted. I used file I/O, directory listings, array filtering, and regexes.
The script: Here it is. Run it inside the maildir folder, pipe stdout into an mbox file: ruby maildir_to_mbox.rb > output.mbox
If you know Ruby, please have a look. Did I do anything stupid? Any useful idioms that could improve the code?
Ruby is nice: So far, I like the language. The syntax is pleasantly devoid of ASCII noise, the libraries make life easy, the array handling is superb. My main peeve so far: There are too many ways to do the same thing (no, I don’t like Perl’s “There’s more than one way to do it” mantra). I’m looking forward to doing more with Ruby.
So what should I learn next year?
If you work with pie-in-the-sky technologies like this semantic web stuff you might have read about on this blog, then you need some kind of grounding to keep you connected to reality. You need some kind of compass that points you back to real-world problems. Is this technology useful? What’s missing to make it useful? Why doesn’t it catch on as fast as we’d like? Where should this technology move to solve people’s problems, to actually save time and money?
For me, this grounding is building small web applications, mostly with PHP. This is interesting and rewarding work. And it’s the lens through which I look at RDF and the semantic web. How could RDF help me here? How could it save me time and money?
One approach that could be interesting is to replace the “M” in the LAMP stack of technologies with an “R”. Instead of feeding the web app out of a MySQL database, it could get its data out of an RDF triplestore. Kendall Clark explores this idea in the context of Ruby on Rails and similar Model-View-Controller frameworks. Basically, it’s about using RDF as the model.
As Kendall points out, RDF is schemaless. You can use it in a “data first” style, as opposed to MySQL’s “structure first” style. I think this would be great for agile development and rapid prototyping.
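The “data first” point can be made concrete with a toy in-memory triplestore (a hypothetical sketch for illustration, not any real RDF library): statements are simply added as they come, and a new property needs no schema migration.

```ruby
# Toy triplestore illustrating schemaless, "data first" storage.
class TripleStore
  def initialize
    @triples = []
  end

  # Add any statement; nothing has to be declared beforehand.
  def add(s, p, o)
    @triples << [s, p, o]
  end

  # Simple pattern matching; nil acts as a wildcard.
  def query(s = nil, p = nil, o = nil)
    @triples.select do |ts, tp, to|
      (s.nil? || ts == s) && (p.nil? || tp == p) && (o.nil? || to == o)
    end
  end
end
```

Contrast this with MySQL, where adding an “email” column to a live table means an ALTER TABLE and, often, touching application code; here you just start adding email triples.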
We have lots of technologies in place that could play a role in this RDF web application stack:
I’ve no idea how all this fits together, but this is something to watch.
Scoble interviews Bill Gates (WMV, biiig file)
This is good stuff. He doesn’t actually say anything surprising, but watching this is fun. Bill is a nerd. Interesting body language. Too expensive clothes.
You don’t get to see stuff like this on TV.
This is also evidence of the strange transformation that has been taking place at Microsoft for some years. At which other company can an employee walk in to talk to the boss, just to publish the recording on the Internet? Without any noticeable PR spin?
For some reason I feel like I witnessed a historic moment here.
It’s 2:32 in the morning. I just spent two rather unpleasant hours cleaning up after I noticed that my webspace got broken into.
So what happened? Apparently, about two days ago, the attackers exploited a vulnerability in PHP’s XML-RPC support to gain access. The culprit that left the door wide open was the WordPress installation of my German blog. I recently upgraded the software of the blog you’re currently reading to a version that happens to be immune to that attack, but didn’t upgrade the other one. Damn.
I’m hosting several domains and subdomains on this webspace. The attackers replaced all index.php and index.html files in the domain roots with ten-byte files containing only the word “oldschool”. They also placed a file “lol.html” containing only the word “lol” in one of the subdomains. According to the timestamps on the server, there are no other changes.
This means the damage was easy to fix — I just had to restore a couple of files from backup. I also deleted xmlrpc.php from the old WordPress installation as a workaround for the vulnerability. I’ll upgrade it later.
I poked around the server logs to find out what was going on. Here’s one of the interesting parts of the log for cyganiak.de and its subdomains:
194.72.238.15 - - [03/Sep/2005:21:39:16 +0200] "HEAD / HTTP/1.1" 200 0 "http://www.netcraft.com/survey/" "Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
201.4.232.241 - - [03/Sep/2005:21:39:19 +0200] "POST /blog//xmlrpc.php HTTP/1.1" 200 32 "-" "-"
201.4.232.241 - - [03/Sep/2005:21:39:22 +0200] "POST /blog//xmlrpc.php HTTP/1.1" 200 32 "-" "-"
201.4.232.241 - - [03/Sep/2005:21:39:28 +0200] "POST /blog//xmlrpc.php HTTP/1.1" 200 32 "-" "-"
201.4.232.241 - - [03/Sep/2005:21:39:34 +0200] "POST /blog//xmlrpc.php HTTP/1.1" 200 32 "-" "-"
201.4.232.241 - - [03/Sep/2005:21:39:36 +0200] "POST /blog//xmlrpc.php HTTP/1.1" 200 32 "-" "-"
213.219.122.11 - - [03/Sep/2005:21:41:42 +0200] "GET / HTTP/1.0" 200 10 "-" "Wget/1.9.1"
213.219.122.11 - - [03/Sep/2005:21:41:42 +0200] "GET / HTTP/1.0" 401 1695 "-" "Wget/1.9.1"
213.219.122.11 - - [03/Sep/2005:21:41:51 +0200] "GET / HTTP/1.0" 200 10 "-" "Wget/1.9.1"
194.72.238.15 - - [03/Sep/2005:21:42:54 +0200] "HEAD / HTTP/1.1" 200 0 "http://www.netcraft.com/survey/" "Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
The first and the last hits are Netcraft, a familiar sight in logfiles all over the world. In between is the exploit: The POSTs to xmlrpc.php execute some evil code on my server. The GETs quite obviously verify the results: The attacker checks if the homepage is gone. Quite a number of similar sequences can be found throughout the logs.
This log trawling told me two things: First, that the attackers came through the old WordPress installation and not one of the other pieces of software I’m running (or through the webhoster’s system). Second, it gave me some of the IP addresses involved. Most of them resolve to Brazilian dialup IPs. Zombies.
Except for one, the address that sent the GET requests above. It resolves to zone-h.org, which turns out to belong to a cracker group whose speciality is mass defacement of websites. They apparently have tools that do this kind of stuff automatically, many times every day. They even have an RSS feed of their successful exploits. They appear to be based in Estonia.
So it’s just a bunch of script kiddies who take advantage of lazy webmasters in order to brag about it on IRC. I still want someone to kick them in the groin.
Dave Beckett is going to work for Yahoo.
This opportunity will allow me to apply my skills, experience and knowledge of RDF, semantic web and software development into Yahoo!’s systems and rich content. The media group covers Yahoo! News, Sports, Finance, Movies and Music but there are of course other Yahoo! groups with semi/structured content which I am likely to be working with and might be able to benefit from semantic web approaches.
Congratulations, Dave!
From an excellent piece in the Sunday Times bashing the Bush administration for their incompetent handling of the New Orleans crisis:
Ask yourself this: What if Al-Qaeda blew up the levees instead of the hurricane? Would the response have been any different?
That is a really, really good question.
Kowari 1.1 is coming soon. Congratulations to the Kowari team; they have been through some difficult times, and it’s good to see the project moving ahead.
I was slightly disappointed by this:
The removal of Jena support. Kowari v1.0 implemented a Jena storage backend and included an RDQL query language API. This has caused difficulties in maintenance and support as the two projects have gone in different directions. Kowari will remove (not deprecate) support for Jena as of the v1.1 release.
The Jena integration of Kowari was rather weird. They implemented the full API instead of just the internal Graph interface, which means they had to do a whole lot of extra work, and all they got from it was a bit of extra performance at a large cost in flexibility.
SPARQL is not yet supported as far as I can tell, but once it is, it should be a good interface for most of the interaction between Kowari and applications using Jena.
SPARQL could cover queries, but what about updating the store from the application?