On 2005-06-30 Googlebot visited node 1, the leftmost node. It had not crawled the path from the root to this node, so how did it find the page? Did it guess the URL, or did it follow some external link? A few hours later, Googlebot crawled node 2, which node 1 links to as its parent. These two nodes appear as a tiny dot in the animation on 2005-06-30, floating above the left branch. Then, on 2005-07-06 (two days after the attempt to find the rightmost node), between 06:39:39 and 06:39:59, Googlebot found the path to these disconnected nodes by visiting the 24 missing nodes in 20 seconds. It started at the root and worked its way up to node 2 without ever selecting a right branch. In the large version of the Googlebot tree, this path is clearly visible. The nodes halfway along the path were not requested a second time and are represented by thin, short line segments, hence the steep curve.
An experiment to study the major search bots, these mythical creatures of the Internet, by constructing a virtual labyrinth for them to explore, and recording their traces … The Internet is truly a strange place, with its bots and spiders and crawlers and zombies, its darknets and tunnels and backbones and honeypots. It is long past the point where any single person can grasp what’s going on out there.
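A labyrinth like this is just a site whose pages are generated on the fly, so the tree can be arbitrarily deep without storing any files. A minimal sketch, not the experiment's actual code: assume each URL path encodes a position in a binary tree (e.g. `/0/1/` is the right child of the left child of the root), and every page links to its parent and its two children. The `render_node` function and the port number are illustrative choices.

```python
import http.server


def render_node(path: str) -> str:
    """Render one labyrinth page.

    The URL path encodes a node in an infinite binary tree: "/" is the
    root, "/0/" its left child, "/0/1/" that node's right child, etc.
    Each page links to its two children, and (except at the root) back
    to its parent, so a crawler can wander the tree forever.
    """
    path = path if path.endswith("/") else path + "/"
    links = [
        f'<a href="{path}0/">left child</a>',
        f'<a href="{path}1/">right child</a>',
    ]
    if path != "/":
        # Drop the last path segment to get the parent's URL.
        parent = path.rstrip("/").rsplit("/", 1)[0] + "/"
        links.insert(0, f'<a href="{parent}">parent</a>')
    return "<html><body>" + " | ".join(links) + "</body></html>"


class LabyrinthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_node(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve the labyrinth (hypothetical port):
# http.server.HTTPServer(("", 8000), LabyrinthHandler).serve_forever()
```

Recording each bot's trace is then just a matter of logging the User-Agent, path, and timestamp of every request, which is how a visit pattern like the 20-second, 24-node climb above becomes visible.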
William Gibson’s cyberspace can’t be far off.
(via Tim Bray)