Re: Researching the rp-ml archives

From: Seppo J Niemi (zaphod@bart.lpt.fi)
Date: Fri Aug 30 1996 - 10:26:52 EEST


Ian Gibson writes:

> When I looked at the web page after the search, I noted that I had switched
> to the web page of rp-ml (ltk.hut.fi). What does this mean? Does it mean
> that hotbot contains a mirror of the rp-ml archives or that it searches the
> archives directly? If it is the latter, how does it do it so quickly?

The HotBot search engine traverses the WWW document
tree and maintains a database of the contents of the documents. When
queried, it builds a list of _links_ to the documents that contain
text which matches the search pattern. So, when you followed one of
the links given to your query for "rp-ml MJM", you arrived at the
actual archives.
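
To make the idea concrete, here is a minimal sketch of such a database: a
table that maps each word to the set of pages containing it, so that a
query returns links rather than copies of the documents. This is only an
illustration and not HotBot's actual implementation; the function names
build_index and search, and the example.org URLs, are made up for the
example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages for the first word, then intersect with the rest.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical pages standing in for documents found by the crawler.
pages = {
    "http://example.org/rp-ml/0001.html": "rp-ml posting by MJM about stereolithography",
    "http://example.org/rp-ml/0002.html": "rp-ml posting about laser sintering",
}
index = build_index(pages)
print(search(index, "rp-ml MJM"))
```

The lookup is quick because the expensive work, reading every page, is
done ahead of time when the index is built; a query only has to
intersect a few sets.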

So HotBot does not contain a mirror of the archives. And because there
are numerous other WWW servers in the world with millions of pages, it
is not possible for HotBot to maintain an accurate, up-to-date
database of every page in the world. Therefore the database does not
contain references to the most recent articles in the rp-ml archive;
the HotBot FAQ says that the database is refreshed roughly once a
week, but HotBot failed to find some documents that were as old as 20
days, so the true refresh interval is probably about one month at the
moment. But it is nevertheless a remarkable search engine, the best
there is.

> Also, who is responsible for the fact that the main body info is also
> searched? The guys at hotbot or the guys in Finland? I suspect I know the
> answer already but just so I know who to thank.

You should thank the guys (and girls, why not) at HotBot. They are
doing a fine job. It's http://www.hotbot.com/ if you haven't been
there before.

I have been thinking of implementing a very crude search method in the
rp-ml archives. It would be nothing more than a simple text string
match (using fgrep) in the html files, and thus not very sophisticated
and definitely not too fast. But like so many other things, it is
buried in my TO-DO list, somewhere between 250 and 300, so don't
expect it in the near future :-)
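
For illustration, a crude search of that kind could look roughly like
the sketch below. It is written in Python rather than as a call to
fgrep, and the function name fgrep_files and the directory layout are
assumptions made for the example, not the way the archive is actually
organised. Like fgrep, it does a literal, case-sensitive string match
and reads every file on every query, which is why it would not be fast.

```python
from pathlib import Path


def fgrep_files(root, needle, pattern="*.html"):
    """Return the sorted paths under root whose contents contain needle literally."""
    matches = []
    for path in sorted(Path(root).rglob(pattern)):
        # latin-1 can decode any byte sequence, so odd characters never abort the scan.
        text = path.read_text(encoding="latin-1")
        if needle in text:
            matches.append(path)
    return matches


# Example call, assuming the archive lives in a directory named "rp-ml":
# for hit in fgrep_files("rp-ml", "MJM"):
#     print(hit)
```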

//zaphod


