Archive for September, 2004

Popups!?!

Weird. I’m listening to Warren talk about colors. One of the fake sites he was showing was maternityfashions.com, so I went there to see what was there. It was just a domain for sale. The weird thing was that I got popups from that site when I closed it, while browsing with Firefox 1.0PR on Linux. I believe those were the first and only popups I’ve ever had while browsing with Firefox.

Firefox Setup

After updating to Firefox 1.0PR, I reworked my Firefox startup script. The script accepts a URL. If no Firefox process is running, it starts one and loads that URL; if one is already running, it opens the URL in a new tab of the existing window. Previously, I used it every time I launched Firefox. That is no longer necessary, so now it only runs when I click on links in other applications. It is probably only useful for those using Linux.
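Here is a rough sketch of the logic, written in Python rather than as the shell script I actually use. It assumes `firefox` is on the PATH and that the browser supports the X-remote `-remote` option with `ping()` and `openURL(url,new-tab)`, as Mozilla-based browsers of this era do. The function names are made up for illustration.

```python
import subprocess

FIREFOX = "firefox"  # assumption: the binary is on PATH under this name


def firefox_is_running(run=subprocess.call):
    """Return True if an existing Firefox instance answers the remote ping."""
    # Assumption: `-remote "ping()"` exits with status 0 only when a
    # running instance responds.
    return run([FIREFOX, "-remote", "ping()"]) == 0


def build_command(url, running):
    """Build the argument list used to open `url`."""
    if running:
        # Ask the running instance to open the URL in a new tab.
        # Note: a URL containing a comma would need escaping first.
        return [FIREFOX, "-remote", "openURL(%s,new-tab)" % url]
    # No instance yet: start one and load the URL directly.
    return [FIREFOX, url]


def open_url(url, run=subprocess.call):
    """Open `url` in a new tab if Firefox is running, else start Firefox."""
    return run(build_command(url, firefox_is_running(run)))
```

Calling `open_url("http://www.example.com/")` from a small wrapper, and registering that wrapper as the default browser in other applications, gives the behavior described above.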

I also had to change my userChrome.css file so that I could change the width of my search bar. Unfortunately, the directions I had used previously were out of date and didn’t quite work.

Nutch Short-term Goals

  1. Ability to use regular expressions for URL substitutions.
  2. Allow users to search using url:Store/View/Product/1001
  3. Faster crawling of websites that look like one (1) IP address.
  4. Some sort of templating engine for creating search results pages. Maybe use Velocity?

Nutch patch #1

At work I was told to investigate other options for a search engine that would search just the sites that we host. While I was doing that I came across Nutch. It looked pretty sweet, but it didn’t quite fit our current needs: we needed a few more features. Currently at work we’re looking at a Google Search Appliance. It costs a pretty penny, but it would be nice because, hopefully, we could just “set it and forget it.”

Lately in my spare time, I’ve started trying to add the features to Nutch that would allow us to use it. It’s fun. I recently submitted my first patch to the Nutch developers list. Hopefully I did everything well enough to get it committed to CVS. This patch allows users to specify Perl 5 regular expressions, which get applied to all URLs that Nutch encounters. It’s useful for stuff like stripping out session IDs in URLs.
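To give an idea of what that kind of URL substitution looks like, here is a small sketch in Python. This is not the patch itself (Nutch is Java, and the patch uses Perl 5 syntax, which is similar to Python's but not identical). The rules and the `normalize` name are hypothetical examples.

```python
import re

# Hypothetical rules: (pattern, replacement) pairs applied in order.
RULES = [
    # Drop a `;jsessionid=...` path parameter.
    (re.compile(r";jsessionid=[^?#]*", re.IGNORECASE), ""),
    # Drop a `PHPSESSID=...` or `sid=...` query parameter, keeping the
    # `?` or `&` that introduced it.
    (re.compile(r"([?&])(?:PHPSESSID|sid)=[^&#]*&?", re.IGNORECASE), r"\1"),
    # Tidy up a dangling `?` or `&` left at the very end of the URL.
    (re.compile(r"[?&]$"), ""),
]


def normalize(url):
    """Apply every substitution rule to `url`, in order."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


print(normalize("http://example.com/a;jsessionid=ABC123?x=1"))
print(normalize("http://example.com/view?sid=42&id=7"))
```

With rules like these, two URLs that differ only by session ID collapse to the same URL, so the crawler doesn't fetch the same page over and over.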

I’ve got a few more features that need to be added. I also found a drawback in the way the Nutch crawler was written. You can specify any number of threads to run at the same time, but currently it won’t allow two different threads to download from the same IP simultaneously. That is not good for us, because all of our websites look like just one IP to the crawler. I’ll probably have to make some changes there. Hopefully it’ll be relatively straightforward.
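One way the restriction could be relaxed is to allow a configurable number of simultaneous downloads per host instead of exactly one. The Python sketch below shows the idea with one semaphore per host. It is not how the Nutch fetcher is written, and `HostLimiter` and `max_per_host` are names I made up.

```python
import threading
from contextlib import contextmanager


class HostLimiter:
    """Allow at most `max_per_host` concurrent downloads per host or IP."""

    def __init__(self, max_per_host):
        self._max = max_per_host
        self._lock = threading.Lock()  # protects the dictionary below
        self._slots = {}               # host -> semaphore

    def _semaphore(self, host):
        # Create the semaphore for a host the first time it is seen.
        with self._lock:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._max)
            return self._slots[host]

    @contextmanager
    def slot(self, host):
        """Block until a download slot for `host` is free, then hold it."""
        sem = self._semaphore(host)
        sem.acquire()
        try:
            yield
        finally:
            sem.release()
```

Each fetcher thread would wrap its download in `with limiter.slot(ip):`. Setting `max_per_host` to 1 reproduces the current behavior, and a larger value lets several threads work on sites that share one IP.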

Cool use of Nutch: Creative Commons Search (via: Doug)

Danger!

Frances. Looks like Miami has a chance of being spared again.

Update: Article about evacuations. Max Mayfield, the Hurricane Center Director who is quoted, used to be (still is?) my neighbor in Miami. Here’s another forecast picture. This thing looks pretty big, so even if it doesn’t hit Miami directly, it still might do some damage. Yikes!
