Lucene Search from Blosxom

Overview:

Lucene's in Java, blosxom's in Perl.  Most of the work is getting the two to talk.  I'll describe the process bottom up, then show the code top down.

First, follow the Lucene instructions to index your blosxom *.txt files.

LuceneSearch.java is a simplified version of Lucene's SearchFiles demo. It accepts the index directory and a URL-encoded query as arguments and prints the names of the matching files to standard output. URL-encoding the query turns it into a single argument made only of safe characters. This avoids the mess of re-joining split arguments, unmatched quotes, and the security issues of passing user input through a shell.
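To see why URL encoding sidesteps the quoting problem, here is a small Java sketch. The class name QueryEncoding is hypothetical, and it uses java.net.URLEncoder as a stand-in for the Perl uri_escape call that Lucene.pm actually makes. uri_escape writes a space as %20 where URLEncoder writes +, but URLDecoder accepts both forms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical illustration: round-trip a user query through URL encoding.
public class QueryEncoding {
    // Encode a raw query so it travels as a single, shell-safe argument:
    // only letters, digits, '.', '-', '*', '_', '+' and %XX escapes remain.
    static String encode(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Reverse the encoding on the Java side; "+" and "%20" both become a space.
    static String decode(String arg) {
        try {
            return URLDecoder.decode(arg, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Quotes, a semicolon and a slash would all need care on a shell command line
        String query = "\"leaky abstractions\" AND soap; rm -rf /";
        String arg = encode(query);
        System.out.println(arg);
        System.out.println(decode(arg).equals(query));
    }
}
```

Running it prints `%22leaky+abstractions%22+AND+soap%3B+rm+-rf+%2F` followed by `true`: the encoded form contains nothing a shell would interpret, and decoding recovers the query exactly.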

Lucene.pm is a Perl module that calls the Java program, captures its output into an array (after stripping off paths and line feeds), and otherwise mimics the object interface of DirHandle. Stripping the paths is probably overkill at the moment and may ultimately need to be relaxed.
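The output handling in Lucene.pm amounts to two steps per line: drop the line ending (chomp), then delete everything up to the last slash (s!.*/!!). A minimal Java sketch of the same transformation follows; the class name MatchList and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Lucene.pm's output handling.
public class MatchList {
    // Turn LuceneSearch's stdout (one path per line) into bare file names.
    static List<String> fileNames(String output) {
        List<String> matches = new ArrayList<>();
        for (String line : output.split("\r?\n")) {   // drop the line endings
            if (line.isEmpty()) {
                continue; // "".split(...) yields one empty string; skip it
            }
            // Keep only what follows the last '/'; a line with no '/' is kept whole
            matches.add(line.substring(line.lastIndexOf('/') + 1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String output = "/var/blog/soap/intro.txt\n/var/blog/rss.txt\n";
        System.out.println(fileNames(output)); // [intro.txt, rss.txt]
    }
}
```

Because the greedy match removes everything through the last slash, entries in different categories that share a file name become indistinguishable, which is one reason the stripping may need to be relaxed.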

blosxom.cgi is modified to use Lucene and, if the "q" parameter was passed on the URL, to substitute an instance of this object for its usual DirHandle. Everything else stays the same, so blosxom does its normal job of selecting and sorting entries by date and formatting them as required.
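The trick is plain substitution: anything with a read method that returns entry names can stand in for DirHandle. As an analogy only, here is that idea in Java. The EntrySource interface, the two implementations, and the ".txt" filter are all hypothetical stand-ins, not blosxom code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogy for swapping DirHandle with a Lucene result object.
public class EntrySources {
    // Stand-in for the DirHandle interface blosxom relies on.
    interface EntrySource {
        List<String> read();
    }

    // Normal case: list the entries in a directory, as DirHandle does.
    static class DirectorySource implements EntrySource {
        private final File dir;
        DirectorySource(File dir) { this.dir = dir; }
        public List<String> read() {
            String[] names = dir.list(); // null if dir is missing or unreadable
            return names == null ? new ArrayList<String>() : Arrays.asList(names);
        }
    }

    // Search case: hand back the file names the search matched.
    static class SearchSource implements EntrySource {
        private final List<String> matches;
        SearchSource(List<String> matches) { this.matches = matches; }
        public List<String> read() { return matches; }
    }

    // The caller never needs to know which source it was given.
    static List<String> txtEntries(EntrySource source) {
        List<String> out = new ArrayList<>();
        for (String name : source.read()) {
            if (name.endsWith(".txt")) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend any command-line arguments are search hits, like passing "q"
        EntrySource source = args.length > 0
            ? new SearchSource(Arrays.asList(args))
            : new DirectorySource(new File("."));
        System.out.println(txtEntries(source));
    }
}
```

The selection logic in txtEntries is written once and works unchanged for either source, which is the property the blosxom change depends on.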

All that is left is to add a search form to the appropriate place in the HTML.

Add to foot.html:

<form id="searchform" method="get" action="Search">
<p id="searchlabel"><label for="q" accesskey="4"><span class="heading">Search this site:</span></label></p>
<p id="searchinput"><input type="text" id="q" name="q" size="18" maxlength="255" value="" /></p>
<p id="searchsubmit">
<input type="submit" value="Search" />
<a href="http://jakarta.apache.org/lucene"><img src="/images/lucene_green_100.gif" alt="Lucene" border="0" /></a>
</p>
</form>

Add to blosxom.cgi (the first line alongside the other use statements, the second just before the foreach):

use Lucene;

$dh = Lucene->new(param('q')) if param('q');

Create a Lucene.pm in the same directory as blosxom.cgi:

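package Lucene;

use strict;
use URI::Escape;
use Env qw(@PATH @CLASSPATH);

# --- Configurable variables -----

# Where's Java?
my $JAVA_HOME = '/home/rubys/jdk1.3.1_04';

# Where's lucene?
my $lucene = '/home/rubys/lucene/lib/*.jar';

# What's my index?
my $index = '/home/rubys/lucene/index';

# --------------------------------

# Env ties @PATH and @CLASSPATH to $ENV{PATH} and $ENV{CLASSPATH},
# so these changes are visible to the java child process.
unshift @PATH, "$JAVA_HOME/bin";
push @CLASSPATH, "$JAVA_HOME/lib/tools.jar";
push @CLASSPATH, "$JAVA_HOME/jre/lib/rt.jar";
push @CLASSPATH, glob($lucene);

sub new {
  my ($class, $query) = @_;

  # URL-encode the query so the shell only ever sees safe characters
  my $arg = uri_escape($query);

  my @matches;
  foreach (`$JAVA_HOME/bin/java -cp $ENV{CLASSPATH} LuceneSearch $index $arg`) {
    chomp;
    s!.*/!!;              # keep only the file name, dropping any path
    push @matches, $_;
  }

  # Bless a reference to the list. Assigning @matches to a scalar
  # would yield the match count, which cannot be blessed.
  my $self = \@matches;
  return bless $self, $class;
}

# Mimic DirHandle::read: return every matching entry
sub read {
  my $self = shift;
  return @$self;
}

1;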

Compile LuceneSearch.java and place it in a jar on the classpath that Lucene.pm builds (for example, in the directory matched by $lucene):

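import java.net.URLDecoder;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

// Usage: java LuceneSearch <index-dir> <url-encoded-query>
// Prints the "url" field of each matching document, one per line.
public class LuceneSearch {
  public static void main(String[] args) {
    try {
      Searcher searcher = new IndexSearcher(args[0]);
      Analyzer analyzer = new StandardAnalyzer();

      // Undo the URL encoding applied by Lucene.pm. JDK 1.4+ deprecates this
      // one-argument form in favor of decode(String, String), which JDK 1.3 lacks.
      String line = URLDecoder.decode(args[1]);

      // Parse the query against the "contents" field of the index
      Query query = QueryParser.parse(line, "contents", analyzer);

      Hits hits = searcher.search(query);

      for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("url"));
      }

      searcher.close();

    } catch (Exception e) {
      // Errors go to stderr, so Lucene.pm's backticks simply see no matches
      e.printStackTrace();
      System.exit(9);
    }
  }
}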

You're done!
