It’s just data
I have a feeling that's actually what BlogStudio (www.blogstudio.com) does, which does present problems when it comes to accessing archives of the discussions they are hosting.
Posted by Simon Phipps at
And as you can see from the URL Sam posted, it's actually working! The search engine is indeed doing exactly what it should. I'm adding each blog document (blosxom-style text files) under the category path that blosxom uses, and also under the date-format "path" (/2002/12/15), so it instantly finds documents based on these paths with no filesystem access whatsoever (except Lucene's API hitting its index files). Super fast.
Querying can be by full document, title, category, date, or permalink. And the query works from the "path" (category or date) downward, which makes it quite powerful. Sam already has this going on his site to some extent; my extension is adding the title field and storing the blogs within Lucene. This is all automated with Ant using my <index> task.
Note: the app is currently under development, but it's working nicely so far.
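To make the "path downward" idea concrete, here is a minimal sketch in plain Java. It is not Lucene's API and not the actual app; PathIndex, add and findUnder are hypothetical names, and a sorted map stands in for the index. It only illustrates how entries filed under a category or date path can be found from any parent path without touching the filesystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an index keyed by "path" terms (not Lucene's API).
final class PathIndex {
    // Sorted map: normalized path -> titles of entries filed under exactly that path.
    private final TreeMap<String, List<String>> byPath = new TreeMap<>();

    // File an entry under a category path or a date path such as "/2002/12/15".
    void add(String path, String title) {
        byPath.computeIfAbsent(normalize(path), k -> new ArrayList<>()).add(title);
    }

    // Return every entry at the given path or anywhere below it.
    List<String> findUnder(String path) {
        String prefix = normalize(path);
        List<String> hits = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, so start at the
        // first key >= prefix and stop at the first key that no longer matches.
        for (Map.Entry<String, List<String>> e : byPath.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    // A trailing slash keeps "/2002/1" from matching "/2002/12".
    private static String normalize(String path) {
        return path.endsWith("/") ? path : path + "/";
    }
}
```

With entries filed under /2002/12/15 and /2002/12/16, a lookup for /2002/12 returns both, and a lookup for /2002 returns everything from that year.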
Can it do date range searches? The problem with BlogStudio is that they don't build archives but only support searches, so once items leave your top page they can only be retrieved by an explicit search. If they had a 'display all entries by range' option there would be no problem.
Looks good, BTW.
Yes, it is indexing the files by their last modified date, and Lucene supports a RangeQuery. The tricky part is exposing that to the UI. Lucene's QueryParser does a great job, but it's got its limitations. I'm still working out how to phrase range queries with it, though doing it through the API is trivial.
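For anyone wondering what a date range lookup amounts to, here is a rough sketch in plain Java. It deliberately avoids Lucene's classes, so DateRangeIndex, add and between are hypothetical names. The assumption is that each entry's last modified date is stored as a sortable yyyyMMdd string, so an inclusive range over dates becomes an inclusive range over sorted keys.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a date-keyed index (not Lucene's API).
final class DateRangeIndex {
    // yyyyMMdd keys sort as strings in the same order as the dates (four-digit years assumed).
    private static final DateTimeFormatter KEY = DateTimeFormatter.BASIC_ISO_DATE;

    // Sorted map: date key -> titles of entries last modified on that date.
    private final TreeMap<String, List<String>> byDate = new TreeMap<>();

    void add(LocalDate lastModified, String title) {
        byDate.computeIfAbsent(lastModified.format(KEY), k -> new ArrayList<>()).add(title);
    }

    // Return all entries last modified between the two dates, both inclusive.
    List<String> between(LocalDate from, LocalDate to) {
        List<String> hits = new ArrayList<>();
        if (from.isAfter(to)) {
            return hits; // an inverted range matches nothing
        }
        for (List<String> titles
                : byDate.subMap(from.format(KEY), true, to.format(KEY), true).values()) {
            hits.addAll(titles);
        }
        return hits;
    }
}
```

A 'display all entries by range' page would then be a single between(from, to) call, with the two dates taken from the UI.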
Posted by Erik Hatcher at