
Lucene-based semantic-repository 0.5.2: major performance improvement, now 24,000 items imported in 19 minutes

June 2nd, 2008

When we started using the semantic repository, we had only one Lucene index to make our content searchable.
Later we built another integration, with aawaj, a PHP-based service.

The aawaj service had more than 150,000 items to index. We tried to index all of that content with our then-current release, 0.5.1, but ran into severe performance problems. We then released version 0.5.2, which added queued request handling and exposed index optimization through a RESTful service URI: /rest/service/optimize/
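A minimal sketch of how an optimize request on /rest/service/optimize/ could reuse the same write queue, so the optimization runs only after every document already queued has been added. It assumes the JDK's built-in com.sun.net.httpserver server rather than whatever web stack the repository actually uses; the class name OptimizeEndpointSketch, the POST-only rule, the 202 response and the counter standing in for the real Lucene optimize call are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class OptimizeEndpointSketch {
    // Hypothetical stand-in: counts optimize runs instead of calling Lucene.
    static final AtomicInteger optimizeRuns = new AtomicInteger();

    // The same single-thread queue that serializes document additions.
    static final ExecutorService indexQueue = Executors.newSingleThreadExecutor();

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/rest/service/optimize/", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // -1: no response body
                exchange.close();
                return;
            }
            // Queue the optimization behind any pending writes instead of running it inline.
            indexQueue.execute(optimizeRuns::incrementAndGet);
            exchange.sendResponseHeaders(202, -1); // 202 Accepted: queued, not yet finished
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Returning 202 instead of waiting keeps the HTTP call short even when a long import is still draining from the queue.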

Here is a simple benchmark report:

version 0.5.1: first 100 items indexed in 13.611 seconds.
version 0.5.2: first 100 items indexed in 5.6152 seconds.
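Working through the two figures above (plain arithmetic on the reported times, nothing more):

$$\text{speedup} = \frac{13.611\ \text{s}}{5.6152\ \text{s}} \approx 2.42, \qquad \frac{13.611\ \text{s}}{100\ \text{items}} \approx 136\ \text{ms/item (0.5.1)}, \qquad \frac{5.6152\ \text{s}}{100\ \text{items}} \approx 56\ \text{ms/item (0.5.2)}$$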

The difference is significant. Later today we ran another import on our repository, and interestingly it took about 1 hour to index 150,000 items. That was a bit surprising, since we had been unable to do it at all with 0.5.1.
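Taking the reported hour at face value, the full import works out to:

$$\frac{150{,}000\ \text{items}}{3600\ \text{s}} \approx 41.7\ \text{items/s}, \qquad \frac{3600\ \text{s}}{150{,}000\ \text{items}} = 24\ \text{ms/item}$$

That is a lower per-item cost than the 0.5.2 first-100-items benchmark (5.6152 s / 100 ≈ 56 ms/item).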

What we actually did was add a single-thread executor that keeps every request in a queue and executes them one by one, which let us remove the synchronized method.

Here is some example code:

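```java
// Imports needed by this fragment:
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import org.apache.lucene.document.Document;

// Inside the indexing service class. One worker thread drains a FIFO queue,
// so index writes never overlap and addDocument() needs no synchronization.
private final Executor mIndexTaskExecutor =
        Executors.newSingleThreadExecutor();

public void addDocument(final Document pDocument) {
    // Queue the write and return immediately; the worker thread runs it later.
    mIndexTaskExecutor.execute(new Runnable() {
        public void run() {
            getLuceneIndexTemplate().addDocument(pDocument);
        }
    });
}
```
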
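To see the queueing idea in isolation, here is a small self-contained sketch with no Lucene dependency. It assumes a plain ArrayList as a hypothetical stand-in for a writer that is not thread-safe; QueuedIndexer and drain() are illustrative names, not part of the repository. Several producer threads call addDocument() concurrently, yet only the single worker thread ever touches the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class QueuedIndexer {
    // Deliberately NOT thread-safe: a stand-in for a single index writer.
    private final List<String> indexed = new ArrayList<>();

    // One worker thread drains an unbounded FIFO queue, so writes never overlap.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    // Returns immediately; callers never block on a lock.
    public void addDocument(String doc) {
        queue.execute(() -> indexed.add(doc));
    }

    // Stop accepting work, wait until every queued document is written, then return them.
    public List<String> drain() throws InterruptedException {
        queue.shutdown();
        if (!queue.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("index queue did not drain in time");
        }
        return indexed;
    }
}
```

Because execute() returns before the document is written, a caller that needs the index to be complete (for example before an optimize or a search) has to wait for the queue to drain, which is what drain() does here.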
The semantic repository service is intended to index content from different sources, maintain multiple indexes for different types of content, and perform different types of search. It is yet another Solr-like indexing service on top of Lucene, but it will gradually support content versioning and more semantic search results.
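One way to combine multiple per-type indexes with the queueing approach is sketched below, assuming one index and one single-thread writer queue per content type. MultiIndexRegistry, TypedIndex and the in-memory list standing in for a Lucene index are hypothetical names, not the repository's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MultiIndexRegistry {
    // Hypothetical per-type index: a plain list written only by its own queue thread.
    static final class TypedIndex {
        final List<String> docs = new ArrayList<>();
        final ExecutorService queue = Executors.newSingleThreadExecutor();
    }

    // One index (and one writer queue) per content type, created on first use.
    private final Map<String, TypedIndex> indexes = new ConcurrentHashMap<>();

    public void add(String contentType, String doc) {
        TypedIndex index = indexes.computeIfAbsent(contentType, t -> new TypedIndex());
        index.queue.execute(() -> index.docs.add(doc));
    }

    // Drain every queue, then report how many documents each index holds.
    public Map<String, Integer> shutdownAndCount() throws InterruptedException {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, TypedIndex> e : indexes.entrySet()) {
            ExecutorService q = e.getValue().queue;
            q.shutdown();
            if (!q.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queue for " + e.getKey() + " did not drain");
            }
            counts.put(e.getKey(), e.getValue().docs.size());
        }
        return counts;
    }
}
```

With one queue per type, a long import into one index does not hold up writes to another, while writes within a single index stay strictly serialized.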

