Merging multiple Lucene indexes

31/03/2009

This is the code that I use to merge multiple Lucene indexes into one. There are several reasons to merge multiple indexes into one, such as:

    • Speed
    • Ease of management
    • Space – the size of the merged index is less than the sum of non-merged indexes
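
The space claim is easy to check on your own data. The sketch below is not part of the merger itself; it only sums the sizes of all regular files under a directory, so you can run it once on the folder holding the source indexes and once on the merged index and compare the two numbers. The class name IndexSize and the method sizeOf are made up for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexSize {

	/** Returns the total size, in bytes, of all regular files under dir (recursively). */
	static long sizeOf(Path dir) throws IOException {
		try (Stream<Path> paths = Files.walk(dir)) {
			return paths.filter(Files::isRegularFile)
					.mapToLong(p -> {
						try {
							return Files.size(p);
						} catch (IOException e) {
							// Lambdas cannot throw checked exceptions, so wrap it
							throw new UncheckedIOException(e);
						}
					})
					.sum();
		}
	}

	public static void main(String[] args) throws IOException {
		if (args.length != 2) {
			System.out.println("Usage: java IndexSize existing_indexes_dir merged_index_dir");
			return;
		}
		long before = sizeOf(Paths.get(args[0]));
		long after = sizeOf(Paths.get(args[1]));
		System.out.println("Source indexes: " + before + " bytes");
		System.out.println("Merged index:   " + after + " bytes");
	}
}
```

How much space you actually save depends on your data, so treat the numbers as a measurement rather than a guarantee.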

Here is the code of an Index Merger:

package ucy.cs.hpcl.minerSoft.indexmanipulation;

/*This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Author: Asterios Katsifodimos (http://www.asteriosk.gr)
*/
import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexMerger {

	/** Merge all Lucene indexes found under a directory into a single index. */
	public static void main(String[] args) {

		if(args.length != 2){
			System.out.println("Usage: java -jar IndexMerger.jar " +
					           "existing_indexes_dir merged_index_dir");
			System.out.println(" existing_indexes_dir: A directory where the " +
					             "indexes that have to be merged exist");
			System.out.println("   e.g. indexes/");
			System.out.println("   e.g.         index1");
			System.out.println("   e.g.         index2");
			System.out.println("   e.g.         index3");
			System.out.println(" merged_index_dir: A directory where the merged " +
					                               "index will be stored");
			System.out.println("   e.g. merged_indexes");
			System.exit(1);
		}

		File INDEXES_DIR  = new File(args[0]);
		File INDEX_DIR    = new File(args[1]);

		INDEX_DIR.mkdir();

		Date start = new Date();

		try {
			IndexWriter writer = new IndexWriter(INDEX_DIR,
												new StandardAnalyzer(),
												true);
			writer.setMergeFactor(1000);
			writer.setRAMBufferSizeMB(50);

			String[] names = INDEXES_DIR.list(); // list the sub-directories only once
			Directory indexes[] = new Directory[names.length];
			for (int i = 0; i < names.length; i++) {
				System.out.println("Adding: " + names[i]);
				indexes[i] = FSDirectory.getDirectory(INDEXES_DIR.getAbsolutePath()
													+ "/" + names[i]);
			}

			System.out.print("Merging added indexes...");
			writer.addIndexes(indexes);
			System.out.println("done");

			System.out.print("Optimizing index...");
			writer.optimize();
			writer.close();
			System.out.println("done");

			Date end = new Date();
			System.out.println("It took: "+((end.getTime() - start.getTime()) / 1000)
											+ "\"");

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

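You can play with the two values below to gain some more performance; my settings are pretty generic. The merge factor controls how many segments are merged at a time, and the RAM buffer size sets how much memory Lucene may use to buffer documents before flushing them to disk.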

writer.setMergeFactor(1000);
writer.setRAMBufferSizeMB(50);
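
One thing to keep in mind: the merger hands every entry returned by INDEXES_DIR.list() to Lucene, so a stray file in that folder, or a folder that cannot be read, will make it fail. A possible guard is sketched below, using only the standard library. The class name IndexDirLister and the method listIndexDirs are hypothetical, and the sketch only checks that each entry is a directory, not that it really contains a Lucene index.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexDirLister {

	/**
	 * Returns the sub-directories of parent, sorted by path.
	 * Returns an empty list if parent does not exist, is not a
	 * directory, or cannot be read.
	 */
	static List<File> listIndexDirs(File parent) {
		// listFiles returns null (not an empty array) on failure
		File[] children = parent.listFiles(File::isDirectory);
		if (children == null) {
			return new ArrayList<>();
		}
		Arrays.sort(children);
		return new ArrayList<>(Arrays.asList(children));
	}

	public static void main(String[] args) {
		if (args.length != 1) {
			System.out.println("Usage: java IndexDirLister existing_indexes_dir");
			return;
		}
		for (File dir : listIndexDirs(new File(args[0]))) {
			System.out.println("Would add: " + dir.getName());
		}
	}
}
```

The returned File objects could then be opened one by one instead of concatenating path strings by hand.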

Downloads

Here you can download an Index Merger that takes two arguments: the folder containing the indexes to be merged and an output directory where the merged index will be stored.

    • IndexMerger.zip: contains a standalone jar file (with Lucene bundled) and the source code of the IndexMerger.

Leave a comment if you find something wrong with this code!

There are 9 comments on this article:

  1. 16/05/2009 miltos rulz says:

    wow…
    what an interesting job, you rascal…
    since I can't find you on Skype or on fb
    I said:
    the m******s is probably watching his site, so let me write him some nonsense to pass the time….
    and some questions: if you do the Lucene merge with an aggourofa (or gior the lefa), what will happen?
    I wonder, is the index writer to the index merger what giors o pantalis is to paulo?
    and one more question: if speed weren't the issue, then instead of busying yourself with Lucene and Cypriot mpompolia, could you maybe, you jerk, save up some money, come to Greece earlier in the summer, and give it to me so I can buy a motorbike?
    huh?

    p.s. I hope you read this at work while you are under pressure, but still crack up laughing on your own and make a fool of yourself in the lab (which probably won't even notice, because it is used to you giggling by yourself… moutaf)

  2. 31/12/2009 tristartom says:

    hi, which version of lucene did you build w/? it seems not compatible w/ the latest version 3.0.0.

  3. 31/12/2009 Asterios Katsifodimos says:

    Hi, I think I used Lucene 2.9.1, and I suppose that it's not compatible with Lucene 3.0. Try to see which part of the code is causing the problem and if you find it, please let me know!

  4. 23/02/2010 Javier Teso says:

    Yes, a few things have changed in lucene 3.0. You will have to replace the following lines:

    IndexWriter writer = new IndexWriter(INDEX_DIR,
                                         new StandardAnalyzer(),
                                         true,
                                         IndexWriter.MaxFieldLength.UNLIMITED);

    indexes[i] = FSDirectory.open(new File(INDEXES_DIR.getAbsolutePath() + "/" + INDEXES_DIR.list()[i]));

    writer.addIndexesNoOptimize(indexes);

    I think that is all, it works perfect, thanks for sharing your code!!

  5. 23/02/2010 Javier Teso says:

    Sorry, the line numbers were missing…

    LINE 55
    IndexWriter writer = new IndexWriter(INDEX_DIR,
                                         new StandardAnalyzer(),
                                         true,
                                         IndexWriter.MaxFieldLength.UNLIMITED);

    LINE 65
    indexes[i] = FSDirectory.open(new File(INDEXES_DIR.getAbsolutePath() + "/" + INDEXES_DIR.list()[i]));

    LINE 70
    writer.addIndexesNoOptimize(indexes);

  6. 23/02/2010 Asterios Katsifodimos says:

    Cool, thanks a lot Javier! I will try to run it myself and post the updated code!

  7. 25/05/2010 tim says:

    Hi there, many thanks. This helped very well! IndexModifier is deprecated and lacked performance, so this is a very good alternative to work with…

  8. 1/06/2010 JOAN RICE says:

    I’ve been looking around blog.asteriosk.gr and actually am impressed by the exceptional content material here. I work the nightshift at my job and it is boring. I’ve been coming right here for the past couple nights and reading. I simply needed to let you know that I have been enjoying what I’ve seen and I look ahead to reading more.

  9. 19/07/2010 kaka says:

    Hi,
    Will this work for solr indexes?? Since they are compatible it has to work, but in my case it is giving corrupt index exception..
