Merging multiple Lucene indexes

31/03/2009

This is the code that I use to merge multiple Lucene indexes into one. There are several reasons to merge multiple indexes into one, for example:

    • Speed
    • Ease of management
    • Space: the merged index is smaller than the sum of the individual indexes

Here is the code of the IndexMerger:

package ucy.cs.hpcl.minerSoft.indexmanipulation;

/*This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Author: Asterios Katsifodimos (http://www.asteriosk.gr)
*/
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexMerger {

	/** Merge all Lucene indexes found under a directory into a single index. */
	public static void main(String[] args) {

		if (args.length != 2) {
			System.out.println("Usage: java -jar IndexMerger.jar " +
					"existing_indexes_dir merged_index_dir");
			System.out.println(" existing_indexes_dir: a directory containing " +
					"the indexes to be merged, one per subdirectory");
			System.out.println("   e.g. indexes/");
			System.out.println("          index1/");
			System.out.println("          index2/");
			System.out.println("          index3/");
			System.out.println(" merged_index_dir: the directory where the merged " +
					"index will be stored");
			System.out.println("   e.g. merged_index/");
			System.exit(1);
		}

		File INDEXES_DIR = new File(args[0]);
		File INDEX_DIR   = new File(args[1]);

		// List the subdirectories once (plain files are skipped);
		// listFiles() returns null if INDEXES_DIR is not a readable directory.
		File[] indexDirs = INDEXES_DIR.listFiles(new FileFilter() {
			public boolean accept(File f) {
				return f.isDirectory();
			}
		});
		if (indexDirs == null) {
			System.err.println("Not a directory: " + INDEXES_DIR);
			System.exit(1);
		}

		INDEX_DIR.mkdirs();

		Date start = new Date();

		IndexWriter writer = null;
		Directory[] indexes = new Directory[indexDirs.length];
		try {
			// Lucene 2.x API; several of these calls no longer exist in Lucene 3.0.
			// create = true overwrites any index already stored in INDEX_DIR.
			writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
			writer.setMergeFactor(1000);
			writer.setRAMBufferSizeMB(50);

			for (int i = 0; i < indexDirs.length; i++) {
				System.out.println("Adding: " + indexDirs[i].getName());
				indexes[i] = FSDirectory.getDirectory(indexDirs[i]);
			}

			System.out.print("Merging added indexes...");
			writer.addIndexes(indexes);
			System.out.println("done");

			System.out.print("Optimizing index...");
			writer.optimize();
			System.out.println("done");

			Date end = new Date();
			System.out.println("It took: " + ((end.getTime() - start.getTime()) / 1000)
					+ " seconds");

		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Always release the index write lock and the source directories.
			try {
				if (writer != null) {
					writer.close();
				}
			} catch (IOException e) {
				e.printStackTrace();
			}
			for (Directory dir : indexes) {
				if (dir != null) {
					try {
						dir.close();
					} catch (IOException e) {
						e.printStackTrace();
					}
				}
			}
		}
	}
}
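The merger hands what it finds in existing_indexes_dir straight to addIndexes(), so a folder in there that is not a Lucene index can make the whole merge fail. A possible pre-check, sketched below in plain Java, keeps only the subdirectories that contain a segments_N file (the commit file that Lucene 2.1 and later write for every commit). The class and method names are hypothetical, and the test is only a heuristic: it does not verify that an index is readable or free of corruption.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks the subdirectories that look like Lucene indexes. */
public class IndexDirFilter {

	/**
	 * Heuristic only: Lucene 2.1+ stores each commit point in a file named
	 * "segments_N", so a directory without one is not a usable index.
	 */
	public static boolean isLuceneIndex(File dir) {
		String[] names = dir.list();
		if (names == null) {
			return false; // not a directory, or not readable
		}
		for (String name : names) {
			if (name.startsWith("segments_")) {
				return true;
			}
		}
		return false;
	}

	/** Returns the subdirectories of parent that pass isLuceneIndex(). */
	public static List<File> findIndexDirs(File parent) {
		List<File> result = new ArrayList<File>();
		File[] children = parent.listFiles(new FileFilter() {
			public boolean accept(File f) {
				return f.isDirectory();
			}
		});
		if (children == null) {
			return result; // parent is missing or not a directory
		}
		for (File child : children) {
			if (isLuceneIndex(child)) {
				result.add(child);
			} else {
				System.out.println("Skipping (no segments_N file): " + child.getName());
			}
		}
		return result;
	}
}
```

The resulting list could then replace the raw directory listing when building the Directory[] passed to addIndexes().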

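You can tune these two values to gain some more performance; my settings are pretty generic. The merge factor controls how many segments Lucene merges at a time, and the RAM buffer size sets how much memory the writer uses to buffer changes before flushing them to disk: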

writer.setMergeFactor(1000);
writer.setRAMBufferSizeMB(50);
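If you want to experiment with these values without recompiling, one option is to read them from Java system properties. This is only a sketch: the property names merger.mergeFactor and merger.ramBufferMB and the MergeSettings class are hypothetical, and the defaults are the settings above. It rejects a merge factor below 2, which Lucene's setMergeFactor() refuses as well, and a non-positive buffer size.

```java
/** Hypothetical holder for the two tuning knobs, read from system properties. */
public class MergeSettings {

	// Defaults match the values used in IndexMerger.
	public static final int DEFAULT_MERGE_FACTOR = 1000;
	public static final double DEFAULT_RAM_BUFFER_MB = 50.0;

	public final int mergeFactor;
	public final double ramBufferMB;

	public MergeSettings(int mergeFactor, double ramBufferMB) {
		if (mergeFactor < 2) {
			throw new IllegalArgumentException("mergeFactor must be at least 2");
		}
		if (ramBufferMB <= 0) {
			throw new IllegalArgumentException("ramBufferMB must be positive");
		}
		this.mergeFactor = mergeFactor;
		this.ramBufferMB = ramBufferMB;
	}

	/** Reads -Dmerger.mergeFactor and -Dmerger.ramBufferMB, falling back to the defaults. */
	public static MergeSettings fromSystemProperties() {
		// Integer.getInteger also falls back to the default on a malformed value.
		int mf = Integer.getInteger("merger.mergeFactor", DEFAULT_MERGE_FACTOR);
		String ram = System.getProperty("merger.ramBufferMB");
		double mb = (ram == null) ? DEFAULT_RAM_BUFFER_MB : Double.parseDouble(ram);
		return new MergeSettings(mf, mb);
	}
}
```

Once the merger passes s.mergeFactor and s.ramBufferMB to writer.setMergeFactor() and writer.setRAMBufferSizeMB(), a run could look like java -Dmerger.mergeFactor=50 -Dmerger.ramBufferMB=256 -jar IndexMerger.jar indexes/ merged_index/ (the downloadable jar does not read these properties).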

Downloads

Here you can download the IndexMerger. It takes two arguments: the folder containing the indexes to be merged, and an output directory where the merged index will be stored.

    • IndexMerger.zip: contains a standalone jar file (with Lucene bundled) and the source code of the IndexMerger.
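Because every entry in the first directory is treated as an index, an output directory placed inside it can end up being listed as one of the inputs. A small argument check like the one below, which compares canonical paths, can reject that before anything is written. MergerArgs, isInside() and validate() are hypothetical names; this check is a sketch and is not part of the downloadable IndexMerger.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical argument check for an IndexMerger-style tool. */
public class MergerArgs {

	/** True if child is parent itself or lies below it, after resolving ".." and symlinks. */
	public static boolean isInside(File child, File parent) throws IOException {
		File p = parent.getCanonicalFile();
		for (File f = child.getCanonicalFile(); f != null; f = f.getParentFile()) {
			if (f.equals(p)) {
				return true;
			}
		}
		return false;
	}

	/** Throws if the input is not a directory or the output would be picked up as an input. */
	public static void validate(File indexesDir, File mergedDir) throws IOException {
		if (!indexesDir.isDirectory()) {
			throw new IllegalArgumentException("Not a directory: " + indexesDir);
		}
		if (isInside(mergedDir, indexesDir)) {
			throw new IllegalArgumentException(
					"merged_index_dir must not be inside existing_indexes_dir");
		}
	}
}
```

Calling MergerArgs.validate(INDEXES_DIR, INDEX_DIR) right after parsing the arguments would stop the run with a clear message instead of a failed merge.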

Leave a comment if you find something wrong with this code!

There are 23 comments in this article:

  1. 16/05/2009 miltos rulz says:

    wow…
    what an interesting job, gioufte…
    since I can't find you on Skype or on FB,
    I said to myself:
    m******s is probably keeping an eye on the site, so let me write him some nonsense to pass the time….
    and some questions: if you do a Lucene merge with an aggourofas (or with gior to lefa), what happens?
    I wonder, is the index writer to the index merger what giors o pantalis is to Pavlos?
    and one more question: if speed weren't the issue, then instead of messing around with Lucene and Cypriot bits and bobs, couldn't you, you idiot, save up some money, come to Greece earlier this summer and give it to me so I can buy a motorbike?
    aaaaaa?

    p.s. I hope you read this at work while you're swamped, and that you crack up laughing on your own and make a fool of yourself in the lab (where they probably won't even notice, since they're used to you grinning on your own…moutaf)

  2. 31/12/2009 tristartom says:

    Hi, which version of Lucene did you build this with? It doesn't seem to be compatible with the latest version, 3.0.0.

  3. 31/12/2009 Asterios Katsifodimos says:

    Hi, I think I used Lucene 2.9.1, and I suppose that it's not compatible with Lucene 3.0. Try to find which part of the code is causing the problem and, if you find it, please let me know!

  4. 23/02/2010 Javier Teso says:

    Yes, a few things have changed in Lucene 3.0. You will have to replace the following lines:

    IndexWriter writer = new IndexWriter( INDEX_DIR,
    new StandardAnalyzer(),
    true,
    IndexWriter.MaxFieldLength.UNLIMITED);

    indexes[i] = FSDirectory.open(new File(INDEXES_DIR.getAbsolutePath() + "/" + INDEXES_DIR.list()[i]));

    writer.addIndexesNoOptimize(indexes);

    I think that is all; it works perfectly. Thanks for sharing your code!

  5. 23/02/2010 Javier Teso says:

    Sorry, the line numbers were missing:

    LINE 55
    IndexWriter writer = new IndexWriter( INDEX_DIR,
    new StandardAnalyzer(),
    true,
    IndexWriter.MaxFieldLength.UNLIMITED);

    LINE 65
    indexes[i] = FSDirectory.open(new File(INDEXES_DIR.getAbsolutePath() + "/" + INDEXES_DIR.list()[i]));

    LINE 70
    writer.addIndexesNoOptimize(indexes);

  6. 23/02/2010 Asterios Katsifodimos says:

    Cool, thanks a lot Javier! I will try to run it myself and post the updated code!

  7. 25/05/2010 tim says:

    Hi there, many thanks. This helped a lot! IndexModifier is deprecated and performed poorly, so this is a very good alternative to work with…

  9. 19/07/2010 kaka says:

    Hi,
    Will this work for Solr indexes? Since they are compatible it should work, but in my case it throws a corrupt index exception.

  10. 11/05/2011 Govind says:

    Hi Asterios,
    I am working on distributed indexing. It really worked and helped me a lot.
    I am very thankful.

  11. 25/10/2011 JTheRocker says:

    Hi,

    This is showing as deprecated. Please update it if possible.

    IndexWriter writer = new IndexWriter( INDEX_DIR,
    new StandardAnalyzer(),
    true,
    IndexWriter.MaxFieldLength.UNLIMITED);

  12. 26/12/2011 Missy says:

    Could you kindly tell me whether it is possible to merge indexes of documents in formats other than .txt, e.g. Word, PDF, etc.? It would be a big favor.

  14. 7/09/2012 Ganesh yogeesh says:

    I used your code in NetBeans 7.1 and got an error. Can you please help me solve it? The error is in this line:

    new StandardAnalyzer(),true);

  15. 7/09/2012 Ganesh yogeesh says:

    I used your code in NetBeans 7.1 and got an error I can't solve. Can you please help me fix it? The error and the problematic lines are below:

    method getDirectory in class org.apache.lucene.store.FSDirectory cannot be applied to given types;
    required: no arguments
    found: java.lang.String
    reason: actual and formal argument lists differ in length

    new StandardAnalyzer(),true); line 59

    indexes[i] = FSDirectory.getDirectory(INDEXES_DIR.getAbsolutePath()+ "/" + INDEXES_DIR.list()[i]); line 67
