Archive of articles classified as "Programming"


Merging multiple Lucene indexes

31/03/2009

This is the code that I use to merge multiple Lucene indexes into one. There are many reasons to merge multiple indexes into one, for example:

    • Speed
    • Ease of management
    • Space – the merged index is usually smaller than the combined size of the separate indexes

Here is the code of the index merger (it uses the Lucene 2.x API):

package ucy.cs.hpcl.minerSoft.indexmanipulation;

/*This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Author: Asterios Katsifodimos (http://www.asteriosk.gr)
*/
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexMerger {

	/** Merges all the indexes found under a directory into a single index. */
	public static void main(String[] args) {

		if (args.length != 2) {
			System.out.println("Usage: java -jar IndexMerger.jar " +
					"existing_indexes_dir merged_index_dir");
			System.out.println(" existing_indexes_dir: A directory that contains " +
					"the indexes to be merged, one per subdirectory");
			System.out.println("   e.g. indexes/");
			System.out.println("            index1/");
			System.out.println("            index2/");
			System.out.println("            index3/");
			System.out.println(" merged_index_dir: A directory where the merged " +
					"index will be stored");
			System.out.println("   e.g. merged_index/");
			System.exit(1);
		}

		File indexesDir = new File(args[0]);
		File mergedDir  = new File(args[1]);

		// List the directory once; listFiles() returns null if the path
		// is not a readable directory.
		File[] entries = indexesDir.listFiles();
		if (entries == null) {
			System.err.println("Not a readable directory: " + indexesDir);
			System.exit(1);
		}

		mergedDir.mkdirs();

		Date start = new Date();

		IndexWriter writer = null;
		try {
			// create = true: start a new, empty index in mergedDir.
			writer = new IndexWriter(mergedDir, new StandardAnalyzer(), true);
			writer.setMergeFactor(1000);
			writer.setRAMBufferSizeMB(50);

			// Open every subdirectory as an index and skip stray plain files.
			List<Directory> indexes = new ArrayList<Directory>();
			for (File entry : entries) {
				if (!entry.isDirectory()) {
					continue;
				}
				System.out.println("Adding: " + entry.getName());
				indexes.add(FSDirectory.getDirectory(entry.getAbsolutePath()));
			}

			System.out.print("Merging added indexes...");
			writer.addIndexes(indexes.toArray(new Directory[indexes.size()]));
			System.out.println("done");

			System.out.print("Optimizing index...");
			writer.optimize();
			writer.close();
			writer = null; // closed successfully, nothing left to clean up
			System.out.println("done");

			Date end = new Date();
			System.out.println("It took: " + ((end.getTime() - start.getTime()) / 1000)
					+ " seconds");

		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Only reached with a non-null writer if something failed above.
			if (writer != null) {
				try {
					writer.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}

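You can tune these two values to gain some more performance; my settings are fairly generic. The merge factor determines how many segments are merged at a time, and the RAM buffer size sets how much memory (in MB) the writer may use to buffer changes before flushing them to the index: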

writer.setMergeFactor(1000);
writer.setRAMBufferSizeMB(50);
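To experiment with these values without recompiling, one option is to read them from JVM system properties at startup. The sketch below is a stdlib-only helper under stated assumptions: the class name MergerSettings and the property names merger.mergeFactor and merger.ramBufferMB are hypothetical, chosen for illustration, and the defaults are the 1000 and 50 used above. The returned values would then be passed to writer.setMergeFactor(...) and writer.setRAMBufferSizeMB(...).

```java
import java.util.Properties;

public class MergerSettings {

    // Defaults taken from the settings used in IndexMerger.
    static final int DEFAULT_MERGE_FACTOR = 1000;
    static final double DEFAULT_RAM_BUFFER_MB = 50.0;

    // Hypothetical property names, chosen for this sketch only.
    static final String MERGE_FACTOR_KEY = "merger.mergeFactor";
    static final String RAM_BUFFER_KEY = "merger.ramBufferMB";

    /** Returns the merge factor from props, or the default if it is not set. */
    static int mergeFactor(Properties props) {
        String value = props.getProperty(MERGE_FACTOR_KEY);
        return value == null ? DEFAULT_MERGE_FACTOR : Integer.parseInt(value.trim());
    }

    /** Returns the RAM buffer size in MB from props, or the default if it is not set. */
    static double ramBufferMB(Properties props) {
        String value = props.getProperty(RAM_BUFFER_KEY);
        return value == null ? DEFAULT_RAM_BUFFER_MB : Double.parseDouble(value.trim());
    }

    public static void main(String[] args) {
        // System.getProperties() includes any -Dkey=value options given to java.
        Properties props = System.getProperties();
        System.out.println("mergeFactor = " + mergeFactor(props));
        System.out.println("ramBufferMB = " + ramBufferMB(props));
        // In IndexMerger these would be used as:
        //   writer.setMergeFactor(mergeFactor(props));
        //   writer.setRAMBufferSizeMB(ramBufferMB(props));
    }
}
```

If IndexMerger were changed to use such a helper, a run could look like java -Dmerger.mergeFactor=500 -Dmerger.ramBufferMB=128 -jar IndexMerger.jar indexes/ merged_index/, and omitting the options keeps the original settings.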

Downloads

Here you can download the Index Merger. It takes two arguments: the folder containing the indexes to be merged and an output directory where the merged index will be stored.

    • IndexMerger.zip: contains a standalone jar file (with Lucene bundled) and the source code of the IndexMerger.

Leave a comment if you find something wrong with this code!


Run your own OpenID server – Installing Prairie on lighttpd

22/02/2009

I am sure that you have heard about OpenID. OpenID is an open, decentralized, free framework for user-centric digital identity. That means that you need only one username to log in to the many sites that support OpenID. Yahoo!, Google, Microsoft and many others have expressed support and already act as OpenID providers, and the idea is spreading every day.

With OpenID, every user has one universal username that he or she uses on every OpenID-enabled site. For example, my OpenID is “id.asteriosk.gr”. It is unique since I own the domain asteriosk.gr, and I use it to log in to this blog and to many other sites that support OpenID login.

To use OpenID, you need an OpenID provider. That could be AOL, Blogger, Flickr, WordPress and many others. For example, if AOL is your provider, you can log in to sites with a username like openid.aol.com/screenname. If it is WordPress, you use a username like username.wordpress.com.
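Usernames like these are really URLs: before contacting your provider, the site you log in to normalizes what you typed. The following is a partial sketch of the first steps of OpenID 2.0 identifier normalization (section 7.2 of the specification), written with java.net.URI; the class name OpenIdNormalizer is made up for illustration. It prepends "http://" when no http or https scheme is given, drops any fragment, and turns an empty path into "/". It does not handle XRIs, follow redirects, or apply the remaining RFC 3986 normalization rules, all of which a real relying party must do.

```java
import java.net.URI;
import java.util.Locale;

public class OpenIdNormalizer {

    /** Partial normalization of a user-supplied OpenID URL identifier. */
    static String normalize(String userInput) {
        String s = userInput.trim();
        String lower = s.toLowerCase(Locale.ROOT);
        // No http/https scheme: treat the input as an http URL.
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            s = "http://" + s;
        }
        URI uri = URI.create(s);
        // An empty path such as in "http://id.asteriosk.gr" becomes "/".
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        StringBuilder out = new StringBuilder();
        out.append(uri.getScheme().toLowerCase(Locale.ROOT))
           .append("://")
           .append(uri.getRawAuthority())
           .append(path);
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        // The fragment (anything after '#') is deliberately not copied.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("id.asteriosk.gr"));
        System.out.println(normalize("openid.aol.com/screenname"));
        System.out.println(normalize("https://username.wordpress.com#me"));
    }
}
```

With these inputs the sketch prints http://id.asteriosk.gr/, http://openid.aol.com/screenname and https://username.wordpress.com/, which is why typing just the bare domain is enough.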

Having a provider that you trust is a very nice thing. However, when it comes to a universal username that you will use for every login on the internet, you might want your own personalized OpenID that uses your own domain name.

Having your own OpenID means that you are your own provider. As your own provider, you have to use your own server and infrastructure to provide OpenID services to yourself, which means that you have to be able to keep your site secure. Installing an OpenID server is quite easy, but keeping it secure is kind of tricky. If you want to run your own OpenID server, you can use one of the many OpenID server implementations out there.

In my case, I chose the Prairie server. It supports multiple users and it only needs PHP and MySQL. However, there were problems installing it. Prairie is written for Apache: it relies on .htaccess files for its URL rewrites, which makes things easier for Prairie’s developers. My server runs on lighttpd, which does not read .htaccess files, so the first problem that came up when installing Prairie was, as usual, lighttpd’s mod_rewrite module.

I followed Prairie’s instructions. The installation is pretty easy: you just need a MySQL username and password and a database where Prairie will save its data. After that, you run the install script and everything works as expected! The only difficult thing I had to overcome was converting the .htaccess file so that lighttpd could understand what to do with those rewrites.

On my existing PHP-enabled server, I created a subdomain named id.asteriosk.gr and added these rewrite rules to get it working:

$HTTP["host"] == "id.asteriosk.gr" {
    server.document-root = "/opt/apps/prairie/"
    url.rewrite = (
        "(.*.php|theme/|template/|install/).*" => "$0",
        "^/(.*)$" => "/index.php?$1"
    )
}
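To check which requests end up at Prairie’s index.php, the two rules can be replayed with java.util.regex. This is a simplified model, not lighttpd itself, and the class name PrairieRewriteModel is made up for illustration. It assumes that the rules are tried in order and the first match wins (as I understand the lighttpd documentation, url.rewrite behaves like url.rewrite-once), that the patterns are unanchored unless they contain ^ or $, and it models rule 1’s "$0" target simply as leaving the request unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrairieRewriteModel {

    // Rule 1: PHP scripts and the theme/, template/ and install/ directories
    // are served as requested. Same regex as in the lighttpd config.
    private static final Pattern PASS_THROUGH =
            Pattern.compile("(.*.php|theme/|template/|install/).*");

    // Rule 2: every other request goes to Prairie's index.php.
    private static final Pattern CATCH_ALL = Pattern.compile("^/(.*)$");

    /** Returns the URI a request would be rewritten to (simplified model). */
    static String rewrite(String requestUri) {
        // find() rather than matches(): the patterns are not anchored.
        if (PASS_THROUGH.matcher(requestUri).find()) {
            return requestUri; // first rule matched, request left as-is
        }
        Matcher m = CATCH_ALL.matcher(requestUri);
        if (m.find()) {
            return "/index.php?" + m.group(1);
        }
        return requestUri; // no rule matched
    }

    public static void main(String[] args) {
        String[] requests = {"/theme/default/style.css", "/index.php?x=1", "/asteriosk", "/"};
        for (String r : requests) {
            System.out.println(r + " -> " + rewrite(r));
        }
    }
}
```

In this model /asteriosk becomes /index.php?asteriosk, while theme files and .php scripts pass through untouched. Note that the unescaped dot in ".*.php" matches any character, so a path such as /somephp would also pass through; writing \.php would be stricter.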

Another thing to keep in mind is that Prairie needs a PHP extension called bcmath, which you will have to install by hand or with your package manager. In my case (CentOS 5), I just ran:

 yum -y install php-bcmath 

and restarted lighttpd. I also had to kill the php-fcgi processes that were already running.



jQuery: Javascript Library conflicts

15/02/2009

Sometimes jQuery has to be used together with another JavaScript library, such as Prototype, on the same web page. The problem starts when both libraries define the same ‘$’ function: whichever library is loaded last takes over ‘$’, so code written for the other one, e.g. ‘$(document).ready’ or ‘$('someid')’, stops working. It is very easy to get around this. Call jQuery.noConflict(), which gives ‘$’ back to the library that defined it first, and use the full name jQuery(...) for jQuery code:

<script src="prototype.js"></script>
<script src="jquery.js"></script>
<script type="text/javascript"><!--
jQuery.noConflict();
// Use jQuery via jQuery(...)
jQuery(document).ready(function(){
jQuery("div").hide();
});
// Use Prototype with $(...), etc.
$('someid').hide();
// --></script>

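Instead of the following, where ‘$’ belongs to jQuery (it was loaded last), so the Prototype call $('someid') no longer does what you expect:
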
<script src="prototype.js"></script>
<script src="jquery.js"></script>
<script type="text/javascript"><!--
$(document).ready(function(){
$("div").hide();
});
// Use Prototype with $(...), etc.
$('someid').hide();
// --></script>

Source: Using jQuery with other libraries


Adding -pthread to Eclipse for using POSIX threads

15/02/2009

Problem

A C/C++ project that uses POSIX threads does not build in Eclipse: the build fails with errors such as undefined references to pthread_create.

Solution

By default, Eclipse does not pass the -pthread argument to gcc. To fix this, open the menu:

Project -> Properties

In the panel on the left, go to:

C/C++ Build -> Settings -> GCC C Compiler -> Miscellaneous

Add “-pthread” at the beginning of the “Other flags” field. Then go to:

C/C++ Build -> Settings -> GCC C Linker -> Libraries

and add “pthread” to the list of libraries (-l). Click Apply and rebuild the project. Pthreads should work now.
