Archive of articles classified as "Uncategorized"

Adjust contrast and brightness of a PDF

1/05/2012

To adjust the brightness and contrast of a PDF file on a Mac (or on Linux), you can use ImageMagick. Here is what I used:

convert -brightness-contrast 10,50  -compress lzw  \
        -density 500 -colorspace Gray input.pdf output.pdf

The -brightness-contrast 10,50 option controls the brightness and contrast (both take values from -100 to +100), -density controls the resolution, and -colorspace controls the color output; in this case, it's Gray. -compress lzw selects LZW, a lossless compression method.
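If you adjust PDFs often, the command can be wrapped in a small shell function. The following is only a sketch: pdf_adjust_cmd and in_range are hypothetical names that are not part of ImageMagick, and the function prints the command instead of running it so that it can be checked first. It assumes file names without spaces.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ImageMagick): validate the two values,
# then print the convert command from the post instead of running it.

in_range() {
    # Succeeds when $1 is an integer between -100 and 100.
    case "$1" in
        ''|*[!0-9-]*|?*-*|-) return 1 ;;   # empty, non-numeric, or misplaced '-'
    esac
    [ "$1" -ge -100 ] && [ "$1" -le 100 ]
}

pdf_adjust_cmd() {
    # Arguments: brightness contrast input.pdf output.pdf
    brightness=$1
    contrast=$2
    input=$3
    output=$4
    if ! in_range "$brightness" || ! in_range "$contrast"; then
        echo "brightness and contrast must be integers from -100 to 100" >&2
        return 1
    fi
    # Same options, in the same order, as the command above.
    printf 'convert -brightness-contrast %s,%s -compress lzw -density 500 -colorspace Gray %s %s\n' \
        "$brightness" "$contrast" "$input" "$output"
}
```

For example, `pdf_adjust_cmd 10 50 input.pdf output.pdf | sh` would run the same command as above, while a value such as 150 is rejected before ImageMagick is ever called.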

Dropbox won’t start due to permissions

28/08/2011

The other day I tried to restart my (console-only) Dropbox on my Ubuntu server. Suddenly, and without my having changed anything in the installation, Dropbox refused to start, reporting an issue with permissions.

In fact I got this message:

Couldn’t start Dropbox
This is usually because of a permissions error. Errors can also be caused by your home folder being stored on a network share.

I tried lots of the solutions proposed on the Dropbox forums (fixing permissions, deleting the .dropbox folder, reinstalling Dropbox, installing a newer version), but I had no luck.

What finally worked was:

export LANGUAGE=en
dropbox stop
dropbox start

Passing Kerberos TGT (ticket-granting ticket) to remote hosts with ssh

18/11/2009

Kerberos uses tickets to grant access to resources on a Kerberos-enabled computer. If you want to log in (via SSH) to a remote host without re-running the kinit command after you log in, you can forward your ticket with your ssh client. Assuming your Kerberos client is already configured, two steps are needed:

  1. Create a forwardable Kerberos ticket on your machine.
  2. Forward your ticket while logging in to the remote machine.

To create a forwardable ticket, run kinit with the “-f” argument, e.g.:

pythoagoras:~ asteriosk$ kinit -f
Please enter the password for username@domain.com:

To tell the ssh client to forward your ticket to the remote machine, you have to configure it accordingly. The easiest way is to include two directives in your ssh client configuration file, .ssh/config (create it if it is not there).

chercheurs2-235:~ asteriosk$ more ~/.ssh/config
Host domain.com
        GSSAPIAuthentication yes
        GSSAPIDelegateCredentials yes

Of course, substitute domain.com and username to match your configuration. This works for both Linux and Mac OS X clients.
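If you would rather not edit the configuration file, for a one-off login for example, the same two directives can be passed on the command line with -o. A minimal sketch, where ssh_krb is a hypothetical wrapper name and not something that ships with OpenSSH:

```shell
#!/bin/sh
# ssh_krb is a hypothetical helper, not part of OpenSSH. It passes the two
# directives from the config file as -o options, so GSSAPI authentication
# and ticket forwarding apply to this call only.
ssh_krb() {
    ssh -o GSSAPIAuthentication=yes \
        -o GSSAPIDelegateCredentials=yes \
        "$@"
}

# Usage (substitute your own user and host):
#   ssh_krb username@domain.com
```

The ticket still has to be forwardable, so kinit -f comes first. The OpenSSH manual also documents a -K option that enables GSSAPI authentication and credential forwarding in one flag.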

Google’s support for RDFa and Microformats

19/10/2009

Google has announced that their search engine is going to support enhanced searching in web pages by using RDFa and Microformats embedded in XHTML. Google states that the extra (structured) data will be used to get results for Product Reviews (e.g. CNET Reviews), Products (e.g. Amazon product pages) and People (e.g. LinkedIn profiles), and that any other types of resources will be made public through data-vocabulary.org. W3C is pretty happy about that.

The news is good for three reasons:

  1. Google supports an Open Standard (RDFa by W3C) and also an Open RDF Vocabulary
  2. Structured data embedded in human-readable web pages is going to start showing up. Content providers will start using RDFa or Microformats to get Google’s Rich Snippets in search results. Of course, other search engines will follow :) Yahoo! already supports RDFa in SearchMonkey.
  3. One more big step towards the Semantic Web.

I suspect that data from RDFa or Microformats will also be used in some way by Google for ranking search results, and that a new SEO era is going to start. However, it is too early to make predictions.

From my personal point of view, if RDFa is finally widely adopted, it will be the first time that scientists have a Web-scale, distributed, structured data “playground” to do research on. Although I am not an expert in the field, I remember that Semantic Web and Large Graph processing had scalability problems (correct me if I’m wrong!). The Web Graph (which today is as simple as “one page links to another”) is going to become much more complex and semantically “meaningful”. I am very curious to see what comes next in this direction!

Merging multiple Lucene indexes

31/03/2009

This is the code that I use to merge multiple Lucene indexes into one. There are several reasons to merge multiple indexes into one, such as:

    • Speed
    • Ease of management
    • Space – the size of the merged index is less than the sum of non-merged indexes

Here is the code of an Index Merger:

package ucy.cs.hpcl.minerSoft.indexmanipulation;

/*This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Author: Asterios Katsifodimos (http://www.asteriosk.gr)
*/
import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexMerger {

	
	/** Merges all Lucene indexes found under a directory into a single index. */
	public static void main(String[] args) {

		if(args.length != 2){
			System.out.println("Usage: java -jar IndexMerger.jar " +
					           "existing_indexes_dir merged_index_dir");
			System.out.println(" existing_indexes_dir: A directory where the " +
					             "indexes that have to be merged exist");
			System.out.println("   e.g. indexes/");
			System.out.println("   e.g.         index1");
			System.out.println("   e.g.         index2");
			System.out.println("   e.g.         index3");
			System.out.println(" merged_index_dir: A directory where the merged " +
					                               "index will be stored");
			System.out.println("   e.g. merged_indexes");
			System.exit(1);
		}
		
		File INDEXES_DIR  = new File(args[0]);
		File INDEX_DIR    = new File(args[1]);

		INDEX_DIR.mkdir();
		
		Date start = new Date();

		try {
			IndexWriter writer = new IndexWriter(INDEX_DIR, 
												new StandardAnalyzer(), 
												true);
			writer.setMergeFactor(1000);
			writer.setRAMBufferSizeMB(50);
			
			// List the entries once; every entry is treated as a Lucene index directory.
			String[] names = INDEXES_DIR.list();
			Directory indexes[] = new Directory[names.length];

			for (int i = 0; i < names.length; i++) {
				System.out.println("Adding: " + names[i]);
				indexes[i] = FSDirectory.getDirectory(INDEXES_DIR.getAbsolutePath() 
													+ "/" + names[i]);
			}

			System.out.print("Merging added indexes...");
			writer.addIndexes(indexes);
			System.out.println("done");

			System.out.print("Optimizing index...");
			writer.optimize();
			writer.close();
			System.out.println("done");

			Date end = new Date();
			System.out.println("It took: "+((end.getTime() - start.getTime()) / 1000) 
											+ "\"");

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

You can play with these values to gain some more performance. My settings are pretty generic.

writer.setMergeFactor(1000);
writer.setRAMBufferSizeMB(50);

Downloads

Here you can download an Index Merger that takes two arguments: the folder containing the indexes to be merged, and an output directory where it is going to store the merged index.

    • IndexMerger.zip: contains a standalone jar file (with Lucene bundled) and the source code of the IndexMerger.
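The merger treats every entry in existing_indexes_dir as an index, so a stray file there (a .DS_Store on the Mac, for instance) would most likely make the merge fail. The sketch below checks the folder before calling the jar. merge_indexes is a hypothetical wrapper name, and it assumes that IndexMerger.jar from the download sits in the current directory.

```shell
#!/bin/sh
# merge_indexes is a hypothetical pre-flight wrapper, not part of the download.
# It refuses to run unless every entry in the source folder is a directory.
merge_indexes() {
    indexes_dir=$1
    merged_dir=$2
    if [ ! -d "$indexes_dir" ]; then
        echo "not a directory: $indexes_dir" >&2
        return 1
    fi
    found=0
    # Second pattern picks up hidden entries, which the first one skips.
    for entry in "$indexes_dir"/* "$indexes_dir"/.[!.]*; do
        [ -e "$entry" ] || continue        # pattern matched nothing
        if [ ! -d "$entry" ]; then
            echo "not an index directory: $entry" >&2
            return 1
        fi
        found=$((found + 1))
    done
    if [ "$found" -eq 0 ]; then
        echo "no indexes found in: $indexes_dir" >&2
        return 1
    fi
    # Same invocation as in the usage message of the merger.
    java -jar IndexMerger.jar "$indexes_dir" "$merged_dir"
}

# Usage:
#   merge_indexes indexes/ merged_indexes
```

The check only looks at whether each entry is a directory; it does not verify that the directory actually holds a valid Lucene index.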

Leave a comment if you find something wrong with this code!
