JBoss Class loading revisited

Just found another great resource on JBoss classloading. Since this is still a painful issue to figure out I’ve added a link.

Languages on the list

Just found an excellent walk-through of a wide variety of programming languages. It is well worth a look. Go read it here.

Python path

I was having issues with a Python egg I had installed. To solve them I wanted to know where the module was actually installed. After looking in vain in site-packages I discovered this trick:

import <module-name>
print(<module-name>.__file__)

Pretty neat: it simply prints the path to the file in question.
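The same trick can be wrapped in a small helper, sketched here for Python 3. The name module_path is made up for illustration, and json and sys are just stand-ins for whatever module is causing trouble. Built-in modules such as sys have no file at all, which is why the helper falls back to None.

```python
import importlib


def module_path(name):
    """Return the file a module was loaded from, or None if it has no file."""
    module = importlib.import_module(name)
    # Built-in modules (compiled into the interpreter) have no __file__.
    return getattr(module, "__file__", None)


print(module_path("json"))  # a path ending in json/__init__.py
print(module_path("sys"))   # None: sys is built into the interpreter
```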

A story of Regular Expression

There is a good deal of über-geekiness about Regular Expressions. No arguments there. Like a statement in Perl any Regular Expression can be extremely esoteric and concise in itself – just learning to decipher a Regular Expression can take a lot of time.

But above and beyond the initial headache of getting to know Regular Expressions there’s another mountain to climb. The thing is this: A Regular Expression doesn’t exist by itself. It is implemented in your choice of programming language. Regular Expressions are implemented in Perl, Java, C#, Lisp, Python and so on. And not similarly implemented, mind you. No, each implementation has minor differences and just enough syntactic modifications that you can’t copy your hard-earned Regular Expression from, let’s say, Perl to Java and expect it to work.

Do I sound a bit disappointed? Well, I just made a two-day trek into that implementation jungle, and I came back slightly changed to tell the tale. My trek started with a problem: A spellchecker in ActionScript was supposed to underline misspellings, but sometimes it failed completely. It all went fine with English – misspellings were underlined in red – but misspellings in Russian and Greek went unmarked. When I tracked down the issue in ActionScript one Regular Expression stood out:

var pattern:String = "\\b" + misspelling.word + "\\b";

After some reading of the ECMAScript spec I realized that things were not well in RegEx land. The word boundary token \b as implemented in ActionScript only supports ASCII characters – per the ECMAScript specification. And I was having issues with Cyrillic and Greek. At first I couldn’t believe it. But the fact is that in ECMAScript the word boundary we all think we know only works in ASCII. As an example try this in JavaScript – another sibling of the ECMAScript family:

function main() {
	// In Russian: \b is ASCII-only, so the first pattern finds nothing
	var pattern = /\bБратья\b/g;
	var str = "Братья Карамазовы.";
	doRegEx(str, pattern);
	// Replacement for \b that needs no look-behind
	pattern = /(?:\W|^)Братья(?=\W|$)/g;
	doRegEx(str, pattern);
	// In English: both patterns match
	str = "The Brothers Karamazov.";
	pattern = /\bBrothers\b/g;
	doRegEx(str, pattern);
	pattern = /(?:\W|^)Brothers(?=\W|$)/g;
	doRegEx(str, pattern);
}
function doRegEx(str, pattern) {
	var regExp = new RegExp(pattern);
	var result = regExp.exec(str);
	document.write("Returned value: " + result + "<br>");
}

main();

The first value returned should be null. So I had to find a replacement for \b. The solution that I painstakingly found was to rewrite \b first into (?<=\W)(?=\w)|(?<=\w)(?=\W) and then into /(?:\W|^)word(?=\W|$)/g – since ECMAScript doesn’t support look-behind either. As you can see it works quite well, also with non-ASCII text. One caveat: \W is ASCII-based too, so every Cyrillic or Greek letter counts as a non-word character, and the pattern will also match inside a longer non-ASCII word.
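For comparison, here is a minimal sketch of the same experiment in Python 3 (not the ActionScript from the spellchecker). It assumes that Python's re.ASCII flag is a fair stand-in for the ECMAScript behaviour: with the flag, \b, \w and \W only know ASCII, and without it they are Unicode-aware for str patterns. The helper names find_word and find_word_workaround are made up for illustration.

```python
import re


def find_word(word, text, ascii_only=False):
    """Search for word delimited by \\b, optionally with ASCII-only semantics."""
    flags = re.ASCII if ascii_only else 0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    return pattern.search(text)


def find_word_workaround(word, text):
    """The look-behind-free replacement for \\b, with ASCII-only \\W."""
    pattern = re.compile(r"(?:\W|^)" + re.escape(word) + r"(?=\W|$)", re.ASCII)
    return pattern.search(text)


russian = "Братья Карамазовы."
print(find_word("Братья", russian, ascii_only=True) is not None)
print(find_word("Братья", russian) is not None)
print(find_word_workaround("Братья", russian) is not None)
# The workaround also fires inside a longer Cyrillic word:
print(find_word_workaround("Братья", "Братьями") is not None)
```

Run as is, the ASCII-only \b finds nothing in the Russian text, while the Unicode-aware \b and the workaround both do. The last line shows the catch with the workaround: because the ASCII-only \W treats Cyrillic letters as non-word characters, it reports a hit inside "Братьями" as well.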

That’ll teach me to take things for granted…

Aspell

A really interesting explanation of Aspell’s ingenious affix functionality. Go read it.

Stanford Engineering Everywhere

Just found this link to Stanford Engineering Everywhere. Some very advanced classes for people willing to learn.

Concurrency in Java

Imagine a limited task. Imagine a task that always takes exactly 6 hours to do. Imagine a task that you can always divide into smaller tasks of exactly equal size. Imagine that the task is embarrassingly parallel – when the task is divided into smaller tasks no communication of results between these tasks is needed and no dependencies between them exist. They are completely independent of each other. Finally imagine that there is no overhead in running these tasks.

Where am I going with all this? Well, I have a Gedankenexperiment. First think of this: If I divide said task into two equal-sized tasks and put one worker on each task, then they are going to finish the perfect task in 3 hours – i.e. in half the time. That is provided that these workers work at exactly the same speed and therefore finish at the same time. If I split the task into 4 equal-sized tasks and put one worker on each task, then they are going to finish in 1.5 hours. And if I split the task into 6 equal-sized tasks and put one worker on each task, then they are going to finish in 1 hour. Notice anything peculiar?

I’ve replaced workers with threads and plotted a graph of the Gedankenexperiment.

As you can see the plot is really f(x) = 6/x. But that is not the peculiar part. What made me think is this: Seen from a cost-benefit perspective why would I hire thread 5 or even 6? Sure, they will still get the task done sooner, but I benefited most from number 2. Then number 3. And then possibly number 4. After that I get less and less from putting threads to work on the task. I find that puzzling. It looks a bit like Zeno’s paradoxes.
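To put numbers on that cost-benefit intuition, here is a small sketch in Python of the idealized model f(x) = 6/x. The function names are made up for illustration, and the 6 hours come from the Gedankenexperiment. The time saved by adding worker number x is 6/(x-1) - 6/x = 6/(x(x-1)), which shrinks roughly with the square of x.

```python
TOTAL_HOURS = 6.0


def hours(workers):
    """Time to finish the perfectly divisible task with this many workers."""
    return TOTAL_HOURS / workers


def marginal_saving(workers):
    """Hours saved by adding the last worker to a team of workers - 1."""
    return hours(workers - 1) - hours(workers)


for n in range(2, 7):
    print(f"worker {n}: done in {hours(n):.2f} h, saves {marginal_saving(n):.2f} h")
```

Under these assumptions worker 2 saves 3 hours, worker 3 saves 1 hour, worker 4 saves half an hour, and worker 6 only 12 minutes.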

So I went to test this Gedankenexperiment with real threads in real code in the real world.

I tested on server 1, a Linux Intel Xeon 2.53 GHz with 4 cores. I tested on server 2, a Linux Intel Xeon 2.13 GHz with 4 cores. And I tested on server 3, a Windows laptop Intel Core i7 2.67 GHz with 2 cores. These are the results of my tests:

The same pattern is visible. I grant you that the plots are less clear-cut in the real world: there is overhead, and the threads probably do not work at the same speed. But the trend is still there. From the same cost-benefit perspective threads number 2 and 3 are still worth it – like before I still get significant improvements in time spent – but the rest of the threads are clearly not worth it. There is a steep slope from the second to the third thread, but from there the curve is almost flat.

I haven’t been able to find any literature on this. It’s unlike Amdahl’s law. And it’s still not an argument against parallel computing, because most real workers will have sharing mechanisms: if one quick worker finishes early, it will not wait but go to work on other workers’ tasks, thus saving time. So I guess the moral is this: Look at your task at hand. Maybe more than four threads are not worth it, if the task is indeed an embarrassingly parallel one divided into equal parts.

In case you’re wondering, this is the Java code I used to test. Update: Thanks to a friend who misses nothing, I’ve updated the code to use a pool as big as the number of jobs. The plot suddenly got a lot prettier. Thanks Torben. Here’s the code:

import java.util.*;
import java.util.concurrent.*;

public class ParallelJob implements Callable<Integer> {
    private int currentCount;
    private int startCount;
    private int endCount;

    public ParallelJob(int startCount, int endCount) {
        this.currentCount = 0;
        this.startCount = startCount;
        this.endCount = endCount;
    }

    // Runs the sieve once per iteration to keep the thread busy.
    public Integer call() {
        for (int i = 0; i < endCount; i++) {
            currentCount = i;
            runEratosthenesSieve(1000);
        }
        return Integer.valueOf(currentCount);
    }

    public void runEratosthenesSieve(int upperBound) {
        int upperBoundSquareRoot = (int) Math.sqrt(upperBound);
        boolean[] isComposite = new boolean[upperBound + 1];
        for (int m = 2; m <= upperBoundSquareRoot; m++) {
            if (!isComposite[m]) {
                // m is prime: mark its multiples as composite
                for (int k = m * m; k <= upperBound; k += m)
                    isComposite[k] = true;
            }
        }
        for (int m = upperBoundSquareRoot; m <= upperBound; m++) {
            if (!isComposite[m]) {
                // m is prime (the result is deliberately discarded)
            }
        }
    }
}

import java.util.*;
import java.util.concurrent.*;

public class TestOfParallel
{
    private ExecutorService pool;
    private Set<Future<Integer>> set;

    public TestOfParallel(int numberOfJobs) throws Exception {
	pool = Executors.newFixedThreadPool(numberOfJobs);
	set = new HashSet<Future<Integer>>();
	int sum = 0;
	long start = System.currentTimeMillis();
	runJobs(numberOfJobs);
	for (Future<Integer> currentFuture : set) {
	    sum += currentFuture.get();
	}

	System.out.println("The result is: " + sum);
	long stop = System.currentTimeMillis();
	System.out.println("Time lapsed: " + (stop - start));
	pool.shutdown(); // stop the worker threads so the JVM can exit normally
    }

    public static void main(String [] args) throws Exception
    {

	System.out.println("Main started...");
	TestOfParallel testOfParallel = new TestOfParallel(Integer.parseInt(args[0]));
    }

    private void runJobs(int numberOfJobs) {
	for (int i = 0; i<numberOfJobs; i++) {
	    Callable<Integer> parallelJob = new ParallelJob(0, 1000000/numberOfJobs);
	    Future<Integer> future = pool.submit(parallelJob);
	    set.add(future);
	}
    }
}

Connecting jconsole to JBoss

To connect jconsole to a JVM running JBoss, you first have to launch JBoss with a management agent. Like so:

$ ./run.sh -Dcom.sun.management.jmxremote

And to connect jconsole to the JBoss Java process you have to list all running Java processes and then select the right process id. Like this:

$ jps -l
4028 org.jboss.Main
$ jconsole 4028
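If you do this often, picking the process id out of the jps listing can be scripted. The following Python sketch assumes the `<pid> <main class>` line format shown above. The helper names find_pid and jboss_pid are made up for illustration, and jboss_pid only works when a JDK's jps is on the PATH.

```python
import subprocess


def find_pid(jps_output, main_class="org.jboss.Main"):
    """Return the pid of the first process whose main class matches, else None."""
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == main_class:
            return int(parts[0])
    return None


def jboss_pid():
    """Run `jps -l` and pick out the JBoss process (needs jps on the PATH)."""
    result = subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    )
    return find_pid(result.stdout)
```

The pid returned by jboss_pid() can then be handed to jconsole as in the example above.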

A million random digits

The strangest book ever written – to people not working with statistics or cryptography – is undoubtedly “A Million Random Digits with 100,000 Normal Deviates”.

The book title really says it all. The million digits published in the book were produced by the Rand Corporation in 1947 using a custom roulette wheel, and they’re still used today when someone needs good old-fashioned random numbers (not just today’s algorithm-produced replicas).

The book was republished in 2001 and can be acquired at Amazon. It is worth a visit if only to read the reviews. If you just need the random numbers the Rand Corporation maintains a webpage, where you can read the introduction and download the million random numbers in txt format.

RIM open letter

The RIM open letter is very frank and sober advice. Advice that should be universally read and understood - not just by BlackBerry management but by everyone who wants to produce hardware and software for the end-user.