Wednesday, April 22, 2009

A little irony with NodeBox

Thursday, April 16, 2009

Maxine JIT vs. Hotpath vs. Hotspot

Here are the latest benchmark numbers for JavaGrande Section 2. There are still some issues running more complicated benchmarks but I'm working on that. Maxine uses a lightweight JIT compiler and an optimizing compiler (OPT). These numbers only refer to the JIT because the OPT is currently broken in my developer build.





The GC is having some locking issues at the moment and I had to turn it off. However, these benchmarks don't do any allocation, so the lack of GC shouldn't skew the results too much.

Tuesday, April 14, 2009

Chasing Cycles Part 1

This is the first part in a series of blog posts looking at some of the practical aspects of Trace Compilation. There are lots of gotchas when building a dynamic compilation system, and often there are no pretty solutions. The recurring theme, as far as I can tell, is that compilation is always a gamble: will I save enough cycles in the long run to pay off the cycles I spent on compilation? Luck is hard to come by, so we need to rely on heuristics and experimental data to make educated guesses.

Here is a question: should we compile the inner loop below? Let's look at some numbers...

for (int k = 4; k <= 12; k++) {
    int iterations = 1024 << k;
    for (int j = 0; j < iterations; j++) {
        sum += j;
    }
}


I've modified the compiler to trigger a recompilation for each execution of the inner loop. The graph below shows execution times (in microseconds) for different workloads of the inner loop. Each bar cluster has three bars: the first is the total execution time of the inner loop using the Trace Compiler (compilation cost + execution cost), the second excludes compilation cost, and the last is the execution cost using the JIT compiler.




Compilation time hovers around 5ms, with an initial cost of 10ms, most likely due to compiler initialization. Trace compiled code executes roughly 10x faster than JIT code. Unfortunately, the 10x speedup doesn't pay for the compilation cost unless we do a significant amount of work in the inner loop: more than 500K iterations.
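To make the break-even point concrete, here is a minimal sketch of the arithmetic. It assumes a fixed compilation cost and constant per-iteration costs for both compilers. The class and method names are made up for illustration, and the 11 ns per-iteration JIT cost is a hypothetical value, not a measurement; only the 5ms compilation cost and the 10x speedup come from the numbers above.

```java
final class BreakEven {
    /**
     * Smallest iteration count at which compilation cost plus trace compiled
     * execution is no slower than staying in JIT code. Solves
     * compile + n * (jit / speedup) <= n * jit for n.
     */
    static long breakEvenIterations(double compileNanos,
                                    double jitNanosPerIter,
                                    double speedup) {
        if (jitNanosPerIter <= 0.0 || speedup <= 1.0) {
            throw new IllegalArgumentException(
                "need a positive per-iteration cost and a speedup above 1");
        }
        // Time saved on every iteration once the loop runs compiled code.
        double savedPerIter = jitNanosPerIter - jitNanosPerIter / speedup;
        return (long) Math.ceil(compileNanos / savedPerIter);
    }

    public static void main(String[] args) {
        // 5 ms compile cost and 10x speedup are from the post;
        // 11 ns per JIT iteration is an assumed, hypothetical figure.
        long n = breakEvenIterations(5_000_000.0, 11.0, 10.0);
        System.out.println("break-even iterations: " + n);
    }
}
```

Under this model a larger speedup barely helps: even an infinitely fast compiled loop saves at most the full JIT cost per iteration, so the break-even count is bounded below by the compilation cost divided by the JIT cost per iteration.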


Things to notice:
  • We can reduce the compilation cost, but in this case it would be a monumental task to get compilation time low enough to be paid off by the inner loop iterating fewer than 100K times.
  • Even if we had a godly optimizing compiler, the output it produced still wouldn't pay back the compilation time.
  • Probably the best thing to do in this case is to prevent the trace compiler from doing anything unless the loop iterates 500K times.
  • Golden rule: Compile things that can pay off compilation time! Function calls, complicated code, etc...
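The third point above can be sketched as a simple hotness counter. This is only an illustration of the idea, not Maxine's actual mechanism: the class name, the method name, and the notion of calling it on every loop back-edge are assumptions made for the example.

```java
final class LoopProfile {
    private final long threshold;   // iterations to observe before compiling
    private long backEdges;         // iterations seen so far
    private boolean compiled;       // true once compilation has been requested

    LoopProfile(long threshold) {
        this.threshold = threshold;
    }

    /**
     * Called once per loop iteration. Returns true exactly once, on the
     * iteration where the counter reaches the threshold.
     */
    boolean onBackEdge() {
        if (compiled) {
            return false;
        }
        backEdges++;
        if (backEdges >= threshold) {
            compiled = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LoopProfile profile = new LoopProfile(500_000L);
        int requests = 0;
        for (int j = 0; j < 1024 << 12; j++) {
            if (profile.onBackEdge()) {
                requests++;   // a real system would hand the loop to the trace compiler here
            }
        }
        System.out.println("compilation requests: " + requests);
    }
}
```

A counter like this only sees iterations that have already run, so it bets that a loop which has been hot this long will stay hot long enough to pay back the compilation.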
Here we only looked at compiling a very simple counting loop, which the JIT can execute quite efficiently. Next time we'll focus on more complicated code. We'll see that the more work we do in the inner loop, the easier it becomes to pay back compilation time.

Also, for those of you who may question my time measurement methodology, rest assured that I'm not just using System.nanoTime(). I'm using a combination of RDTSC, implemented as a compiler intrinsic in Maxine, and System.nanoTime() to get an appropriate timing scale. RDTSC has its own problems, but measuring time on a single thread seems to work quite well for me.
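The post doesn't show how the two clocks are combined, so the following is only one plausible sketch: sample a raw tick counter and System.nanoTime() over the same window, and use the ratio to convert tick deltas to nanoseconds. Standard Java has no RDTSC access, so the tick source is passed in as a LongSupplier; the class, the method names, and the fake 3-ticks-per-nanosecond counter in main are all hypothetical stand-ins.

```java
import java.util.function.LongSupplier;

final class TickCalibration {
    /**
     * Estimates ticks per nanosecond by reading both clocks at the start
     * and end of a busy-wait window of at least windowNanos.
     */
    static double ticksPerNano(LongSupplier ticks, LongSupplier nanos, long windowNanos) {
        long n0 = nanos.getAsLong();
        long t0 = ticks.getAsLong();
        long n1;
        do {
            n1 = nanos.getAsLong();   // spin until the window has elapsed
        } while (n1 - n0 < windowNanos);
        long t1 = ticks.getAsLong();
        return (double) (t1 - t0) / (double) (n1 - n0);
    }

    /** Converts a tick delta to nanoseconds using a calibrated scale. */
    static double toNanos(long tickDelta, double ticksPerNano) {
        return tickDelta / ticksPerNano;
    }

    public static void main(String[] args) {
        // Stand-in for an RDTSC intrinsic: a fake counter running at 3 ticks per ns.
        LongSupplier fakeTicks = () -> System.nanoTime() * 3;
        double scale = ticksPerNano(fakeTicks, System::nanoTime, 5_000_000L);
        System.out.println("estimated ticks per ns: " + scale);
    }
}
```

Keeping both clocks behind LongSupplier makes the calibration testable with deterministic fake clocks, and it mirrors the single-thread caveat: the ratio is only meaningful if both readings come from the same thread on the same core.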

Maxine - Hotpath - Big Picture

We presented some of our work at CGO in Seattle this year, so without further ado ...



The poster focuses on the challenges faced when building a Trace Compiler for non-interpreted execution environments, as is the case with Maxine.

Plotter

I've been scouring the web for a script to generate good-looking bar charts. Unfortunately, it seems that technology has not advanced to the point where drawing a simple bar chart is a simple endeavor. So, setting other more pressing issues aside, I decided to waste my weekend writing my very own clustered bar chart script. After some Python and NodeBox hacking, voila ...



NodeBox also has this neat feature where you can create animations ...

video

The primary reason for doing this was to get quick turnaround between visualization and data generation. Most charting tools make this process quite tedious, and therefore useless for me.

Monday, April 13, 2009

Switching to Blogger

I finally gave up trying to host my own blog. I followed Mason's advice and switched over to Blogger, for the 3rd time. This time I think I'll stick with it; I'm just waiting for my domain to switch over.
