Kinect in Java

The folks at OpenKinect did a good job of providing JNA wrappers for their Kinect driver.  What they didn’t do is provide a nice example showing how to process the byte stream data from the RGB and depth cameras.  As far as I can tell, even the comments in their C code don’t fully describe the data format.  The OpenKinect wiki provides a bit more information, but it tends to be out of date in places and still doesn’t fully describe the data format.  So it was off to Google to find a working example in Java.  All I could find was a bunch of posts telling people to use the Processing OpenKinect code.  That’s of little use to me, but I did find browsing their source code useful.

So for those of you who just want a straightforward example demonstrating how to use OpenKinect in Java, you’ve come to the right place.  The code below does use BoofCV for some of the image processing and display, but that’s not essential and it could easily be modified to not use BoofCV.

OpenKinect Version: 0.1.2
BoofCV Version: 0.14

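import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

import org.openkinect.freenect.Context;
import org.openkinect.freenect.DepthHandler;
import org.openkinect.freenect.Device;
import org.openkinect.freenect.FrameMode;
import org.openkinect.freenect.Freenect;
import org.openkinect.freenect.VideoFormat;
import org.openkinect.freenect.VideoHandler;

import boofcv.gui.image.ImagePanel;
import boofcv.gui.image.ShowImages;
import boofcv.struct.image.ImageUInt16;
import boofcv.struct.image.ImageUInt8;
import boofcv.struct.image.MultiSpectral;

import com.sun.jna.NativeLibrary;

/**
 * Example demonstrating how to process and display data from the Kinect.
 *
 * @author Peter Abeles
 */
public class OpenKinectStreamingTest {

	static {
		// Modify this link to be where you store your shared library
		NativeLibrary.addSearchPath("freenect", "/home/pja/libfreenect/build/lib");
	}

	// Allocated when the first frame arrives, using the size reported by FrameMode
	MultiSpectral<ImageUInt8> rgb;
	ImageUInt16 depth;

	BufferedImage outRgb;
	ImagePanel guiRgb;

	BufferedImage outDepth;
	ImagePanel guiDepth;

	public void process() throws InterruptedException {
		Context kinect = Freenect.createContext();

		// numDevices() is a count, so zero means nothing is plugged in
		if( kinect.numDevices() <= 0 )
			throw new RuntimeException("No kinect found!");

		Device device = kinect.openDevice(0);

		// Frames arrive asynchronously; each handler forwards to a processing method
		device.startDepth(new DepthHandler() {
			@Override
			public void onFrameReceived(FrameMode mode, ByteBuffer frame, int timestamp) {
				processDepth(mode, frame, timestamp);
			}
		});
		device.startVideo(new VideoHandler() {
			@Override
			public void onFrameReceived(FrameMode mode, ByteBuffer frame, int timestamp) {
				processRgb(mode, frame, timestamp);
			}
		});

		// Stream for 100 seconds without spinning the CPU in a busy-wait loop
		Thread.sleep(100000);
		System.out.println("100 Seconds elapsed");

		device.stopDepth();
		device.stopVideo();
		device.close();
	}

	protected void processDepth( FrameMode mode, ByteBuffer frame, int timestamp ) {
		System.out.println("Got depth! "+timestamp);

		if( outDepth == null ) {
			depth = new ImageUInt16(mode.getWidth(), mode.getHeight());
			outDepth = new BufferedImage(depth.width,depth.height,BufferedImage.TYPE_INT_BGR);
			guiDepth = ShowImages.showWindow(outDepth,"Depth Image");
		}

		// Each depth pixel is 2 bytes, low byte first (little-endian)
		int indexIn = 0;
		for( int y = 0; y < depth.height; y++ ) {
			int indexOut = depth.startIndex + y*depth.stride;
			for( int x = 0; x < depth.width; x++ , indexOut++ ) {
				int value = (frame.get(indexIn++) & 0xFF) | ((frame.get(indexIn++) & 0xFF) << 8);
				depth.data[indexOut] = (short)value;
				// Display: scale the default 11-bit range (0 to 2047) to 8-bit gray
				int gray = Math.min(value, 2047) * 255 / 2047;
				outDepth.setRGB(x, y, (gray << 16) | (gray << 8) | gray);
			}
		}
		guiDepth.repaint();
	}

	protected void processRgb( FrameMode mode, ByteBuffer frame, int timestamp ) {
		if( mode.getVideoFormat() != VideoFormat.RGB ) {
			System.out.println("Bad rgb format!");
			return;
		}

		System.out.println("Got rgb! "+timestamp);

		if( outRgb == null ) {
			rgb = new MultiSpectral<ImageUInt8>(ImageUInt8.class, mode.getWidth(), mode.getHeight(), 3);
			outRgb = new BufferedImage(rgb.width,rgb.height,BufferedImage.TYPE_INT_BGR);
			guiRgb = ShowImages.showWindow(outRgb,"RGB Image");
		}

		ImageUInt8 band0 = rgb.getBand(0);
		ImageUInt8 band1 = rgb.getBand(1);
		ImageUInt8 band2 = rgb.getBand(2);

		// Each RGB pixel is 3 interleaved bytes: red, green, blue
		int indexIn = 0;
		for( int y = 0; y < rgb.height; y++ ) {
			int indexOut = band0.startIndex + y*band0.stride;
			for( int x = 0; x < rgb.width; x++ , indexOut++ ) {
				band0.data[indexOut] = frame.get(indexIn++);
				band1.data[indexOut] = frame.get(indexIn++);
				band2.data[indexOut] = frame.get(indexIn++);
				// Display: pack the three bytes into 0xRRGGBB
				int r = band0.data[indexOut] & 0xFF;
				int g = band1.data[indexOut] & 0xFF;
				int b = band2.data[indexOut] & 0xFF;
				outRgb.setRGB(x, y, (r << 16) | (g << 8) | b);
			}
		}
		guiRgb.repaint();
	}

	public static void main( String args[] ) throws InterruptedException {
		OpenKinectStreamingTest app = new OpenKinectStreamingTest();
		app.process();
	}
}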
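The loops in processDepth and processRgb rely on two byte layouts: in the default depth format each pixel arrives as two bytes, low byte first (on a little-endian machine), and in the RGB video format each pixel is three bytes in R, G, B order. Below is a small standalone sketch that decodes both layouts from a hand-made ByteBuffer, so the format can be checked without a Kinect attached. The class and method names are hypothetical helpers, not part of OpenKinect or BoofCV.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical helper for checking the Kinect frame layouts without the device.
 * Depth: 2 bytes per pixel, low byte first. RGB: 3 bytes per pixel in R, G, B order.
 */
public class KinectFrameDecoding {

	/** Unsigned depth value of pixel i, assembled by hand from the low and high bytes. */
	public static int depthAt(ByteBuffer frame, int i) {
		int lo = frame.get(2 * i) & 0xFF;
		int hi = frame.get(2 * i + 1) & 0xFF;
		return lo | (hi << 8);
	}

	/** Same value via ByteBuffer's little-endian decoding; duplicate() leaves the caller's byte order untouched. */
	public static int depthAtShort(ByteBuffer frame, int i) {
		return frame.duplicate().order(ByteOrder.LITTLE_ENDIAN).getShort(2 * i) & 0xFFFF;
	}

	/** Pixel i of an interleaved RGB frame, packed as 0xRRGGBB (the form BufferedImage.setRGB expects). */
	public static int rgbAt(ByteBuffer frame, int i) {
		int r = frame.get(3 * i) & 0xFF;
		int g = frame.get(3 * i + 1) & 0xFF;
		int b = frame.get(3 * i + 2) & 0xFF;
		return (r << 16) | (g << 8) | b;
	}

	public static void main(String[] args) {
		// Two fake depth pixels: 0x07FF (2047, the largest 11-bit value) and 0x0123 (291)
		ByteBuffer depth = ByteBuffer.wrap(new byte[]{(byte) 0xFF, 0x07, 0x23, 0x01});
		System.out.println(depthAt(depth, 0) + " " + depthAt(depth, 1));           // prints "2047 291"
		System.out.println(depthAtShort(depth, 0) + " " + depthAtShort(depth, 1)); // prints "2047 291"

		// One fake RGB pixel with R=0x10, G=0x20, B=0x30
		ByteBuffer rgb = ByteBuffer.wrap(new byte[]{0x10, 0x20, 0x30});
		System.out.println(Integer.toHexString(rgbAt(rgb, 0)));                     // prints "102030"
	}
}
```

The hand-assembled version mirrors the loop in processDepth; the ByteOrder version is an equivalent alternative that lets ByteBuffer handle the byte swapping.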


BoofCV Android Demo Application

I recently wrote an application to demonstrate some of the capabilities of BoofCV on Android.  BoofCV is an open source computer vision library, written entirely in Java, that I’m working on.  The v0.13 update to BoofCV added functions for converting NV21 images (the camera preview format) into BoofCV data types and provided Android-specific visualization code.  The end result is that it is now easier to write fast, real-time image processing applications on Android using BoofCV.  The source code for the demo application has also been released without restrictions.
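For readers unfamiliar with NV21: it stores a full-resolution luminance (Y) plane of width×height bytes, followed by interleaved V,U chroma samples at half resolution in each direction. That layout is why a gray image can be pulled out of a preview frame so cheaply. The sketch below only illustrates the layout; it is not BoofCV's conversion API, and it assumes even image dimensions with no row padding.

```java
import java.util.Arrays;

/**
 * Hypothetical helpers illustrating the NV21 layout (not BoofCV's conversion functions).
 * Layout: width*height Y bytes, then width*height/2 bytes of interleaved V,U at half resolution.
 * Assumes even width and height and no row padding.
 */
public class Nv21Layout {

	/** Copies the Y plane, which is already an 8-bit gray image. */
	public static byte[] luminance(byte[] nv21, int width, int height) {
		if (nv21.length < width * height * 3 / 2)
			throw new IllegalArgumentException("Buffer too small for NV21 " + width + "x" + height);
		byte[] gray = new byte[width * height];
		System.arraycopy(nv21, 0, gray, 0, gray.length);
		return gray;
	}

	/** Returns {V, U} as unsigned values for the 2x2 block that contains pixel (x, y). */
	public static int[] chromaAt(byte[] nv21, int width, int height, int x, int y) {
		int offset = width * height + (y / 2) * width + (x / 2) * 2;
		return new int[]{nv21[offset] & 0xFF, nv21[offset + 1] & 0xFF};
	}

	public static void main(String[] args) {
		// 4x2 image: 8 Y bytes, then one chroma row holding V0 U0 V1 U1
		byte[] nv21 = {0, 1, 2, 3, 4, 5, 6, 7, 100, 101, 102, 103};
		System.out.println(Arrays.toString(luminance(nv21, 4, 2)));
		System.out.println(Arrays.toString(chromaAt(nv21, 4, 2, 3, 1)));
	}
}
```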

General Tips

  • Cell phone cameras are poor quality and suffer from motion blur and rolling shutters.
  • Everything will work better when viewed with good lighting, allowing for faster shutter speeds.
  • When selecting images for association, you can swipe to remove the previous selection.  Also try tapping and double tapping.
  • Image mosaic and stabilization work best with far-away objects, pure camera rotations, or planar surfaces.
  • When tapping the screen to capture an image, do so gently or else the image will be blurred.
  • On slower devices, pressing the back button to leave a slow process can crash the app and require you to manually kill it.
  • Changing the camera preview size to “large” can cause out-of-memory exceptions.

Camera Calibration

Detected square grid pattern.  Red dots show calibration points.

Camera calibration is required for 3D vision applications.  It allows radial lens distortion to be removed and the camera’s other intrinsic parameters to be determined.  To calibrate the camera, print out a calibration grid and follow the general instructions from the link below.  Note that the printed image can be rescaled for this application.

After you have printed the calibration target, take pictures of it (by gently tapping the screen) from at least 5 different views and angles.  Most of the pictures should be taken at about a 30 degree angle.  When you are done taking pictures, press the “Compute” button to find the intrinsic parameters.  When taking pictures, it is recommended that you sit on the ground holding your Android device with both hands.  This is more stable and reduces motion blur, greatly reducing the frustration factor.  Try to fill as much of the screen as possible with the calibration target, and if a picture looks like it might be blurred, click the remove button.  On my Samsung Galaxy S3 and Droid 2, I get a mean error of about 0.2 pixels on good runs and more than a pixel when things go badly.
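To make the error figure concrete: assuming it is the usual mean reprojection error, it is the average pixel distance between each detected calibration point and the point predicted by projecting the target through the estimated camera model. A minimal sketch of that average, with an illustrative class name and an {x, y} array layout chosen for brevity; how the app produces the predicted points is not shown here.

```java
/** Mean reprojection error sketch; observed and predicted points are {x, y} pixel pairs. */
public class ReprojectionError {

	/** Average Euclidean distance, in pixels, between matching observed and predicted points. */
	public static double meanError(double[][] observed, double[][] predicted) {
		if (observed.length == 0 || observed.length != predicted.length)
			throw new IllegalArgumentException("Need the same, non-zero number of points");
		double sum = 0;
		for (int i = 0; i < observed.length; i++) {
			double dx = observed[i][0] - predicted[i][0];
			double dy = observed[i][1] - predicted[i][1];
			sum += Math.sqrt(dx * dx + dy * dy);
		}
		return sum / observed.length;
	}

	public static void main(String[] args) {
		double[][] observed  = {{100.0, 50.0}, {200.0, 80.0}};
		double[][] predicted = {{100.3, 50.4}, {200.0, 80.0}};
		// Errors are 0.5 and 0.0 pixels, so the mean is 0.25
		System.out.println(meanError(observed, predicted));
	}
}
```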

Stereo Vision

Stereo depth image computed from two views of the same object. Warmer colors indicate closer objects and cooler colors objects farther away.

Stereo vision allows the scene’s 3D structure to be computed.  This is accomplished with a single camera by taking pictures of the same object twice, from two different points of view.  Camera calibration is required before stereo vision can be computed, since lens distortion must be removed.  On a Droid 2 cell phone (a phone from around 2010/2011) the whole process can take 25 seconds or so, but on a Galaxy S3 (a phone from 2012) it takes only four seconds.

To compute a stereo depth image, first tap the screen to take a picture. Then move the camera left or right with as little rotation, up/down, and forwards/backwards motion as possible.  How far you should move the camera depends on how far away the objects are.  For close objects 1 cm is often good, and for objects a few feet away (1 meter) 3 cm (about an inch) works well.  Moving the camera too much tends to be a more common mistake than moving it too little.  It will probably take you a few tries to get a good image.  Also make sure the scene has good texture, or else it won’t work.
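One way to see why the right amount of camera motion grows with distance is the standard relation for two rectified, side-by-side views: a point at depth Z shifts between the images by a disparity d = f·B/Z pixels, where f is the focal length in pixels and B is how far the camera moved (the baseline). A short sketch of that arithmetic; the 500-pixel focal length is an assumed, typical value for a phone preview image, not something measured by the app, and this is not the app's stereo code.

```java
/** Disparity/depth arithmetic for two rectified views (illustrative values only). */
public class StereoBaseline {

	/** Disparity in pixels for a point at depth z, with baseline b in the same units as z. */
	public static double disparity(double focalPx, double baseline, double depth) {
		return focalPx * baseline / depth;
	}

	/** Depth recovered from a disparity; the inverse of disparity(). */
	public static double depth(double focalPx, double baseline, double disparityPx) {
		return focalPx * baseline / disparityPx;
	}

	public static void main(String[] args) {
		double f = 500; // assumed focal length in pixels
		System.out.printf("1 cm move, object at 30 cm: %.1f px%n", disparity(f, 0.01, 0.3));
		System.out.printf("3 cm move, object at 1 m:   %.1f px%n", disparity(f, 0.03, 1.0));
	}
}
```

With these assumed numbers, both recommended settings give disparities of roughly 15 to 17 pixels, so scaling the camera motion with distance keeps the disparity in a similar range.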

Video Mosaic

Mosaic created by moving the camera over a flat table top.

Creating a good mosaic can be difficult, with motion blur being the primary culprit.  There is a reason why image stitching software on Android devices uses still images.  However, with some practice and the right environment you can get good results.  See the image above.

Remember that it won’t work if you are close to a complex 3D scene and translating the camera.  Instead, try pivoting the camera or working with far-away objects.  Moving the camera slowly also helps.
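The reason translation near a 3D scene causes trouble is motion parallax. When the camera slides sideways by t, a point at depth Z moves by about f·t/Z pixels, so near and far points move by different amounts and no single image-to-image transform can align both. A pure rotation by θ instead moves a point on the optical axis by f·tan θ pixels no matter how far away it is. The sketch below puts numbers on this; the focal length and distances are assumptions for illustration.

```java
/** Illustrates parallax under translation versus depth-independent motion under rotation. */
public class MotionParallax {

	/** Image shift in pixels of a point at depth z after a sideways camera translation t. */
	public static double translationShift(double focalPx, double t, double z) {
		return focalPx * t / z;
	}

	/** Image shift in pixels of a point on the optical axis after rotating by theta radians; depth does not appear. */
	public static double rotationShift(double focalPx, double theta) {
		return focalPx * Math.tan(theta);
	}

	public static void main(String[] args) {
		double f = 500;   // assumed focal length in pixels
		double t = 0.05;  // 5 cm sideways translation
		double near = translationShift(f, t, 0.3);  // object 30 cm away
		double far = translationShift(f, t, 10.0);  // object 10 m away
		System.out.printf("translation: near %.1f px, far %.1f px%n", near, far);
		System.out.printf("2 degree rotation: %.1f px at any depth%n", rotationShift(f, Math.toRadians(2)));
	}
}
```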

Inverse Radial Distortion Formula

Where are the inverse equations hiding?

A common problem in computer vision is modelling lens distortion.  Lens distortion distorts the shape of objects and introduces large errors in structure from motion applications.  Techniques and tools for calibrating cameras and removing lens distortion are widely available.  While books and papers readily provide the forward distortion equations (given an ideal undistorted coordinate, compute the distorted coordinate), inverse equations are much harder to come by.

It turns out there is no analytic equation for inverse radial distortion, which might explain the mysterious absence of inverse equations, but it would still be nice if this issue were discussed in common references. First a brief summary of the background equations is given, followed by the solution to the inverse distortion problem.

Inverse Distortion Problem: Given a distorted image coordinate and the distortion parameters, determine the ideal undistorted coordinate.


Unlike the idealized pinhole camera model, real-world cameras have lens distortion.  Radial distortion is a common model for lens distortion and is summarized by the following equations:

\hat{x} = x + x[k_1 r^2 + k_2 r^4]
\hat{u} = u + (u-u_0)[k_1 r^2 + k_2 r^4]
r^2=x_x^2 + x_y^2

where \hat{u}=[\hat{u}_x,\hat{u}_y] and u are the observed distorted and ideal undistorted pixel coordinates, \hat{x} and x=[x_x,x_y] are the observed distorted and ideal undistorted normalized pixel coordinates, u_0 is the principal point (image center) in pixels, and k_1, k_2 are radial distortion coefficients.  The relationship between normalized and unnormalized pixel coordinates is determined by the camera calibration matrix, u=C x, where C is the 3×3 calibration matrix.
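The forward model can be written directly from these equations. A minimal sketch, assuming a zero-skew calibration matrix C with focal lengths f_x, f_y and principal point u_0 = (c_x, c_y); the text allows a general 3×3 C, and the class and method names are illustrative. The pixel version applies \hat{u} = u + (u-u_0) m and gives the same result as mapping the distorted normalized point back through C.

```java
/** Forward radial distortion sketch, assuming C = [fx 0 cx; 0 fy cy; 0 0 1]. */
public class RadialDistortion {

	/** Normalized coordinates: xHat = x (1 + k1 r^2 + k2 r^4), with r^2 = ||x||^2. */
	public static double[] distortNormalized(double x, double y, double k1, double k2) {
		double r2 = x * x + y * y;
		double m = k1 * r2 + k2 * r2 * r2;
		return new double[]{x * (1 + m), y * (1 + m)};
	}

	/** Pixel coordinates: uHat = u + (u - u0) m, where m is computed from x = C^{-1} u. */
	public static double[] distortPixel(double u, double v,
	                                    double fx, double fy, double cx, double cy,
	                                    double k1, double k2) {
		double x = (u - cx) / fx;  // normalize with the assumed zero-skew C
		double y = (v - cy) / fy;
		double r2 = x * x + y * y;
		double m = k1 * r2 + k2 * r2 * r2;
		return new double[]{u + (u - cx) * m, v + (v - cy) * m};
	}

	public static void main(String[] args) {
		double[] n = distortNormalized(0.2, 0.1, 0.1, 0.01);
		double[] p = distortPixel(420, 280, 500, 400, 320, 240, 0.1, 0.01);
		// p should equal C applied to n: (320 + 500 n[0], 240 + 400 n[1])
		System.out.println(p[0] + " vs " + (320 + 500 * n[0]));
		System.out.println(p[1] + " vs " + (240 + 400 * n[1]));
	}
}
```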


While there is no analytic solution, there is an iterative solution.  It works by first estimating the magnitude of the radial distortion at the distorted point and then refining the estimate until it converges.

  1. x=C^{-1}\hat{u}
  2. do until converged:
    1. r^2=||x||^2
    2. m=k_1 r^2 + k_2 r^4
    3. x=\frac{C^{-1}\hat{u}}{1+m}
  3. u = \frac{\hat{u}+u_0 m}{1+m}

where u_0 is the principal point (image center) and ||x|| is the Euclidean norm of the normalized coordinate. Only a few iterations are required before convergence.  For added speed when undistorting an entire image, the results can be cached.  A similar solution can be found by adding in terms for tangential distortion.
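The iteration can be sketched in Java as follows, again assuming a zero-skew C with parameters fx, fy, cx, cy. The convergence tolerance and the cap of 20 iterations are illustrative choices, not values from the text; the cap simply guards against slow convergence when distortion is strong. The forward model is included so the result can be checked with a round trip.

```java
/** Iterative inverse radial distortion sketch, assuming C = [fx 0 cx; 0 fy cy; 0 0 1]. */
public class InverseRadialDistortion {

	/** Forward model in pixels: uHat = u + (u - u0) m, with m from the normalized point. */
	public static double[] distortPixel(double u, double v, double fx, double fy,
	                                    double cx, double cy, double k1, double k2) {
		double x = (u - cx) / fx, y = (v - cy) / fy;
		double r2 = x * x + y * y;
		double m = k1 * r2 + k2 * r2 * r2;
		return new double[]{u + (u - cx) * m, v + (v - cy) * m};
	}

	/** Removes radial distortion from the observed pixel (uHat, vHat). */
	public static double[] undistortPixel(double uHat, double vHat, double fx, double fy,
	                                      double cx, double cy, double k1, double k2) {
		// Step 1: x = C^{-1} uHat, which also serves as the initial guess
		double xd = (uHat - cx) / fx, yd = (vHat - cy) / fy;
		double x = xd, y = yd, m = 0;
		// Step 2: re-estimate the distortion magnitude at the current guess and rescale
		for (int iter = 0; iter < 20; iter++) {
			double r2 = x * x + y * y;
			m = k1 * r2 + k2 * r2 * r2;
			double nx = xd / (1 + m), ny = yd / (1 + m);
			boolean converged = Math.abs(nx - x) < 1e-12 && Math.abs(ny - y) < 1e-12;
			x = nx;
			y = ny;
			if (converged) break;
		}
		// Step 3: u = (uHat + u0 m) / (1 + m), which equals C x for the final m
		return new double[]{(uHat + cx * m) / (1 + m), (vHat + cy * m) / (1 + m)};
	}

	public static void main(String[] args) {
		double[] d = distortPixel(600, 400, 500, 500, 320, 240, -0.2, 0.05);
		double[] u = undistortPixel(d[0], d[1], 500, 500, 320, 240, -0.2, 0.05);
		System.out.println(u[0] + ", " + u[1]); // should be very close to 600, 400
	}
}
```

To cache results for a whole image, one option is to call undistortPixel once per pixel and store the outputs in a lookup table indexed by pixel.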