Installing BoofCV on Raspberry PI

Before I talk about getting BoofCV (a computer vision library) up and running, let’s talk about what is needed to get other computer vision libraries running on a Raspberry PI. Then I’ll go over building (not needed), installing (PyBoof technically), and running BoofCV.

Building OpenCV and Other Libraries

For the first time I’m doing actual development on a Raspberry PI and not playing video games. I decided to run a QR Code benchmark I wrote on a desktop computer and see how slow everything is on a Raspberry PI, for fun and profit. I have a nifty Python script that builds and runs everything ready to go, so this should be a 5 minute task, right? Unfortunately this has been bringing back memories of the bad old days, when nothing would build the first time because your setup wasn’t exactly the same as the developer’s.

First task is to build ZBar. This library hasn’t been maintained in a while and can be tricky to build even on a regular desktop. Getting it to compile without errors took an hour or so and who knows how many Google searches. It would have been worse if this had been my first time building it and I didn’t already have a script that patches the code to fix name conflicts.

Second task is installing OpenCV version 3.3.1. That exact version or newer is needed because it contains a fix for jpeg images. As far as I can tell only OpenCV 2.4 is available on apt-get or pip, though you might be able to obtain other versions elsewhere, e.g. 3.1.0. Everyone seems to be building it from source. It took me four attempts and about 8 hours to get it working. The build failed a few times (unknown reasons or small errors on my part) and I built the wrong version of OpenCV once. Now that I know what I’m doing it would take 2 to 3 hours, most of which is compiling the code.

For the sake of being fair, not all libraries are this annoying. SimpleCV can be installed in a few lines using just apt-get, according to their instructions.

Building BoofCV

There’s no need to build BoofCV. Whether you build it from source or let it download automatically from Maven Central, the code works just the same. It should build out of the box if you’re using Raspbian. If you’re running Ubuntu you need to install Oracle’s JDK because OpenJDK has broken encryption keys and Gradle can’t download the dependencies. If you’re dead set on building BoofCV from source, make sure you’re connected to the internet and follow these instructions:

git clone -b master --recursive https://github.com/lessthanoptimal/BoofCV.git boofcv
cd boofcv
./gradlew install

That will download the source code and example data. The example data isn’t required but might be fun to play with. See the readme for instructions on running the examples.

Installing BoofCV

BoofCV is a Java library and you don’t really install it. At least not like a shared library or DLL. It’s either already included in the jar you’re running or you’re building a project and you reference the library using Gradle or Maven. Gradle/Maven will do the hard work for you by automatically downloading all the dependencies. Since this is a non-issue when used as a Java library let’s talk about PyBoof. PyBoof is a Python wrapper around BoofCV and you install that using pip, just like any other Python library.

python3 -m venv venv
source venv/bin/activate
pip install wheel

This should look familiar to almost all Python developers. It just sets you up in a safe environment. Wheel is installed because it fixed an issue with the version of pip on my Raspberry PI. Then to install PyBoof just type in the command below. Numpy is a dependency and if you don’t have it installed already this will take 10 minutes or so to build.

pip install PyBoof==0.29.post3

Now let’s open an image, look for a QR code, and print the results to standard out.

wget https://peterabeles.com/blog/wp-content/uploads/2018/05/qrcode.png

That will download the image for you. Next create a Python script with the text below. When you run the script it will pause briefly while it processes the image, then print out the results. For best performance process more than one image at a time; the first one is always sluggish (most likely JVM warm-up).

import numpy as np
import pyboof as pb

# This is needed if you want the code to run fast: it transfers images between
# Python and the JVM through memory-mapped files instead of the py4j socket
pb.init_memmap()

# Load the image as grayscale
gray = pb.load_single_band("qrcode.png", np.uint8)

# Declare the QR Code detector
detector = pb.FactoryFiducial(np.uint8).qrcode()

# Process the image, detect the QR Codes, and print the results
detector.detect(gray)
print("Detected a total of {} QR Codes".format(len(detector.detections)))
for qr in detector.detections:
    print("Message: "+qr.message)
    print(" at: "+str(qr.bounds))
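
If you are curious about the warm-up, a small timing loop like the sketch below makes it easy to see. It is meant to be appended to the end of the script above (it reuses the detector and gray variables) and only adds the standard library time module.

import time

# Run the detector repeatedly on the same image. The first call is the slow
# one; later calls are much faster once the JVM has warmed up.
for i in range(5):
    start = time.time()
    detector.detect(gray)
    print("Run {}: {:.3f} seconds".format(i, time.time() - start))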

Conclusion

BoofCV is very easy to get running on a Raspberry PI. Large complex C/C++ projects will take several hours and multiple attempts, giving you plenty of time to write a blog article.

Kinect in Java

The folks at OpenKinect did a good job at providing JNI wrappers for their Kinect driver.  What they didn’t do is provide a nice example showing how to process the byte stream data from the RGB and depth cameras.  As far as I can tell, even the comments in their C code don’t fully describe the data format.  The OpenKinect wiki provides a bit more information but tends to be out of date in places and still doesn’t fully describe the data format.  So off to Google to find a working example in Java.  All I could find was a bunch of posts telling people to use the Processing OpenKinect code.  That’s of little use to me, but I did find browsing their source code useful.

So for those of you who just want a straightforward example demonstrating how to use OpenKinect in Java, you’ve come to the right place.  The code below does use BoofCV for some of the image processing and display, but that’s not essential and could easily be modified to not use BoofCV.

OpenKinect Version: 0.1.2
BoofCV Version: 0.14

// Imports below assume the libfreenect Java wrapper (org.openkinect.freenect), JNA,
// and the BoofCV 0.14 package layout; adjust them to match your setup.
import boofcv.core.image.ConvertBufferedImage;
import boofcv.gui.image.ImagePanel;
import boofcv.gui.image.ShowImages;
import boofcv.gui.image.VisualizeImageData;
import boofcv.struct.image.ImageUInt16;
import boofcv.struct.image.ImageUInt8;
import boofcv.struct.image.MultiSpectral;
import com.sun.jna.NativeLibrary;
import org.openkinect.freenect.*;

import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

/**
 * Example demonstrating how to process and display data from the Kinect.
 *
 * @author Peter Abeles
 */
public class OpenKinectStreamingTest {

	{
		// Modify this link to be where you store your shared library
		NativeLibrary.addSearchPath("freenect", "/home/pja/libfreenect/build/lib");
	}

	MultiSpectral<ImageUInt8> rgb = new MultiSpectral<ImageUInt8>(ImageUInt8.class,1,1,3);
	ImageUInt16 depth = new ImageUInt16(1,1);

	BufferedImage outRgb;
	ImagePanel guiRgb;

	BufferedImage outDepth;
	ImagePanel guiDepth;

	public void process() {
		Context kinect = Freenect.createContext();

		if( kinect.numDevices() <= 0 )
			throw new RuntimeException("No kinect found!");

		Device device = kinect.openDevice(0);

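		// REGISTERED depth frames are aligned to the RGB camera and, per the libfreenect
		// documentation, report depth in millimeters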
		device.setDepthFormat(DepthFormat.REGISTERED);

		device.setVideoFormat(VideoFormat.RGB);

		device.startDepth(new DepthHandler() {
			@Override
			public void onFrameReceived(FrameMode mode, ByteBuffer frame, int timestamp) {
				processDepth(mode,frame,timestamp);
			}
		});
		device.startVideo(new VideoHandler() {
			@Override
			public void onFrameReceived(FrameMode mode, ByteBuffer frame, int timestamp) {
				processRgb(mode,frame,timestamp);
			}
		});

		// Busy wait while frames stream in for 100 seconds
		long startTime = System.currentTimeMillis();
		while( startTime+100000 > System.currentTimeMillis() ) {}
		System.out.println("100 Seconds elapsed");

		device.stopDepth();
		device.stopVideo();
		device.close();

	}

	protected void processDepth( FrameMode mode, ByteBuffer frame, int timestamp ) {
		System.out.println("Got depth! "+timestamp);

		if( outDepth == null ) {
			depth.reshape(mode.getWidth(),mode.getHeight());
			outDepth = new BufferedImage(depth.width,depth.height,BufferedImage.TYPE_INT_BGR);
			guiDepth = ShowImages.showWindow(outDepth,"Depth Image");
		}

		// Each depth pixel is an unsigned 16-bit value packed as two bytes, least significant byte first
		int indexIn = 0;
		for( int y = 0; y < depth.height; y++ ) {
			int indexOut = depth.startIndex + y*depth.stride;
			for( int x = 0; x < depth.width; x++ , indexOut++ ) {
				depth.data[indexOut] = (short)((frame.get(indexIn++) & 0xFF) | ((frame.get(indexIn++) & 0xFF) << 8 ));
			}
		}

		VisualizeImageData.grayUnsigned(depth,outDepth,1000);
		guiDepth.repaint();
	}

	protected void processRgb( FrameMode mode, ByteBuffer frame, int timestamp ) {
		if( mode.getVideoFormat() != VideoFormat.RGB ) {
			System.out.println("Bad rgb format!");
		}

		System.out.println("Got rgb! "+timestamp);

		if( outRgb == null ) {
			rgb.reshape(mode.getWidth(),mode.getHeight());
			outRgb = new BufferedImage(rgb.width,rgb.height,BufferedImage.TYPE_INT_BGR);
			guiRgb = ShowImages.showWindow(outRgb,"RGB Image");
		}

		ImageUInt8 band0 = rgb.getBand(0);
		ImageUInt8 band1 = rgb.getBand(1);
		ImageUInt8 band2 = rgb.getBand(2);

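		// Video frames arrive as packed bytes, three per pixel; unpack them into the color bands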
		int indexIn = 0;
		for( int y = 0; y < rgb.height; y++ ) {
			int indexOut = rgb.startIndex + y*rgb.stride;
			for( int x = 0; x < rgb.width; x++ , indexOut++ ) {
				band2.data[indexOut] = frame.get(indexIn++);
				band1.data[indexOut] = frame.get(indexIn++);
				band0.data[indexOut] = frame.get(indexIn++);
			}
		}

		ConvertBufferedImage.convertTo_U8(rgb,outRgb);
		guiRgb.repaint();
	}

	public static void main( String args[] ) {
		OpenKinectStreamingTest app = new OpenKinectStreamingTest();

		app.process();
	}
}

BoofCV Android Demo Application

I recently wrote an application to demonstrate some of the capabilities of BoofCV on Android.  BoofCV is an open source computer vision library that I’m working on, written entirely in Java.  The v0.13 update to BoofCV added functions for converting NV21 images (camera preview format) into BoofCV data types and provided Android specific visualization code.  The end result is that it is now easier to write fast real-time image processing applications on Android using BoofCV.  The source code for the demo application has also been released without restrictions.

General Tips

  • Cell phone cameras are poor quality and suffer from motion blur and rolling shutters.
  • Everything will work better when viewed with good lighting, allowing for faster shutter speeds.
  • When selecting images for association you can swipe to remove the previous selection.  Also try tapping and double tapping.
  • Image mosaic and stabilization work best when viewing far away objects, pure rotations, and planar surfaces.
  • When tapping the screen to capture an image, do so gently or else the image will be blurred.
  • On slower devices, pressing the back button to leave a slow process can crash the app and require you to manually kill it.
  • Changing the camera preview size to “large” images can cause out of memory exceptions.

Camera Calibration

Detected square grid pattern. Red dots show calibration points.

Camera calibration is required for 3D vision applications.  It will allow radial lens distortion to be removed and other intrinsic camera parameters to be known.  To calibrate the camera print out a calibration grid and follow the general instructions from the link below.  Note that the printed image can be rescaled for this application.

http://boofcv.org/index.php?title=Tutorial_Camera_Calibration

After you have printed the calibration target, take pictures of it (by gently tapping the screen) from at least 5 different views and angles.  Most of the pictures should be taken from about a 30 degree angle.  When you are done taking pictures, press the “Compute” button to find the intrinsic parameters.  When taking pictures it is recommended that you sit down on the ground and hold your Android device with both hands.  This is more stable and reduces motion blur, greatly reducing the frustration factor.  Try to fill as much of the screen as possible with the calibration target, and if a picture looks like it might be blurred click the remove button.  On my Samsung Galaxy S3 and Droid 2 I get about 0.2 pixels mean error on good runs and more than a pixel of error when things go badly.

Stereo


Stereo depth image computed from two views of the same object. Warmer colors indicate closer objects and cooler colors indicate objects farther away.

Stereo vision allows the scene’s 3D structure to be computed.  This is accomplished with a single camera by taking pictures of the same object twice from two different points of view.  Camera calibration is required before stereo vision can be computed since lens distortion must be removed.  On a Droid 2 cell phone (a phone from around 2010/2011) the whole process can take 25 seconds or so, but on a Galaxy S3 (a phone from 2012) it only takes four seconds.

To compute a stereo depth image first tap the screen to take a picture. Then move the camera left or right with as little rotation, up/down, and forwards/backwards motion as possible.  How far you should move the camera depends on how far away the objects are.  For close objects 1cm is often good, and for objects a few feet away (1 meter) 3cm/1inch works well.  Moving the camera too much tends to be a more common mistake than moving it too little.  It will probably take you a few tries to get a good image.  Also make sure the scene has good texture or else it won’t work.

Video Mosaic

Mosaic created by moving the camera over a flat table top.

Creating a good mosaic can be difficult, with motion blur being the primary culprit.  There is a reason why image stitching software on Android devices uses still images.  However, with some practice and the right environment you can get some good results; see the image above.

Remember that it won’t work if you are close up to a complex 3D scene and translating the camera.  Instead try pivoting the camera or working off of far away objects.  Slowly moving the camera also helps.

Inverse Radial Distortion Formula

Where are the inverse equations hiding?

A common problem in computer vision is modelling lens distortion.  Lens distortion distorts the shape of objects and introduces large errors in structure from motion applications.  Techniques and tools for calibrating cameras and removing lens distortion are widely available.  While books and papers readily provide the forward distortion equations (given an ideal undistorted coordinate, compute the distorted coordinate), the inverse equations are much harder to come by.

It turns out there is no analytic equation for inverse radial distortion, which might explain the mysterious absence of inverse equations, but it would still be nice if this issue were discussed in common references. First a brief summary of the background equations is given, followed by the solution to the inverse distortion problem.

Inverse Distortion Problem: Given a distorted image coordinate and the distortion parameters, determine the ideal undistorted coordinate.

Background

Unlike the idealized pin-hole camera model, real world cameras have lens distortions.  Radial distortion is a common model for lens distortion and is summarized by the following equations:

\hat{x} = x + x[k_1 r^2 + k_2 r^4]
\hat{u} = u + (u-u_0)[k_1 r^2 + k_2 r^4]
r^2=x_x^2 + x_y^2

where \hat{u}=[\hat{u}_x,\hat{u}_y] and u are the observed distorted and ideal undistorted pixel coordinates, \hat{x} and x=[x_x,x_y] are the observed distorted and ideal undistorted normalized pixel coordinates, u_0 is the principal point (image center) in pixels, and k_1, k_2 are radial distortion coefficients.  The relationship between normalized pixel and unnormalized pixel coordinates is determined by the camera calibration matrix, u=C x, where C is the 3×3 calibration matrix.

Solution

While there is no analytic solution there is an iterative solution.  The iterative solution works by first estimating the radial distortion’s magnitude at the distorted point and then refining the estimate until it converges.

  1. x=C^{-1}\hat{u}
  2. do until convergence:
    1. r^2=||x||^2
    2. m=k_1 r^2 + k_2 r^4
    3. x=\frac{C^{-1}\hat{u}}{1+m}
  3. u = \frac{\hat{u}+u_0 m}{1+m}

Where u_0 is the principal point/image center in pixel coordinates and ||x|| is the Euclidean norm. The final step comes from solving the pixel equation \hat{u} = u + (u-u_0)m for u, using the value of m found by the iteration. Only a few iterations are required before convergence.  For added speed when applying this to an entire image the results can be cached.  A similar solution can be found by adding in terms for tangential distortion.
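
To make the iteration concrete, here is a small NumPy sketch of the procedure above. The function name, the calibration matrix C, the coefficients k_1, k_2, and the example pixel are all made up for illustration; it distorts an ideal pixel with the forward model and then recovers it with the fixed-point iteration.

import numpy as np

def undistort_pixel(u_hat, C, k1, k2, iterations=20):
    """Iteratively remove radial distortion from a distorted pixel u_hat (steps 1-3 above)."""
    u0 = C[0:2, 2]                                            # principal point u_0 in pixels
    x_hat = (np.linalg.inv(C) @ np.append(u_hat, 1.0))[0:2]   # C^{-1} u_hat, distorted normalized point
    x = x_hat.copy()                                          # initial estimate of the undistorted point
    m = 0.0
    for _ in range(iterations):
        r2 = x @ x                                            # r^2 = ||x||^2
        m = k1 * r2 + k2 * r2 * r2                            # m = k_1 r^2 + k_2 r^4
        x = x_hat / (1.0 + m)                                 # refine the estimate
    return (u_hat + u0 * m) / (1.0 + m)                       # u = (u_hat + u_0 m) / (1 + m)

# Made-up calibration matrix and distortion coefficients
C = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
k1, k2 = -0.25, 0.07

# Apply the forward model to an ideal pixel, then undo it
u = np.array([420.0, 300.0])
x = (np.linalg.inv(C) @ np.append(u, 1.0))[0:2]
r2 = x @ x
u_hat = u + (u - C[0:2, 2]) * (k1 * r2 + k2 * r2 * r2)
print("distorted:", u_hat, "recovered:", undistort_pixel(u_hat, C, k1, k2))

A handful of iterations is plenty for typical distortion levels, and as noted above the per-pixel result can be precomputed and cached when undistorting an entire image.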