Deer Detection with Machine Learning Part 1

By Craig Thomas, Mon 04 August 2014, in category Wildlife detection

deer, machine_learning

So, I have a deer who visits my yard. The deer wants to murder me and eat all of my vegetables. Well, the deer probably doesn't really want to murder me, but it probably does want to eat all of the vegetables that we have in our garden. When we originally moved here, there were two deer that would frequent our yard just about every day. They aren't really a problem - more of an unexpected annoyance when we want to pick apples or take out the compost.

This is the deer. It is looking at me like it wants to murder me.

The Deer

The deer are not afraid of us either. If we come out into the yard, they will just stand there (or sit there) looking at us like we're pond scum. I'm not sure why they seem to loathe us so much - it probably has something to do with the deer fence we have around our garden.

Keeping an Eye Out

When I snapped the picture above and posted it to Facebook, I got some interesting responses back (you know who you are AP!). One of them was about a gentleman who used machine learning to recognize squirrels and fire a water gun at them. I don't want to fire anything at the deer in my yard - I'm just curious about what exactly they are doing and when they are hanging around.

Given that the deer enter the yard somewhat infrequently, I want some method that will detect when the deer are present, snap some photos - or even a video - of what they are doing, and then turn off when they leave. Placing a camera to take pictures of the backyard is relatively simple. For the actual deer detection, I decided to use a little machine learning. Essentially, I want the computer to sort through thousands of images or hours of video looking for the deer so that I don't have to. Think of it as a Gorilla Detector, but for deer (since we all know about the dangers of undetected gorillas).

Collecting Data with the Kinect

I'm going to be using a simple supervised machine learning technique. With supervised machine learning, to teach the computer to recognize when a deer is in a picture, I need to feed it pictures that have deer in them and pictures that don't. So, step 1 in the project is to collect pictures of the deer.
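
Just to make the idea of labelled examples concrete, here's a rough sketch of what the training set boils down to - a pile of image files, each tagged with a yes-or-no answer. The directory names and class here are placeholders for illustration, not code from the project:

import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class TrainingData {
    // Hypothetical layout: positive examples live in training/deer, negative
    // examples in training/not-deer. Each file just gets a boolean label.
    public static Map<File, Boolean> loadLabels() {
        Map<File, Boolean> labels = new HashMap<File, Boolean>();
        for (File picture : new File("training/deer").listFiles()) {
            labels.put(picture, Boolean.TRUE);      // deer in the picture
        }
        for (File picture : new File("training/not-deer").listFiles()) {
            labels.put(picture, Boolean.FALSE);     // no deer in the picture
        }
        return labels;
    }
}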

Remember how I said up above I didn't want to sort through thousands of pictures to see if there is a deer in them? Well, unfortunately, that’s what I have to do to get my training examples to teach the computer (lousy deer – yes, I blame you!). But, if all goes well, it shouldn't take too long.

To take pictures, I decided to use an Xbox 360 Kinect. Why? Well, I'm hoping to use the IR camera to take pictures of the deer at night. Plus, the OpenKinect project has some nice drivers available for many platforms, as well as wrappers for many programming languages. It is relatively easy to capture either a single frame or a video using the open source drivers. And the Kinect is just plain cool (it has an LED you can turn on and off!).

Installing OpenKinect Drivers

Installing the necessary drivers under Ubuntu 14.04 is relatively simple - it's just a matter of using apt-get to install the right package:

sudo apt-get install freenect

Then, I plugged in the Kinect. Running dmesg showed me that the kernel successfully recognized the Kinect:

[34089.811775] usb 2-2.3: New USB device found, idVendor=045e, idProduct=02ae
[34089.811782] usb 2-2.3: New USB device strings: Mfr=2, Product=1, SerialNumber=3
[34089.811786] usb 2-2.3: Product: Xbox NUI Camera
[34089.811789] usb 2-2.3: Manufacturer: Microsoft

Excellent! Step one of my ridiculously circuitous plan was complete.

Testing Out the Kinect

The next step was to ensure that I could actually acquire data from the Kinect sensors. I hopped over to the OpenKinect GitHub account, and checked out whether they had any sample programs. Sure enough, their wrapper classes had some examples for grabbing video and pictures. Their Python examples looked simple enough, so I decided to try them out.

First, I cloned their repo:

git clone https://github.com/OpenKinect/libfreenect

Then, I installed some necessary Python packages:

sudo apt-get install python-freenect
sudo apt-get install python-opencv

From there, it was a simple matter of running their demo to grab both an RGB image and an infrared depth image:

cd libfreenect/wrappers/python
./demo_cv_sync.py

Here is an example of an infrared depth image of me waving at the camera. Hi!

IR Image Example

Enter Java

With the Python examples demonstrating that the Kinect works, the next step was to build a simple image capture program. I decided to write it in Java.

Why Java? Well, for one, I could build a fat JAR containing all of the components necessary to actually run the program (the OpenKinect wrappers are distributed under an Apache 2 License – very much appreciated!). Plus, when it comes time to actually crunch data with the machine learning components, I want something that will execute relatively fast, and Python - while great for prototyping - tends to be slow at that kind of number crunching. So, Java it is.

The first step was to package the OpenKinect Java wrapper into a JAR. This is where I ran into problems. Building the wrapper was supposed to be as simple as executing a Maven package command. For me, however, the unit tests kept generating segfaults outside of the JVM. Being brave, I just turned off the unit tests and crossed my fingers that the resultant JAR was usable:

cd libfreenect/wrappers/java
mvn -Dmaven.test.skip=true package

Hopefully this won't make the deer explode. I then copied the resultant JAR to my library path, and updated Gradle to include it in the compile-time dependencies.

Update August 7, 2014: the BoofCV project has the libfreenect wrappers built into it - best of all, their libraries are published to Maven Central. I've updated my source code and Gradle script to use BoofCV instead of my locally built JAR (the deer definitely can't explode now – and in case you didn't get the humor, there never was a chance that they could!).

The next step was to write a class that would manage the flow of data from the Kinect. I created a simple Monitor class to create a connection to the device in the constructor:

public Monitor() {
    mContext = Freenect.createContext();
    if (mContext.numDevices() == 0) {
        throw new IllegalStateException("no Kinect sensors detected");
    }
    mDevice = mContext.openDevice(0);
}

It has a single function called takeSnapshot that will actually turn on the device, and take a picture with it:

public VideoFrame takeSnapshot() throws InterruptedException {
    mDevice.setVideoFormat(VideoFormat.RGB);
    mVideoHandler = new VideoFrameHandler();
    mDevice.setLed(LedStatus.RED);
    mDevice.startVideo(mVideoHandler);
    while (mVideoHandler.getVideoFrame() == null) {
        Thread.sleep(100);
    }
    VideoFrame videoFrame = mVideoHandler.getVideoFrame();
    mDevice.stopVideo();
    mDevice.setLed(LedStatus.OFF);        
    return videoFrame;
}

The real magic is performed by the VideoHandler interface. When you call mDevice.startVideo, it needs a class that will handle the video frames that the Kinect generates. The only function that you are required to implement is the onFrameReceived function. My VideoFrameHandler class simply stores the last frame of information sent from the Kinect. Any newer frames will overwrite the old ones - this is mostly because I'm lazy. I don't need a sequence of frames, or even exact pictures at a point in time - any frame taken within, say, a second of requesting a snapshot is good enough for me. This makes my handler very simple. I just store the information that the Kinect sends:

public void onFrameReceived(FrameMode mode, ByteBuffer frame, int timestamp) {
    // keep only the most recent frame - any new frame overwrites the last one
    mVideoFrame = new VideoFrame(mode, frame, timestamp);
}
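
The only other pieces of the handler are a field to hold the latest VideoFrame and the getter that the Monitor polls - roughly (sketching from memory rather than pasting the exact source) something like this:

// The rest of the handler, more or less: hang on to the most recent frame so
// the Monitor's polling loop can pick it up. Marked volatile in this sketch
// because the callback arrives on libfreenect's event thread.
private volatile VideoFrame mVideoFrame = null;

public VideoFrame getVideoFrame() {
    return mVideoFrame;
}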

My VideoFrame class is nothing more than a container class that stores the FrameMode, ByteBuffer and timestamp. The ByteBuffer and the FrameMode are the keys to actually displaying the information you get back from the Kinect. The ByteBuffer holds the raw bytes received from the Kinect. The FrameMode, on the other hand, holds information relating to the image width, height and color depth. Using these pieces of information, it's relatively easy to reconstruct the actual image. In my case, I created a function in my VideoFrame class that generates a BufferedImage:

public BufferedImage getBufferedImage() {
    int width = sFrameMode.width;
    int height = sFrameMode.height;
    byte[] data = new byte[sByteBuffer.remaining()];
    sByteBuffer.get(data);

    mBufferedImage = new BufferedImage(width, height, 
            BufferedImage.TYPE_INT_RGB);

    ByteArrayInputStream stream = new ByteArrayInputStream(data);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int r = stream.read();
            int g = stream.read();
            int b = stream.read();
            mBufferedImage.setRGB(x, y, new Color(r, g, b).getRGB());
        }
    }
    return mBufferedImage;
}

The code is quite simple. I’ve already asked the Kinect to generate RGB data from the camera back in the Monitor class. The ByteBuffer therefore has a sequence of R, G, B values in it - a triplet for every pixel in the image.

To make it easy to read, I converted the ByteBuffer into a ByteArrayInputStream so I could easily call read to get the next byte. All that was left was to get the height and width of the image from the FrameMode data, loop over the stream reading R, G, B values, and write them into the BufferedImage. Note, however, that if I had asked the Kinect to generate an image in a different color format, I would have had to do something different in the getBufferedImage function.
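
For what it's worth, if I ever switch the camera over to the IR stream for night shots, the inner loop would read a single intensity byte per pixel instead of a triplet. A rough sketch of what that might look like, assuming the 8-bit IR video format (I haven't actually wired this up yet):

// Hypothetical variant for an 8-bit IR frame: one intensity byte per pixel,
// written out as a grey value. Untested - the RGB path is all I use so far.
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        int intensity = stream.read();
        mBufferedImage.setRGB(x, y, new Color(intensity, intensity, intensity).getRGB());
    }
}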

Putting it All Together

With my simple Monitor class complete, all that was required was an option parser to set things like how many shots to take, and the time delay between them. The result is a command line Java program that will take pictures over given time intervals (the source code is available on my GitHub account). Cue evil laugh here.
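
Stripped of the option parsing, the core of the program boils down to a loop along these lines - a simplified sketch rather than the exact code from the repo (numShots and delaySeconds would normally come from the command line options):

import java.io.File;
import javax.imageio.ImageIO;

public class Capture {
    public static void main(String[] args) throws Exception {
        int numShots = 10;          // placeholder values - normally parsed from the options
        int delaySeconds = 10;

        Monitor monitor = new Monitor();
        for (int shot = 0; shot < numShots; shot++) {
            // grab a frame from the Kinect, convert it, and write it out as a PNG
            VideoFrame frame = monitor.takeSnapshot();
            File output = new File(String.format("capture_%05d.png", shot));
            ImageIO.write(frame.getBufferedImage(), "png", output);
            Thread.sleep(delaySeconds * 1000L);
        }
    }
}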

With the program in hand, I mounted the Kinect in our yard-facing window and fired it up. I'm going to take a snapshot every 10 seconds, meaning that I'll grab 6 pictures per minute. Since I also want to make sure that it detects deer, and not just anything strange in the backyard, we're also going to throw in a few shots of insanity. For example, this is not a deer:

Not a Deer

This is also not a deer:

Also not a Deer

Still not a deer:

Still not a Deer

Conclusions

I don't have a degree in deer psychology, so I can't speculate as to why the deer harbors such hatred of me. I can, however, keep an eye on it to make sure it doesn't build weapons of mass destruction in our backyard while we aren't looking (we are part of a block watch program, and as such, deer-built WMDs are generally frowned upon). Tune in again in about two weeks, when I will have the machine learning component of the project complete, as well as some preliminary data analysis of the backyard images.