Deer Detection with Machine Learning (Part 4)

It’s time again for another episode of “Murderous Deer Machine Learning Mayhem”. Last time, I used a neural network to detect raccoons, since they frequented my backyard more often in the early days. In that post, my preliminary results revealed an accuracy (F1 measure) of approximately 72%. In this post, I’ll talk about some of the experiments I performed to try to increase the overall effectiveness of the neural network.

Seeing in Color

My first observation seemed fairly obvious – color information should lead to better performance. This is an example where my gut feeling and a quick and dirty exploration of the data set seemed to point in a promising direction. Notice the word seemed – that’s because I ultimately went down a bad path. This is a perfect example of why you should dig into the data much more – and get a better grounding in the domain – before investing too much time in feature engineering. Past-self was warning current-self about this in previous blog posts here and here. Bad Craig for not listening, bad! But before I reveal why this didn’t work, let me describe how I adapted the architecture of the neural net to see in color.

Changing the Network

Background – one type of computer image ultimately boils down to a combination of three different color components – Red, Green and Blue (RGB). By mixing these components together with varying intensities of each, you can produce a wide variety of colors. For example, if each component can take on an intensity value between 0 and 255, you can produce 16,777,216 different colors. Cool!

My idea for the neural network was simple – break down each pixel into its component values, and feed all of them to the neural network. A picture will probably describe this much better. My original network looked like this, where I took the average of the RGB intensities and stored it in a single value (creating a grayscale image):

network_diagram

I modified the network (by modify, I mean I built an additional option into the program) to subdivide each pixel into its R, G and B components, and fed those to the neural network like so:

network_diagram_color

This means that I now had 3,600 x 3 = 10,800 inputs to the neural network. Quite a jump in the number of network connections, but heck, memory and computing power are cheap these days.
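To make the two layouts concrete, here is a rough sketch of how the input vectors get built. This is Python with Pillow and NumPy purely for illustration – my actual implementation is in Java – and the file name is made up:

from PIL import Image
import numpy as np

# Load a 60x60 window crop and force it to RGB.
img = Image.open("window.png").convert("RGB")
rgb = np.asarray(img, dtype=np.float64)        # shape: (60, 60, 3)

# Baseline: average the R, G and B bands into a single intensity
# per pixel, then flatten -> 3,600 inputs.
grayscale_inputs = rgb.mean(axis=2).flatten()  # shape: (3600,)

# Color: keep each band separate and flatten -> 10,800 inputs.
color_inputs = rgb.flatten()                   # shape: (10800,)

Either way, a fully connected network has no built-in notion of pixel neighbourhoods – all that matters is that the ordering of the inputs is consistent from image to image.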

Some Experiments

I ran quite a number of experiments on the raccoon data with more control over how the testing and training data sets were built, both to re-affirm my original results (which hold – ~70% accuracy is right on the button), and to try out a few different options, which included color information. I’ll explain the different options in a moment, but first, here are some results. Note: these results are after training the network for 1,500 iterations to convergence, and then performing a 10-fold cross validation.

Precision

Precision (also known as the positive predictive value) is a measurement of how well the machine learning algorithm is doing when it labels something as a positive example. You basically look at all of the things that the classifier labels as a raccoon, and see how many of them are actually raccoons. High precision is good – it means that when the algorithm says “raccoon”, it is usually right.
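In symbols, if TP is the number of true positives (real raccoons labelled as raccoons) and FP is the number of false positives (non-raccoons labelled as raccoons), then precision = TP / (TP + FP).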

color-raccoons-precision

In the case of the Baseline, you can see that it does fairly well – roughly 70% of the things it labelled as raccoons were actually raccoons. This is good. Notice, however, that the Hyperbolic did better (more on that in a bit). Notice too that the precision with Color information was significantly lower.

Recall

Recall (also known as sensitivity) measures how well the algorithm is remembering the actual examples it was given (I always think of the movie Total Recall – the original ’90s version with the Governator). Put another way, recall asks: of all of the things that were actually raccoons, how many did it properly identify? As an example, if there are 10 raccoons in the data set, and the algorithm correctly identifies 6 of them, then the recall is 60% (total recall would be identifying all 10 raccoons).
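In symbols, with FN the number of false negatives (raccoons the classifier missed), recall = TP / (TP + FN). In the example above, that’s 6 / (6 + 4) = 60%.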

color-raccoons-recall

As we can see, the Baseline recall is roughly 70%, with the Hyperbolic performing about equally. Again, performance took a hit with Color, but this time, the Hyperbolic Color does better. Notice the variance between the different options (the black lines that indicate a range). Both Color and Hyperbolic Color have quite large variances. This means that those models are fairly fragile.

F1 Measure

The F1 measure is a way of combining both the Precision and Recall of a model into a single measurement (specifically, it is the harmonic mean of the two). Why? Well, sometimes it’s nice to be able to judge performance based on a single number, rather than comparing two numbers separately. In a previous post, I used the F1 measure and accuracy to mean the same thing.
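Concretely, F1 = 2 x (precision x recall) / (precision + recall). As a sanity check, here is a tiny Python sketch (illustrative only – my actual tallying lives in the Java code) that computes all three measures from raw counts:

def scores(tp, fp, fn):
    # Precision: of everything labelled a raccoon, how much really was one?
    precision = tp / (tp + fp)
    # Recall: of all the real raccoons, how many did we find?
    recall = tp / (tp + fn)
    # F1: the harmonic mean of the two.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 6 of 10 raccoons found, with 2 false alarms.
print(scores(tp=6, fp=2, fn=4))  # (0.75, 0.6, 0.666...)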

color-raccoons-f1

As you can see here, the Hyperbolic wins the race for overall performance. Let’s talk about what is happening here.

The Hyperbolic Tangent

In a previous post, I talked about how neural networks work. One of the key features is the activation function for the network. When a signal is passed from one end of the network to the other, it must pass through several layers in the network. The signal must be a certain “strength” before it is passed on. The thing that measures the strength and “decides” whether to pass on a new signal is the activation function.

The Hyperbolic Tangent is an activation function with a slightly different profile than the Sigmoid, which is what the Baseline model uses. Here is a plot of how the two functions stack up for values between -10 and 10:

sigmoid_tanh

As you can see, the Hyperbolic Tangent maps inputs to values between -1 and 1 (the Sigmoid maps them to values between 0 and 1), and it transitions more steeply around zero.
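If you prefer code to plots, here is a minimal NumPy sketch of the two functions being compared:

import numpy as np

def sigmoid(x):
    # Squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 9)
print(sigmoid(x))   # values between 0 and 1, centred on 0.5
print(np.tanh(x))   # values between -1 and 1, centred on 0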

Long story short, recall for the Hyperbolic Tangent and Sigmoid models was basically identical. The difference was with precision. Basically, the Hyperbolic Tangent resulted in a model that was more precise than the Sigmoid model, which is why the F1 measure is slightly better (because it is a combination of both precision and recall). Note that both the Baseline and Hyperbolic models were produced using grayscale images. So, what is going on with color?

Color Made Things Worse

Yup. That’s right. Performance tanked with color information (okay, it’s not abysmal, but it is considerably worse than my original run with grayscale images). At first I thought I had an error somewhere in my code. But after looking at the data more, as well as the false positives and false negatives, I started to hypothesize that the way I was using color information was hurting me very badly. The reason, I think, is that I’m losing shape information when I separate out each color band.

To explain a bit more, in both the Baseline and Hyperbolic (without Color) models, for each pixel in each image, I take an average of all the different color bands. This does two things:

  1. It converts the color image into a grayscale image.
  2. It pools the contrast from all three bands, which preserves – and can even accentuate – edges.

Point 2 I think is the important part. To demonstrate this a bit more clearly, I looked at the following false negative (an image that actually contains a raccoon, but that the algorithm thought did not):

fn37

I separated it into its R, G, and B components and looked at the intensity of each component.
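For reference, this kind of inspection is only a few lines with a library like Pillow (a sketch – the file names are made up, and the luminance weights are Pillow’s defaults, not necessarily what my Java code does):

from PIL import Image

img = Image.open("fn37.png").convert("RGB")

# Split into the three color bands; each one is a single-band
# intensity image with values between 0 and 255.
r, g, b = img.split()
r.save("red_fn37.png")
g.save("green_fn37.png")
b.save("blue_fn37.png")

# A "proper" grayscale conversion weights the bands by perceived
# luminance (0.299 R + 0.587 G + 0.114 B) rather than averaging them.
img.convert("L").save("i_fn37.png")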

Here is the R component:

red_fn37

Here is the G component:

green_fn37

Here is the B component:

blue_fn37

Notice how difficult it is to make out shapes when you treat each color band separately (especially the blue component). The raccoon tends to blend in with the background. Aside from the white on its face and front, it is very difficult to make it out. For a comparison, here is what the picture looks like if I convert it to a proper grayscale:

i-fn37

Notice how it becomes much easier to separate the actual shape of the raccoon out of the background.

The long and the short of it is that while I thought I was giving the neural network more data to work with, in actual fact I was making it harder for the network to distinguish shapes. I need to do some reading to find a new strategy for dealing with color information. Feel free to reply in the comments section if you have an idea for an approach!

In the Meantime – Deer!

With some preliminary analysis out of the way, and with a functioning neural network, I think it is finally time to perform deer detection. I now have over 5,500 images of deer to train with. It will take some time to process that many images in order to generate a training set. Half of those images are taken during the night, and half during the day. Many of them are of the deer lying down, like so:

deer_laying_down

Notice that pesky post in the way of the deer! Plus, eventually those deer are going to lose their antlers (ha-ha – one less weapon with which to murder me). All in all, there are going to be some unique challenges to identifying deer.

Wrapping Up

In this post, I talked about the performance of the neural network with respect to identifying raccoons, and explained why color information didn’t work out as I originally planned. Tune in next time when I’ll look at identifying deer and some of the challenges that brings.

Deer Detection with Machine Learning (Part 3)


This is it! This is the post where I talk about the Machine Learning component in my Deer Detector. Have no idea what I’m talking about? Check out my other blog posts here and here for some background information about the project.

Interested in how some machine learning algorithms work? Check out one of my Deer Detection Diversion posts on the subject here.

Ready? Let’s go!

Evidence of Deer (Finally!)

First off, some good news – I managed to capture some pictures of the deer in my backyard. In the event of my mysterious death (and my empty garden), I have some interesting pictures as evidence that a deer may be involved:

2014-08-28_04:31:31.827161

Ah-ha! A deer! Smile – you’re on infrared camera!

As I discovered, they like our blackberry bushes, and they like to eat apples from our apple tree too:

2014-08-28_04:35:27.748707

By our apple tree, eating all of our low hanging fruit!

There are actually quite a few images of the deer at night; it appears around 3 – 4 am and hangs around for about 20 minutes before moving on. At one point, there was a pair of them that would hang out in our backyard, plotting our murder. But luckily, we’re down to one.

Something More Sinister

While murderous deer scare me just as much as EMPs, a recent trip to the movies alerted me to a much bigger threat. It appears that deer-built weapons of mass destruction are far less deadly than raccoon-built ones. Ominously, I have plenty of raccoons that frequent the backyard. They too seem to enjoy snacking on our blackberry bushes:

2014-08-11_07-08-32

Here’s a raccoon, rocketing right at our blackberry bushes!

As it turns out, I have some good pictures of raccoons in the backyard. This is good – I can test out my Machine Learning strategy with raccoons first to see if it is going to work before turning it on to the deer.

Neural Networks

I decided to use a neural network for the brains of my Machine Learning program. One of the main reasons is the type of task I’m attempting.

See, one of the hard problems in Machine Learning is defining features that describe the data you are attempting to process. The Machine Learning Squirrel detector (and sentry water gun) project used color histograms, the size of a blob of color, and the texture of the color as features that the Machine Learning component used to discriminate squirrels from other objects in the yard.

While I could go that route and attempt to engineer features that define raccoons, I think a simpler start is to use a neural network to learn its own set of features from the raw pixel data – at least until I know more about what features are useful for the task at hand. That’s the beauty of a neural network – the hidden layers in the network learn features on their own – yay!

Sizing Up the Task

First, I needed to determine what data I actually wanted to send to the neural network. Sending the entire image is probably not going to work too well. For a start, my smallest resolution images are 640×480. That’s 307,200 pixels the neural network would have to look at! Secondly, the network won’t know what to focus on.

A safer start is to focus on portions of the image that could fit raccoons in them. The idea behind this is that the raccoon would take up the majority of the space in the image, giving the neural network something unique to focus on. After looking at a lot of raccoon images, a fixed size of 60×60 pixels looked to be about right.

My eventual goal will be to take a picture, and scan across and down it using a 60×60 window. I’ll then feed the contents of each window frame to the neural network. This way, the neural network will eventually see the entire image – one window frame at a time. Here’s an example of how I intend to scan across the screen:

scan_animation

Here’s how the program is going to scan across the picture.

The detection process will be a simple winner-take-all approach – if the neural network detects a raccoon in any one of the windows scanned across the picture, then it will return a positive result. I will even make the program draw a box around the suspected target, and label it with what it thinks it is.
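Here is a rough Python sketch of that scanning loop (illustrative only – the real thing will be Java, and classify_window stands in for a call to the trained network):

from PIL import Image

WINDOW = 60   # window size in pixels
STRIDE = 60   # step between windows (a smaller stride would overlap them)

def contains_raccoon(image, classify_window):
    # Slide a 60x60 window across and down the picture, and return a
    # positive result as soon as the network fires on any window.
    width, height = image.size
    for top in range(0, height - WINDOW + 1, STRIDE):
        for left in range(0, width - WINDOW + 1, STRIDE):
            window = image.crop((left, top, left + WINDOW, top + WINDOW))
            if classify_window(window):   # the trained network's verdict
                return True
    return False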

However, that’s a problem for future self. First, I needed to define an actual neural network, and see if any of this was even possible.

Neural Network Architecture

Alright, now for some fun stuff. The neural network architecture I chose has 3 layers. The first layer is the input layer, consisting of the pixels from the 60×60 window. This means it will have 3,600 nodes, where each node represents a single pixel. The second layer – the hidden layer – will contain 10 nodes (these are the feature selectors). The third and final layer – the output layer – will have 1 node, which will have a value of 1 if there is a raccoon in the image, or 0 if not. Here’s what the architecture looks like:

network_diagram
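In code, a forward pass through that architecture is pleasantly small. Here is a NumPy sketch (the weights are random placeholders – the real network learns them during training, and my actual implementation is in Java):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes: 3,600 input pixels -> 10 hidden nodes -> 1 output node.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((10, 3600)) * 0.01   # input -> hidden weights
b1 = np.zeros(10)
W2 = rng.standard_normal((1, 10)) * 0.01      # hidden -> output weights
b2 = np.zeros(1)

def forward(pixels):
    # pixels: a flattened 60x60 window, scaled to values in [0, 1].
    hidden = sigmoid(W1 @ pixels + b1)   # the 10 learned feature selectors
    output = sigmoid(W2 @ hidden + b2)   # close to 1 means "raccoon"
    return output

print(forward(np.zeros(3600)))  # ~0.5 before any training happens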

Massaging Data

Next, I came across a tedious task – assembling the actual training data. I found 96 images of raccoons from my Kinect and Raspberry Pi NOIR surveillance camera, and used the GIMP to cut out 60×60 chunks that I fed to the neural network. Since 96 images really isn’t that many, I used ImageMagick to generate mirror images of them all, giving me 192 raccoon images. Here is an example of some of the images:

raccoon_montage

So many raccoons eating my blackberries!

Just for fun, here’s a Bash one-liner that generates mirror images for all the files in a directory (insert evil-mirror-universe raccoon-with-a-goatee joke here):

for file in * ; do convert "$file" -flop "mirror-$file"; done

I also selected 170 random backgrounds of the backyard, and cut out 60×60 chunks that did not have any raccoons in them. These became negative cases for the neural network.

Colorless Green Ideas…

Next, I transformed the color images into grayscale (“black and white”). Why? The grayscale pictures only have a single “band” of color information (also known as the pixel intensity). One byte is used to represent how bright a pixel should be (a value between 0 and 255). Since there is only one value to worry about per pixel, using grayscale images requires a lot less processing, and for a first shot, is a good idea to try.
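(For the curious: the single intensity value comes from averaging the three color bands of the original image, i.e. intensity = (R + G + B) / 3.)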

Building the Neural Network

I won’t go into too many details here about how I built the actual neural network – that belongs in its own future post! I will say that I validated the approach of using a neural network first using tools such as Weka and R. Once I saw that the network was actually working somewhat decently, I decided to code my own version in Java for more control, and for better portability.

The Results

First, let me start off by saying that these results are really preliminary. Given that I only had ~350 samples in total, I decided to use a 10-fold cross-validation technique to tell me how well the algorithm was doing.

What this means is that I generated a training set using 80% of the data, and set aside 20% for testing. I built the model with the training data, and then tested it out using the test set. The 10-fold part means I did this 10 times, randomly choosing samples for the 80/20 split each time, and building a new model every time. (Strictly speaking, classic 10-fold cross-validation partitions the data into 10 equal folds and tests on each fold in turn – what I’m doing is closer to repeated random subsampling, but it serves the same purpose.) This trick allows me to try out different cross sections of the data set to get a better feel for how the model performs as a whole. Here are the results of my 10-fold cross-validation:

first_results

For those of you not versed in precision, recall, and F1 measures, the way to interpret this is that the neural network has an accuracy of roughly 72%.

That’s actually not too bad for a first attempt. Keep in mind, however, that on real data, model performance is likely to be worse. But, all things considered, it’s still doing better than random guessing. Take that, talking raccoons from outer-space!

Note: one thing to look at is the variance across the folds, represented by the error bars in the figure. In my example, the variance for precision, recall and F1 measure is quite low, meaning that the model is fairly stable. Large variance would mean that the model changes drastically across each fold. However, I’m likely over-fitting the model to the data, which is bad. I pretty much expected that, given how few training examples I have, and the slight skew to the positive class. More training data would help prevent this.

Also note: training 10 models sounds like it would take a lot of time. However, thanks to the JBlas library and knowledge of vectorization, it took all of 10 minutes to train 10 models – that’s ~1 minute per model.
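For the curious, the evaluation loop is conceptually simple. Here is a Python sketch of the procedure (the real implementation is Java with JBlas, so train and evaluate are stand-ins for my own code):

import numpy as np

def cross_validate(samples, labels, train, evaluate, rounds=10):
    # samples and labels are NumPy arrays of the same length.
    # Repeatedly shuffle the data, train on 80%, test on the held-out
    # 20%, and collect the score from each round.
    n = len(samples)
    n_test = n // 5
    rng = np.random.default_rng(42)
    scores = []
    for _ in range(rounds):
        order = rng.permutation(n)
        test, training = order[:n_test], order[n_test:]
        model = train(samples[training], labels[training])
        scores.append(evaluate(model, samples[test], labels[test]))
    # The spread of the scores hints at how stable the model is.
    return np.mean(scores), np.std(scores)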

Improving Accuracy

It helps to look at the examples the neural network gets wrong, so that I can start to diagnose what the problems are, and how to make it perform better. I’ve taken the fold with the best performance (a bit of a cheat) and looked at both the false positives and false negatives, since those are what the classifier gets wrong.

False positives are things the classifier thought were raccoons, but actually were not:

false-positives-bw

These images aren’t raccoons! Two of them are bushes, two are shadows on the ground, and one of them is a cat! Bah!

On the other hand, false negatives are things the classifier thought were not raccoons, but actually were:

false-negatives-bw

These are raccoons! Yes, raccoons everywhere! Get it right!

Color!

The one thing that might improve accuracy is somewhat obvious – color! Here are the images that the classifier got wrong, but with color information instead:

false-positives-color

Here are the false positives – note the lack of raccoons!

false-negatives-color

And here are the false negatives.

In a number of cases, it becomes much easier to distinguish between raccoon and bush based solely on color information. Raccoons are usually gray, not green… unless they are wearing some form of camouflaged pants!

Note: color information is probably only somewhat useful moving forward. Shots taken during the night are essentially monochrome (the infrared camera sees in shades of gray). But at least for the daytime shots, color could prove useful.

Compression Artifacts

This is an example of a problem that I should have thought about before beginning data collection. Some image formats offer compression, resulting in files that are smaller on disk. This wouldn’t normally be a problem, except that some formats (like JPEG) are known as lossy formats – you won’t get the same image back after you compress it, because some of the original pixel data is thrown out.

How does this impact my raccoon detector? Here are two examples of my images:

compression-example

The picture on the left is taken from a Kinect in the early days of gathering data using my quick and dirty Java program. I blindly used javax.imageio.ImageIO to write my file to disk, and picked JPEG as the extension – for shame! That left me with no control over the compression level used. Notice how it is blotchy and blocky?

The picture on the right is taken using the NOIR camera using my Python script, which has a default compression quality setting of 85. Notice how it is sharper? /me facepalm

Moving forward, I should either:

  1. Turn off compression since disk space is actually quite cheap; or,
  2. Switch to a lossless compression format, like PNG (which uses DEFLATE compression and throws no pixel data away)

This is fairly significant, since sharper images may provide more information for the neural network to work with.
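For what it’s worth, option 2 is a one-liner in most imaging libraries. A Pillow sketch (file names made up; in the Java program the analogous fix would be passing "png" instead of "jpg" to ImageIO.write):

from PIL import Image

img = Image.open("capture.jpg")

# Lossy: JPEG throws pixel data away; "quality" only controls how much.
img.save("capture_lossy.jpg", quality=85)

# Lossless: PNG compresses without discarding anything, at the cost of
# a bigger file on disk.
img.save("capture_lossless.png")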

Parameter Tweaking

The first set of results is without performing a thorough search through the parameter space. There are many different settings on the neural network I can adjust – for example, number of iterations, learning rate, and hidden layer size.

One potentially good avenue to explore is the number of hidden units. By tweaking how many feature selectors I implement in the hidden layer, I can trade bias against variance – fewer units give a more highly biased (but more stable) model. Then, by adding more examples, I can hopefully increase its performance.

Moving Forward

There are many things I intend to explore moving forward:

  1. I will turn off compression in image files. This should give me clearer examples moving forward.
  2. I want to experiment with the number of nodes in the hidden layer of the neural network. I want to see if I can fine-tune the results.
  3. I have gathered more pictures of deer, so I think it’s time to turn the camera on to them (insert evil laugh of evilness here).
  4. I need to experiment with night vision images. Obviously color information won’t be as useful under these circumstances, but it’s useful to compare how well the network does during the day versus during the night.

One interesting point to note is that detection of deer might work better than raccoons – this is due to the pokey bits of the deer (antlers, not horns!). Since they have a very distinct shape, the feature selectors on the neural network might work really well on them.

Check back soon when I’ll talk more about how I built the neural network, will share some wildlife photos taken in the backyard, and will turn my attention to detecting deer!

PS: for those interested in the neural network implementation, it is available on my GitHub account. You can currently train the neural network using image samples, and cross validate its performance. Very shortly, you will be able to save the model and use it to make predictions!