Machine learning is a vast topic, and it is mostly math. There must be thousands of tools and libraries out there, spanning every language. Every day, some new ML, deep learning, or AI initiative makes an announcement. In the last few months, it’s been a nonstop fire hose of geekery for me.
Since I know enough Python to cause damage, I was excited to stumble across the Apollo project. Apollo is built on top of Torch and Caffe, and it supports all the layers that Caffe supports. In this post, I’ll walk you through a bit of code that trains a convolutional neural net to do addition.
A neural net in Apollo/Caffe
A neural net in Caffe is broken up into layers. Layers of neurons, supposedly. In Caffe, however, they feel more like functions: they take data inputs (referred to as “bottoms”), have tunable parameters, and produce outputs (referred to as “tops”). Each type of layer is encapsulated in a class. You compose a neural net by instantiating some number of layers and attaching the tops of some layers to the bottoms of others. In this way, you form a directed graph in which the layers are nodes and the top-to-bottom connections are edges.
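To make the tops-and-bottoms wiring concrete, here is a minimal plain-Python sketch of a layer graph. It is not Apollo or Caffe code: the `Layer` class, the `forward` function, and the fixed “sum” layer are hypothetical names for illustration. For simplicity, each layer has exactly one top, named after the layer (real Caffe layers can have several, and their data are multidimensional blobs rather than plain numbers), and layers run in list order, so the list is assumed to already be in dependency order.

```python
from typing import Any, Callable, Dict, List


class Layer:
    """Hypothetical stand-in for a Caffe layer: reads named bottoms, writes one named top."""

    def __init__(self, name: str, fn: Callable[..., Any], bottoms: List[str]) -> None:
        self.name = name        # also used as the name of this layer's single top
        self.fn = fn            # the computation the layer performs
        self.bottoms = bottoms  # names of the blobs this layer reads as input


def forward(layers: List[Layer], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run the layers in order, storing each top so later layers can use it as a bottom."""
    blobs = dict(inputs)
    for layer in layers:
        args = [blobs[name] for name in layer.bottoms]  # follow the graph's edges
        blobs[layer.name] = layer.fn(*args)
    return blobs


# Two input blobs feed a "concat" layer, whose top feeds a fixed (untrained) "sum" layer.
net = [
    Layer("concat", lambda left, right: [left, right], bottoms=["left", "right"]),
    Layer("sum", lambda pair: pair[0] + pair[1], bottoms=["concat"]),
]
blobs = forward(net, {"left": 2.0, "right": 3.0})
print(blobs["sum"])  # 5.0
```

Naming blobs by string keeps the graph explicit: an edge exists exactly where one layer’s top name appears in another layer’s list of bottoms.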
When you want to train a neural net to do a specific task, you need to understand the inputs at the very bottom of the net and the expected output at the top. With this in mind, choose the layers (and their tunable parameters) that get you from input to expected output. Keep in mind that this is an iterative process: fail, tune, fail, rearrange the layers, fail some more, and so on, until you get the desired results or realize that what you’re trying to do just isn’t going to work.
The code follows a well-understood process: initialize the neural net components, string them together, iteratively train the net, and then test it. The dataset you use to train the net is typically called the “training set.” The dataset you use to test the net should consist of samples not seen in the training set. In our case, the samples we use during training and testing are pairs of numbers. During training, we also supply the answer, which in machine-learning parlance is called “the label” or “the target.” In our example, we don’t really “test” the net. Instead, you will watch it guess with increasing accuracy over the course of training.
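As a rough illustration of what the samples and labels look like, here is a small plain-Python sketch that draws random pairs of numbers, attaches their sum as the label, and holds some of them out as a test set. The helper names, the [0, 1) range, the sample count, and the 20% test fraction are assumptions for illustration, not details from the post’s code.

```python
import random
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], float]  # ((left, right), label)


def make_samples(n: int, rng: random.Random) -> List[Sample]:
    """Draw n random pairs of numbers, each paired with its label (the sum)."""
    samples = []
    for _ in range(n):
        left, right = rng.random(), rng.random()
        samples.append(((left, right), left + right))
    return samples


def split(samples: List[Sample], test_fraction: float = 0.2) -> Tuple[List[Sample], List[Sample]]:
    """Hold out the last test_fraction of the samples so they are never trained on."""
    n_test = round(len(samples) * test_fraction)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


rng = random.Random(0)  # fixed seed so the run is repeatable
train_set, test_set = split(make_samples(100, rng))
print(len(train_set), len(test_set))  # 80 20
```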
So how does training work? In our convolutional net, we have what’s called a “loss” function. This is a measure of how far off the neural net’s answers are when it attempts to solve problems. Inside the neural net there are hidden variables (usually called weights, which we don’t set directly) that are nudged in small increments up or down according to how the loss changes with respect to each of them. Over the iterative training loop, these hidden variables ideally converge on a set of values that gives the lowest loss. At that point, the neural net has learned its task as well as it can. The technique used to work out those adjustments, by propagating the error backward from the loss through the layers, is called “back-propagation.” For a great explanation of this, I recommend checking out this other tutorial. It is the minimum-amount-of-math explanation of back-propagation, but it helps if you remember what a derivative is.
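To show how a derivative turns the loss into an “up or down” nudge, here is a minimal plain-Python sketch with a single hidden variable `w`. The toy task (learning to multiply by 3), the half-squared-error loss, the learning rate, and the function names are all assumptions chosen for illustration; this is ordinary gradient descent written out by hand, not Apollo or Caffe code.

```python
import random
from typing import Tuple


def loss_and_grad(w: float, x: float, target: float) -> Tuple[float, float]:
    """Half squared error of the prediction w * x, and its derivative with respect to w."""
    error = w * x - target
    loss = 0.5 * error ** 2
    grad = error * x  # d(loss)/dw by the chain rule
    return loss, grad


def train(steps: int = 1000, lr: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0  # the hidden variable, starting from an arbitrary guess
    for _ in range(steps):
        x = rng.random()
        _, grad = loss_and_grad(w, x, target=3 * x)
        w -= lr * grad  # step against the derivative, which lowers the loss
    return w


print(train())  # close to 3.0
```

When `w` is too small, the derivative is negative and the update pushes `w` up; when it is too large, the derivative is positive and the update pushes it down. The learning rate `lr` controls how big each nudge is.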
Our code and the diagram
In the diagram, we have a total of 6 “layers.” The “left,” “right,” and “label” layers are just input layers; there is no magic happening in the code here. The concatenation layer takes the left and right values and merges them into a single data structure. The convolution layer is where the magic happens: this is the network of neurons learning how to do addition. During training, the loss layer compares the convolution layer’s guess to the actual answer (the label). The red arrow is the feedback sent to the convolution layer so it can adjust its hidden variables based on the computed loss.
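Before looking at the Apollo version, it may help to see the six layers of the diagram spelled out in plain Python. This is a hypothetical sketch, not the post’s code. It assumes the convolution is 1x1 with a single output channel, so over the two concatenated channels it reduces to `w0 * left + w1 * right + bias`; that the loss is half the squared error (similar in spirit to Caffe’s Euclidean loss); and that the inputs are random numbers in [0, 1). The backward step plays the role of the red arrow. All function names are illustrative.

```python
import random
from typing import List, Tuple


def concat(left: float, right: float) -> List[float]:
    """Concatenation layer: merge the two inputs into one two-channel blob."""
    return [left, right]


def conv_1x1(blob: List[float], weights: List[float], bias: float) -> float:
    """Assumed 1x1 convolution with one output channel: a weighted sum over channels."""
    return sum(w * v for w, v in zip(weights, blob)) + bias


def half_squared_error(guess: float, label: float) -> Tuple[float, float]:
    """Loss layer: half the squared error, plus its derivative with respect to the guess."""
    diff = guess - label
    return 0.5 * diff * diff, diff


def train(steps: int = 5000, lr: float = 0.1, seed: int = 0) -> Tuple[List[float], float, float]:
    rng = random.Random(seed)
    weights, bias = [0.0, 0.0], 0.0  # the convolution layer's hidden variables
    loss = float("inf")
    for _ in range(steps):
        # Input layers: "left", "right", and "label".
        left, right = rng.random(), rng.random()
        label = left + right
        # Forward pass: concat -> convolution -> loss.
        blob = concat(left, right)
        guess = conv_1x1(blob, weights, bias)
        loss, d_guess = half_squared_error(guess, label)
        # Backward pass (the red arrow): send the error back into the convolution's parameters.
        weights = [w - lr * d_guess * v for w, v in zip(weights, blob)]
        bias -= lr * d_guess
    return weights, bias, loss


weights, bias, loss = train()
print(weights, bias)  # weights near [1, 1], bias near 0
```

Because exact addition (both weights 1, bias 0) is something this layer can represent, the trained parameters should settle very close to those values, and the guesses become correspondingly accurate as training goes on.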
…and here is the code.