Training a neural net to add numbers

Machine learning is a vast topic.  It is all the maths.  There must be thousands of tools and libraries out there spanning every language.  Every day some new ML, deep learning, or AI initiative makes an announcement.  In the last few months, it’s been a nonstop fire hose of geekery for me.

Since I know enough Python to cause damage, I was excited to have stumbled across the Apollo project.  Apollo is built on top of Torch and Caffe.  It supports all the layers that Caffe supports.  In this post, I’ll walk you through a bit of code that trains a convolutional neural net to do addition.

A neural net in Apollo/Caffe

A neural net in Caffe is broken up into layers.  Layers of neurons, supposedly.  However, in the case of Caffe they feel more like functions that take data inputs (referred to as “bottoms”), have tunable parameters, and then have outputs (referred to as “tops”).  Different types of layers are encapsulated in classes.  You compose a neural net by combining some number of instantiated layers, attaching the tops of some layers to the bottoms of others.  In this way, you are forming a directed graph, where layers are nodes and edges are the connections between the tops and bottoms.
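To make the tops-and-bottoms idea concrete, here is a toy sketch in plain Python.  It is not the Caffe or Apollo API.  The class names (Layer, Input, Concat, WeightedSum) and the run helper are all made up for illustration.  Each layer reads its bottoms from a shared dictionary of named values and writes its top back into it.

```python
# Toy illustration only: these classes are hypothetical, NOT the Caffe/Apollo API.

class Layer:
    """A node in the graph: reads its named bottoms, writes one named top."""
    def __init__(self, name, bottoms, top):
        self.name = name
        self.bottoms = bottoms  # names of the inputs this layer consumes
        self.top = top          # name of the output this layer produces

    def forward(self, blobs):
        raise NotImplementedError


class Input(Layer):
    """An input layer has no bottoms; it just exposes a value as its top."""
    def __init__(self, name, top, value):
        super().__init__(name, [], top)
        self.value = value

    def forward(self, blobs):
        blobs[self.top] = list(self.value)


class Concat(Layer):
    """Merges all of its bottoms into a single list."""
    def forward(self, blobs):
        merged = []
        for bottom in self.bottoms:
            merged.extend(blobs[bottom])
        blobs[self.top] = merged


class WeightedSum(Layer):
    """A layer with tunable parameters: one weight per input value."""
    def __init__(self, name, bottoms, top, weights):
        super().__init__(name, bottoms, top)
        self.weights = weights

    def forward(self, blobs):
        (bottom,) = self.bottoms
        total = sum(w * x for w, x in zip(self.weights, blobs[bottom]))
        blobs[self.top] = [total]


def run(layers):
    """Run the layers in order; the list must already be topologically sorted."""
    blobs = {}
    for layer in layers:
        layer.forward(blobs)
    return blobs


# Edges of the graph are implied by matching a top name to a bottom name.
net = [
    Input("left", top="left", value=[2.0]),
    Input("right", top="right", value=[3.0]),
    Concat("concat", bottoms=["left", "right"], top="pair"),
    WeightedSum("sum", bottoms=["pair"], top="guess", weights=[1.0, 1.0]),
]
blobs = run(net)
print(blobs["guess"])  # [5.0]
```

The weights here are set by hand to 1.0 and 1.0, which happens to be the answer for addition.  In a real net those are the values that training has to discover.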

When you want to train a neural net to do some specific task, you’ll need to understand your inputs at the very bottom of the net, and the expected output at the top.  With this in mind, you’ll need to choose the layers (and their corresponding tunable parameters) that get you from input to expected output.  Keep in mind, this is an iterative process.  Fail, tune, fail, rearrange the layers, fail some more, etc… until you get the desired results or you realize that what you’re trying to do just isn’t going to work.

The code will follow a well-understood process: initialize the neural net components, string them together, iteratively train the net, and then test it.  Typically, the dataset you use to train the net is called the “training set.”  The dataset you use to test the net should consist of samples not seen in the training set.  In our case, the samples we use during training and testing are pairs of numbers.  During training, we also supply the answer.  In machine-learning parlance, the answer is called “the label” or “the target.”  In our example, we don’t really “test” the net.  Instead, you will watch it guess with increasing accuracy over the duration of the training process.
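As a rough sketch of what those samples could look like, the snippet below generates labeled pairs with Python's standard library.  The helper names (make_sample, make_dataset), the value range of 0 to 1, and the dataset sizes are my own assumptions and are not taken from the Apollo code.

```python
import random

def make_sample(rng):
    """Return one sample: a (left, right) pair plus its label, the sum."""
    left = rng.random()   # assumed range: floats in [0, 1)
    right = rng.random()
    return (left, right), left + right

def make_dataset(n, rng, exclude=frozenset()):
    """Build n labeled samples, skipping any pair listed in `exclude`."""
    samples = []
    while len(samples) < n:
        pair, label = make_sample(rng)
        if pair not in exclude:
            samples.append((pair, label))
    return samples

rng = random.Random(42)  # fixed seed so runs are repeatable

# Training set: the net sees both the pair and the label.
training_set = make_dataset(1000, rng)

# Test set: only pairs the net has never seen during training.
seen = {pair for pair, _ in training_set}
test_set = make_dataset(100, rng, exclude=seen)

(left, right), label = training_set[0]
print(left, right, label)
```

With random floats an accidental overlap between the two sets is extremely unlikely, but the explicit exclusion makes the "not seen in training" rule visible in the code.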

So how does training work?  In the case of the convolutional net, we have what’s called a “loss” function.  This is a measure of how far off the neural net’s guesses are when it attempts to solve problems.  Inside the neural net there are hidden variables (the weights, which are not directly tunable by us) that are adjusted in slight increments up or down, in whichever direction reduces the loss.  During the iterative training loop, these hidden variables eventually converge on an optimum set of values providing the least amount of loss.  At that point, the neural net has learned its task as best as it can.  This kind of training process is referred to as “back-propagation.”  For a great explanation of this, I recommend checking out this other tutorial.  That tutorial is the minimum-amount-of-math explanation of back-propagation, but it would help if you could remember what a derivative is.
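Here is a minimal sketch of that loop, assuming the simplest possible "net": a single linear neuron with two weights and a squared-error loss.  This is not the Apollo code and not a convolutional net.  The learning rate, step count, and variable names are all assumptions picked for illustration.  With only one layer, the "propagation" is a single application of the chain rule.

```python
import random

rng = random.Random(0)

# The hidden variables: one weight per input, started at random values.
weights = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
learning_rate = 0.1  # assumed value; controls the size of each adjustment

for step in range(5000):
    # One training sample: a pair of numbers and its label.
    left, right = rng.random(), rng.random()
    label = left + right

    # Forward pass: the net's guess.
    guess = weights[0] * left + weights[1] * right

    # Loss: half the squared difference between guess and label.
    error = guess - label
    loss = 0.5 * error ** 2

    # Backward pass: d(loss)/d(weight_i) = error * input_i.
    # Nudge each weight a little in the direction that lowers the loss.
    weights[0] -= learning_rate * error * left
    weights[1] -= learning_rate * error * right

    if step % 1000 == 0:
        print(step, loss, weights)

print("final weights:", weights)
```

Both weights should drift toward 1.0, since guess = 1.0 * left + 1.0 * right is exactly addition.  The derivative is what tells each weight whether to move up or down.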

Our code and the diagram


In the diagram, we have a total of 6 “layers.”  The “left,” “right,” and “label” layers are just input layers.  There is no magic happening in the code here.  The concatenation layer takes the left and right values and merges them into a single data structure.  The convolution layer is where the magic happens.  Here is the network of neurons learning how to do addition.  The loss layer compares the convolution layer’s guess to the actual answer (the label) during training.  The red arrow is the feedback provided to the convolution layer to adjust its hidden variables based on the computed error rate.

…and here is the code.


Wandering in Circles

You know what has been bothering the fuck out of me lately?  We use a lot of terms in the tech industry so loosely that they have virtually no meaning.  It’s no wonder people outside of tech sometimes think we are full of shit.  We are full of shit.  I know this to be true, because I’ve attended those bullshit meetings where people parrot these terms like… well… parrots.  The meetings are bullshit precisely because nobody is really on the same page about what anything means.

Workflow, Automation, Integration.  What do these words even mean?  From the operator’s perspective, these are all closely related and in a lot of cases they are identical.  Operators want to automate their workflows and there is an immutable, absolute requirement for integration to accomplish this.  Read that sentence over and over.  Some might argue that much of the excitement about SDN on the customer side was predicated on the idea that vendors would finally realize this.  But why hasn’t this really happened yet?

1.  The general openness of networking hasn’t improved much since SDN’s inception.  Sure, Broadcom has “open sourced” their SDK but quotes are required around that term.  Intel has actually open sourced DPDK as far as I can tell.  Those are good starts.  Other projects (which shall not be named) are open source, but they are intentionally crippled versions of a commercial product.  All in all, this is stifling innovation because it leaves the hard work up to people who are not particularly motivated to do something different with respect to networking.

2. Vendors don’t understand what customers actually do.  I’ll take shit for this statement, but I’m absolutely convinced of it at this point.  Customers don’t actually understand what they do, either.  They’ve learned how to do it, but that’s not the same thing.  Reciting RFCs, design principles, and troubleshooting patterns is missing the point.  What we actually do, behaviorally (with respect to our knowledge domain and the systems within), is another thing altogether.  It should be the subject of a study in some social sciences field, I imagine. Engineering, in any field really, is all about how humans navigate complex problem spaces cognitively.  This is very hard to understand in fields with enormous technical width and depth.  It’s easy to jump from the human side of it to weird rationalizations based on specific elements and constructs of an engineer’s chosen specialty.

*Side note: I will eat the next person who tells me they understand what the customer wants because the sales team told them so.  I will also eat the next person who tells me they did a survey, so they “have all the answers already.”

3.  Nobody wants to be responsible for taking the lead on solving this problem.  Imagine a circle.  Along the entire circumference there are dots.  Those dots represent APIs to the various systems you deploy or use in your environment.  Who is responsible for integrating between any two dots?  Company “A,” company “B,” or the user?  What if the integration is three or four systems?  What if something changes for one of those companies?  On the vendor side, everyone wants everyone else to integrate with them.  “They’ll download our SDKs.  We’ll create a developer community.  PROBLEM SOLVED.”  Because there isn’t a graveyard littered with initiatives predicated on such nonsense.  Workflows and automation are highly fluid things.  This is a hard problem to solve, and it really kind of falls outside of the knowledge domain of traditional network companies.  So there is resistance to putting the time and resources into solving it.