Introduction to Machine Learning

The purpose of this website is to explain Machine Learning concepts in a simple way to people who are new to the subject. The heart of the website consists of interactive diagrams that respond to your actions and are intended to illustrate the ideas presented in the text. These diagrams have a light grey background. I call these diagrams Interactions. They are intended to be intuitive to use. For most of the Interactions, you just have to hover over the circles or connecting lines (neurons or weights) with your mouse, or press the appropriate buttons. The mathematics on this website is kept as simple as possible. If you still find it difficult, you can skip over the difficult sections and move on to other sections of the website. This website is much better experienced on a laptop than on a smartphone.

Machine Learning, as we currently apply it, consists of mathematical and computational structures that learn how to recognize a set of example patterns and also how to generalize from them to similar examples that have not been seen before. These structures are commonly and popularly known as "neural networks". They are called neural networks because they are loosely modeled on how biological neurons are networked together in the human brain. The thing to emphasize is that the modeling is very loose and informal. For example, the inspiration for airplanes came from birds, but airplanes don't flap their wings to fly. Correspondingly, while airplanes have some properties (speed and carrying capacity) that are superior to those of birds, they have others (flexibility, maneuverability, take-off and landing space) that aren't nearly as good. Machine Learning is a subset of a broader branch of computer science called Artificial Intelligence. These days, it is sometimes used to denote the same thing as Artificial Intelligence. To explain the learning and generalization of neural networks, we need to introduce some concepts, as given below.

Everything in a neural network is either a value or an element that transforms a value in some way. A value is just a
decimal number. Values flow *forward* in a neural network from back to front or *backwards* from front to back.
When we draw a neural network on a page, the back side of the neural network is either on the left hand side or the bottom
side, and the front side is correspondingly the right hand side or the top side.

Values that enter the neural network from the outside world and flow forward through it are called "Inputs". Values that exit from the front side are called "Outputs".

We introduce a notion of weights and neurons. Weights are just decimal numbers which are represented as lines.
A given weight connects two neurons together. There is an incoming neuron from which a value flows into the weight, and
an outgoing neuron into which the value flows out of the weight. The value flowing into the weight is multiplied by
the weight value to give the value flowing out of the weight. Multiple weights generally connect into a given neuron
(this neuron is the outgoing neuron, and the weights connect into it from the incoming neurons). A neuron is usually
represented as a circle, and performs a somewhat complicated transformation on the values flowing into it. First, it
adds the values flowing into it from the weights connecting into it. Let’s call this addition the “sum”. Then it applies
a "Function" to the sum. This function is called the “activation function”.
There is a large number of possible activation functions. The function we will consider for now is called the
“sigmoid” activation function, and it is written as 1/(1 + e^{-sum}). This is the value that flows out of the
neuron. The letter "e" is explained in our section on Irrational Numbers.
There is a “forward pass” and a “backward pass” through the neural network. What we have described
here is in the context of the forward pass. The backward pass is described in a section below.
The word “neuron” is inspired by the fact that the function applied by the neuron, including adding the values flowing into it
from the weights, is somewhat similar to the action of a real biological neuron. Hence, as discussed earlier, the term
"neural networks". In the context of the artificial neural networks that we are talking about here, the neurons are also
called “nodes”.
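As a rough sketch of the forward-pass computation just described, here is how a single neuron could be written in Python. The function names are made up for illustration; they are not part of the site's Interactions.

```python
import math

def sigmoid(s):
    # The sigmoid activation function: 1 / (1 + e^(-s))
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(incoming_values, weights):
    # Each value flowing in is multiplied by its connecting weight.
    # The neuron adds these products to form the "sum", then applies
    # the activation function to produce the value flowing out.
    total = sum(v * w for v, w in zip(incoming_values, weights))
    return sigmoid(total)

# Two incoming values, each arriving through its own weight.
print(neuron_output([0.5, -1.0], [0.8, 0.3]))
```

Note that `sigmoid(0)` is exactly 0.5, and the output is always strictly between 0 and 1, no matter how large or small the sum is.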

The inputs, outputs, neurons, and weights are organized into layers. There is a layer of inputs feeding into the next layer of neurons via the first layer of weights. This layer of neurons feeds into the next layer of neurons through the next layer of weights. And so on until you reach the output layer. Layers of neurons that are present between the input layer and the output layer are called hidden layers, because they are not visible to the outside world. We also have to mention that the layer of inputs is a layer of raw values that flow into the first hidden layer via the connecting weights. The layer of outputs, on the other hand, is a layer of neurons, each of which transforms the values flowing into it to yield a value flowing out of it, which is an output value. All of this is illustrated in the Interaction below.
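The layer-by-layer flow described above can be sketched in Python as follows. The network shape and all the weight values here are made-up examples, chosen only to show how values propagate from the input layer through a hidden layer to the output layer.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(values, weight_matrix):
    # weight_matrix[j] holds the weights connecting every value in the
    # previous layer into neuron j of this layer. Each neuron sums its
    # weighted inputs and applies the sigmoid activation function.
    outputs = []
    for weights_into_neuron in weight_matrix:
        total = sum(v * w for v, w in zip(values, weights_into_neuron))
        outputs.append(sigmoid(total))
    return outputs

# A tiny network: 2 inputs -> hidden layer of 3 neurons -> 1 output neuron.
inputs = [0.9, 0.1]
hidden_weights = [[0.2, -0.4], [0.7, 0.5], [-0.3, 0.8]]  # 3 neurons x 2 weights
output_weights = [[0.6, -0.1, 0.9]]                      # 1 neuron x 3 weights

hidden = layer_forward(inputs, hidden_weights)
outputs = layer_forward(hidden, output_weights)
print(outputs)
```

The input layer is just raw values, exactly as the text says: `inputs` is a plain list, and only the hidden and output layers apply the neuron transformation.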

To use the Interaction below, just hover over the weights or neurons with your mouse or click the "Re-randomize" button. The circles are neurons and the connecting lines are weights. The number inside each neuron is the value flowing out of that neuron. The number on each weight is the value of that weight. The "Re-randomize" button randomizes the weights and the input layer values. All the neuron values are computed from these values. In practice, the input layer values are never random; they belong to a set of fixed input-output pairs which serve as examples for training. We randomize them in this Interaction purely to show how the outputs are computed from the inputs.

We are finally in a position to understand the concept of machine learning introduced at the beginning of this page. Multiple examples consisting of sets of inputs and the corresponding desired outputs are presented to a neural network. The inputs feed forward through the neural network, getting multiplied by the weights and transformed by the neurons, to yield the "actual outputs", in contrast to the "desired outputs" presented as part of the example. The actual outputs are compared to the desired outputs using a procedure from calculus. In doing so, updates are first computed for the rightmost layer of weights (the one closest to the outputs). The updates are just decimal numbers which get added to (or subtracted from) the weights. As we mentioned earlier, the weights are also just decimal numbers. Updates are then computed, again using calculus, for the next layer of weights to the left, and these updates are applied to that layer as before. This continues until all the layers of weights are covered. The inputs are then run through these modified weights and the resulting outputs are once again compared with the desired outputs. This process of updating weights is continued until the actual outputs are "close enough" to the desired outputs. The absolute difference between the actual and desired outputs which constitutes "close enough" can be specified by the designer of the neural network. Machine learning, then, is a process of updating the weights of a neural network until the actual outputs for a given set of inputs are close enough to the desired outputs. Note: there are other types of machine learning that are different from the one presented on this page, which is called "Supervised Learning". But this one is by far the most common and useful, so this is all you need to worry about for now.
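The update loop described above can be sketched for the simplest possible case: a single sigmoid neuron. The text deliberately leaves the calculus out, so the update rule below (the standard gradient-descent formula for a sigmoid neuron) is an illustrative assumption, and all the example values, the learning rate, and the "close enough" tolerance are made up.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Training examples: pairs of (inputs, desired output).
examples = [([1.0, 0.0], 0.2),
            ([0.0, 1.0], 0.8)]

weights = [0.5, -0.5]   # arbitrary starting weights
learning_rate = 0.5
close_enough = 0.15     # tolerance specified by the network's designer

for step in range(10000):
    worst_error = 0.0
    for inputs, desired in examples:
        # Forward pass: weighted sum, then the activation function.
        actual = sigmoid(sum(v * w for v, w in zip(inputs, weights)))
        error = actual - desired
        worst_error = max(worst_error, abs(error))
        # Calculus (the chain rule) tells us how to nudge each weight;
        # the sigmoid's derivative is actual * (1 - actual).
        for i in range(len(weights)):
            weights[i] -= learning_rate * error * actual * (1 - actual) * inputs[i]
    if worst_error < close_enough:
        break   # the actual outputs are now "close enough"
```

Each pass through all the examples nudges the weights a little, and the loop stops as soon as every actual output is within the designer's tolerance of its desired output.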

The power of machine learning comes from the fact that there can be thousands (or millions) of examples that constitute an extremely detailed pattern to be learned by the neural network. The neural network thus generalizes from the available examples of the pattern to the underlying mapping between the inputs and outputs. In this way, neural networks are now able to effectively transcribe speech to text, recognize faces and other images, diagnose diseases from raw patient data, and perform a host of other tasks. The processing done by sophisticated neural networks takes a lot of computational power. Computational power has been approximately doubling every two years for the last several decades, and the power needed for these sophisticated neural networks has just recently become available. This, along with recent theoretical advances in neural networks, has turned the present period into a golden age of machine learning.

This website is dedicated to Mark Lawrence, who taught me how to program.

©Copyright 2022 by Sandeep Jain. All rights reserved.