Neural Networks I
This page explores the basics of neural networks, and shows how to start building neural networks from scratch using TensorFlow. To get the most out of this page, you should be familiar with Tensors in TensorFlow, and with the mathematical concept of Gradients.
Neural Network Anatomy
As the name suggests, a neural network is a network, or graph, composed of interconnected units called neurons (sometimes these neurons are simply called units). Each neuron in the network receives one or more inputs and produces an output, which may be the final output of the neural network, or the input to another neuron.
The simplest neural network imaginable is composed of a single neuron, which receives a single input, and produces a single output.
Generally, each connection to a neuron (that is, each “edge” in the graph) has an associated weight. The output of any neuron is the weighted sum of its inputs.
So, for the single neuron above, the output is computed using the following equation, where \( w \) is the weight of the connection between the input and the neuron:
\[ \text{output} = w \cdot \text{input} \]
When a neuron has multiple inputs, each input is multiplied by its corresponding weight, and the products are summed to produce the output.
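The weighted sum described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not how TensorFlow implements it; the function name neuron_output and the sample values are assumptions made for this example.

```python
def neuron_output(inputs, weights):
    """Return the weighted sum of a neuron's inputs (no bias term)."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # multiply each input by its weight, then add up the products
    return sum(x * w for x, w in zip(inputs, weights))


# hypothetical values: two inputs and the weights of their connections
print(neuron_output([1.0, 2.0], [1.0, 4.0]))  # 1.0*1.0 + 2.0*4.0 = 9.0
```

With a single input and a single weight, the same function reduces to the one-neuron case described earlier.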
To describe neural networks, we often refer to the different layers of the network. When the output of one neuron is used as the input to another, we say that the two neurons are in separate layers. The diagram below shows a neural network with two layers:
Fully Connected Layers
When building neural networks with TensorFlow, we construct them layer by layer. The most basic type of layer is the fully-connected layer, or dense layer. In a fully-connected layer, each neuron is connected to each of the inputs to the layer.
The two layers in the neural network shown earlier are both examples of fully-connected layers; likewise, the illustration below shows a fully-connected layer with 2 neurons and 2 inputs:
On the other hand, the graph shown below does NOT represent a fully-connected layer, because input1 is not connected to neuron2, and input2 is not connected to neuron1:
TensorFlow Dense Layer
In TensorFlow, the Dense class represents a fully-connected layer. So far, we have seen three different properties of each fully-connected layer: (1) the number of neurons in the layer, (2) the number of inputs to each neuron, and (3) the weights of the connections between each input and each neuron. Let’s see how to specify these properties using the Dense class in TensorFlow.
We specify the number of neurons and the number of inputs by using the units and input_shape parameters when creating the layer:
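import tensorflow as tf
# create a fully-connected layer with 3 neurons and 2 inputs
layer = tf.keras.layers.Dense(units=3, input_shape=[2], use_bias=False)
layer.build(input_shape=[2])  # create the weight variables so they can be set below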
The set_weights() method can be used to explicitly set the weights of the connections between the inputs and the neurons. Because the layer we just created has 2 inputs and 3 neurons, we provide the weights as a 2x3 matrix (in other words, a tensor with shape (2, 3)).
In reality, you will rarely ever need to set the weights of a neural network explicitly using set_weights(). By default, the weights are initialized with random values, and then “learned” during the training process.
layer.set_weights([
    tf.constant([
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]
    ])
])
When trying to map this matrix representation of the weights to the neural network diagrams from earlier, it is helpful to think of each row of the matrix as holding the weights of the connections from a particular input, and each column as holding the weights of the connections to a particular neuron.
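The row and column reading of the weight matrix can be made concrete with a small plain-Python sketch. The nested list mirrors the 2x3 matrix passed to set_weights(); the helper names weight_between and weights_into_neuron are hypothetical and exist only for this illustration.

```python
# the same 2x3 weight matrix as a nested list (rows = inputs, columns = neurons)
weights = [
    [1.0, 2.0, 3.0],  # weights of the connections leaving input 1
    [4.0, 5.0, 6.0],  # weights of the connections leaving input 2
]


def weight_between(input_index, neuron_index):
    """Weight of the edge from one input to one neuron (indices start at 0)."""
    return weights[input_index][neuron_index]


def weights_into_neuron(neuron_index):
    """All weights feeding a single neuron, i.e. one column of the matrix."""
    return [row[neuron_index] for row in weights]


print(weight_between(1, 2))    # input 2 -> neuron 3: 6.0
print(weights_into_neuron(0))  # column for neuron 1: [1.0, 4.0]
```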
To execute the layer, we invoke it as a function, passing in a tensor containing some inputs to the layer:
# create a tensor containing the inputs to the layer
inputs = tf.constant(
    [[1.0, 2.0]]
)
# execute the layer
output = layer(inputs)
print(output)
tf.Tensor([[ 9. 12. 15.]], shape=(1, 3), dtype=float32)
The output of the layer is a tensor containing the outputs of each individual neuron. Because there are 3 neurons, we get 3 values in the output tensor.
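The equations below show how these three output values were computed from the inputs and weights:
\[
\begin{aligned}
\text{output}_1 &= 1 \cdot 1 + 2 \cdot 4 = 9 \\
\text{output}_2 &= 1 \cdot 2 + 2 \cdot 5 = 12 \\
\text{output}_3 &= 1 \cdot 3 + 2 \cdot 6 = 15
\end{aligned}
\]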
One thing worth noting in the example code is that, although we are passing only two input values to the layer, the variable inputs is actually a matrix, or 2-dimensional tensor, with shape (1, 2). TensorFlow requires that the input to a Dense layer have at least two dimensions.
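As a small illustration of that requirement, a pair of input values stored in a 1-dimensional tensor can be given a leading dimension before it is passed to a layer. The sketch below uses tf.expand_dims for this; the variable names single and batched are assumptions made for the example.

```python
import tensorflow as tf

# two input values held in a 1-dimensional tensor, shape (2,)
single = tf.constant([1.0, 2.0])

# add a leading dimension, giving a 2-dimensional tensor with shape (1, 2)
batched = tf.expand_dims(single, axis=0)

print(single.shape, batched.shape)
```

The values are unchanged; only the shape differs, which is what makes the tensor acceptable as input to a Dense layer.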
The extra leading dimension allows us to pass a “batch” of several inputs into the layer in a single invocation. For example, we can execute the layer on a batch of 2 inputs by providing as input a tensor with shape (2, 2):
inputs = tf.constant([
    [1.0, 2.0],  # first item in the batch
    [0.5, -1.0]  # second item in the batch
])
# execute the layer
output = layer(inputs)
print("Output for the first item in the batch:")
print(output[0])
print("\nOutput for the second item in the batch:")
print(output[1])
Output for the first item in the batch:
tf.Tensor([ 9. 12. 15.], shape=(3,), dtype=float32)
Output for the second item in the batch:
tf.Tensor([-3.5 -4. -4.5], shape=(3,), dtype=float32)
We will see later on that this ability to process multiple inputs at once is critical when training neural networks.
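Also, because both the inputs to the layer and the weights of the connections between the inputs and neurons are represented as matrices, the entire computation of the layer’s output can be expressed as a single matrix multiplication between the inputs and the weights. For the first example above:
\[
\begin{bmatrix} 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
=
\begin{bmatrix} 9 & 12 & 15 \end{bmatrix}
\]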
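The matrix form can be checked directly with tf.matmul. The sketch below rebuilds the weights and the batch of two inputs from the earlier examples as plain tensors, without a Dense layer, and multiplies them; the name manual_output is an assumption made for this illustration.

```python
import tensorflow as tf

# weights from the earlier example: one row per input, one column per neuron
weights = tf.constant([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])

# the batch of two inputs used earlier
inputs = tf.constant([
    [1.0, 2.0],
    [0.5, -1.0],
])

# (2, 2) x (2, 3) -> (2, 3): one row of outputs per item in the batch
manual_output = tf.matmul(inputs, weights)
print(manual_output.numpy())
```

Each row of the result should match the output that the layer printed for the corresponding item in the batch.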
Bias
In the previous example, you might have noticed that we set the parameter use_bias to False when creating the Dense layer. By default, the Dense class adds a bias term to every neuron. A bias is simply a constant value that is added to the output of a neuron.
Another way to think of the bias term is as a weighted edge between the neuron and an implicit input that always has the value 1. The diagram below shows a neuron with two inputs and a bias term:
If we create an instance of the Dense class without setting use_bias to False, then a bias term will be added to the layer. We can confirm this by inspecting the weights of the layer:
layer = tf.keras.layers.Dense(units=1, input_shape=[2])
layer.build(input_shape=[2])  # create the weight variables so they can be inspected
print(f"Weights: {layer.weights[0]}")
print(f"\nBias: {layer.weights[1]}")
Weights: <tf.Variable 'kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[-0.1066072 ],
[-0.01804018]], dtype=float32)>
Bias: <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=
array([0.], dtype=float32)>
The weights property of the layer contains two tensors: the first includes the weights associated with the two inputs to the layer, and the second contains the bias term, which is initialized to 0.
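The two descriptions of the bias given in this section (a value added to the weighted sum, or the weight of an implicit input that is always 1) produce the same result. The plain-Python sketch below checks this for one neuron; the helper weighted_sum and all of the numeric values are hypothetical, chosen only for this example.

```python
def weighted_sum(inputs, weights):
    """Sum of each input multiplied by its weight."""
    return sum(x * w for x, w in zip(inputs, weights))


# hypothetical neuron with two inputs
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.25

# view 1: the bias is added to the weighted sum of the inputs
with_added_bias = weighted_sum(inputs, weights) + bias

# view 2: the bias is the weight of an extra input whose value is always 1
with_implicit_input = weighted_sum(inputs + [1.0], weights + [bias])

print(with_added_bias, with_implicit_input)
```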
Weight Initialization
When we create a new Dense layer, the weights are by default drawn randomly from a uniform distribution (the Glorot uniform initializer), and the bias is initialized to 0. This behavior can be configured by passing a kernel_initializer (for the weights) or bias_initializer (for the bias) parameter to the Dense constructor.
For example, the following code creates a Dense layer with each weight initialized to 1, and the bias term initialized with a uniformly random value between 0.1 and 0.2:
layer = tf.keras.layers.Dense(
    units=1,
    kernel_initializer=tf.keras.initializers.Ones(),
    bias_initializer=tf.keras.initializers.RandomUniform(minval=0.1, maxval=0.2)
)
layer.build(input_shape=[2])
print(f"Weights: {layer.weights[0]}")
print(f"\nBias: {layer.weights[1]}")
Weights: <tf.Variable 'kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[1.],
[1.]], dtype=float32)>
Bias: <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=
array([0.10065456], dtype=float32)>
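The same two parameters accept the other built-in initializers as well. As a further sketch, the layer below draws its weights from a normal distribution using tf.keras.initializers.RandomNormal and sets the bias to 0 with tf.keras.initializers.Zeros. The mean and stddev values, and the name normal_layer, are illustrative assumptions rather than recommended settings.

```python
import tensorflow as tf

# hypothetical settings: mean and stddev are illustrative only
normal_layer = tf.keras.layers.Dense(
    units=1,
    kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),
    bias_initializer=tf.keras.initializers.Zeros(),
)
# build the layer for 2 inputs so that its weights are created
normal_layer.build(input_shape=[2])

kernel, bias = normal_layer.weights
print(f"Weights: {kernel.numpy()}")
print(f"\nBias: {bias.numpy()}")
```

Because the weights are random, the printed values will differ from run to run; only the shapes, (2, 1) for the weights and (1,) for the bias, stay the same.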
Activation Functions
So far, we have seen that the output of a neuron is the weighted sum of its inputs plus a bias term, and that layers of neurons can be chained together in a neural network. This brings up an important question: what is the benefit of chaining together multiple layers of neurons?
For example, consider the following two-layer neural network: