Neural Networks I

This page explores the basics of neural networks, and shows how to start building neural networks from scratch using TensorFlow. To get the most out of this page, you should be familiar with Tensors in TensorFlow, and with the mathematical concept of Gradients.

Neural Network Anatomy

As the name suggests, a neural network is a network, or graph, composed of interconnected units called neurons (sometimes these neurons are simply called units). Each neuron in the network receives one or more inputs and produces an output, which may be the final output of the neural network, or the input to another neuron.

The simplest neural network imaginable is composed of a single neuron, which receives a single input, and produces a single output.

Generally, each connection to a neuron (that is, each “edge” in the graph) has an associated weight. The output of any neuron is the weighted sum of its inputs.

So, for the single neuron above, the output is computed using the following equation, where \( w \) is the weight of the connection between the input and the neuron:

$$output = input * w$$

When a neuron has multiple inputs, each input is multiplied by its corresponding weight, and the products are summed to produce the output.

$$output = ( input_{1} * w_{1} ) + ( input_{2} * w_{2} )$$
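To make this concrete, here is a minimal sketch of this weighted sum computed directly in TensorFlow (the input and weight values are arbitrary, chosen just for illustration):

import tensorflow as tf

# two inputs and their corresponding weights
inputs = tf.constant([1.0, 2.0])
weights = tf.constant([0.5, -1.0])

# output = (1.0 * 0.5) + (2.0 * -1.0) = -1.5
output = tf.reduce_sum(inputs * weights)
print(output)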

To describe neural networks, we often refer to the different layers of the network. When the output of one neuron is used as the input to another, we say that the two neurons are in separate layers. The diagram below shows a neural network with two layers:


Fully Connected Layers

When building neural networks with TensorFlow, we construct them layer by layer, and the most basic type of layer is the fully-connected layer, or dense layer. In a fully-connected layer, each neuron is connected to each of the layer's inputs.

The two layers in the neural network shown earlier are each examples of fully-connected layers; likewise, the illustration below shows a fully-connected layer with 2 neurons and 2 inputs:

[Diagram: a fully-connected layer with inputs input1 and input2, neurons neuron1 and neuron2, and outputs output1 and output2; each input is connected to each neuron]

On the other hand, the graph shown below does NOT represent a fully-connected layer, because input1 is not connected to neuron2, and input2 is not connected to neuron1:

[Diagram: NOT a fully-connected layer; input1 is connected only to neuron1, and input2 is connected only to neuron2]

TensorFlow Dense Layer

In TensorFlow, the Dense class represents a fully-connected layer. So far, we have seen three different properties of each fully-connected layer: (1) the number of neurons in the layer, (2) the number of inputs to each neuron, and (3) the weights of the connections between each input and each neuron. Let’s see how to specify these properties using the Dense class in TensorFlow.

We specify the number of neurons and the number of inputs by using the units and input_shape parameters when creating the layer:

# create a fully-connected layer with 3 neurons and 2 inputs
layer = tf.keras.layers.Dense(units=3, input_shape=[2], use_bias=False)

The set_weights() method can be used to explicitly set the weights of the connections between the inputs and the neurons. It takes a list of weight tensors, one per weight variable in the layer; because the layer we just created has 2 inputs, 3 neurons, and no bias, we provide a single 2x3 weight matrix (in other words, a tensor with shape (2, 3)).

In reality, you will rarely ever need to set the weights of a neural network explicitly using set_weights(). By default, the weights are initialized with random values, and then “learned” during the training process.

layer.set_weights([
    tf.constant([
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]
    ])
])
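If needed, the weights can be read back with the get_weights() method, which returns a list of NumPy arrays:

# prints [array([[1., 2., 3.], [4., 5., 6.]], dtype=float32)]
print(layer.get_weights())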

When mapping this matrix representation of the weights onto the neural network diagrams from earlier, it is helpful to think of each row of the matrix as holding the weights of the connections from a particular input, and each column as holding the weights of the connections into a particular neuron.

[Diagram: connections from input1 to neuron1, neuron2, and neuron3 with weights 1.0, 2.0, and 3.0; connections from input2 to neuron1, neuron2, and neuron3 with weights 4.0, 5.0, and 6.0]

To execute the layer, we invoke it as a function, passing in a tensor containing some inputs to the layer:

# create a tensor containing the inputs to the layer
inputs = tf.constant(
    [[1.0, 2.0]]
)

# execute the layer
output = layer(inputs)
print(output)
tf.Tensor([[ 9. 12. 15.]], shape=(1, 3), dtype=float32)

The output of the layer is a tensor containing the outputs of each individual neuron. Because there are 3 neurons, we get 3 values in the output tensor.

The equation below shows how these three output values were computed given the inputs and weights:

$$\begin{aligned} output &= \begin{bmatrix} ( i_{0} w_{0,0} ) + ( i_{1} w_{1,0} ) \\ ( i_{0} w_{0,1} ) + ( i_{1} w_{1,1} ) \\ ( i_{0} w_{0,2} ) + ( i_{1} w_{1,2} ) \end{bmatrix} \\ &= \begin{bmatrix} (1.0 \times 1.0) + (2.0 \times 4.0) \\ (1.0 \times 2.0) + (2.0 \times 5.0) \\ (1.0 \times 3.0) + (2.0 \times 6.0) \end{bmatrix} \\ &= \begin{bmatrix} 9.0 \\ 12.0 \\ 15.0 \end{bmatrix} \end{aligned}$$

One thing worth noting in the example code is that, although we are passing only two input values to the layer, the variable inputs is actually a matrix (a 2-dimensional tensor) with shape (1, 2). TensorFlow requires that the input to a Dense layer have at least two dimensions, where the first dimension is the batch dimension.
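If your data starts out as a 1-dimensional tensor, one way to add the required batch dimension is tf.expand_dims (a small sketch; flat_inputs is a hypothetical name):

# a 1-D tensor with shape (2,)
flat_inputs = tf.constant([1.0, 2.0])

# insert a batch dimension at axis 0, producing shape (1, 2)
inputs = tf.expand_dims(flat_inputs, axis=0)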

Requiring a batch dimension allows us to pass a batch of multiple inputs to the layer in a single invocation. For example, we can execute the layer on a batch of 2 inputs by providing a tensor with shape (2, 2):

inputs = tf.constant([
    [1.0, 2.0], # first item in the batch
    [0.5, -1.0] # second item in the batch
])

# execute the layer
output = layer(inputs)
print(f"Output for the first item in the batch:")
print(output[0])

print(f"\nOutput for the second item in the batch:")
print(output[1])
Output for the first item in the batch:
tf.Tensor([ 9. 12. 15.], shape=(3,), dtype=float32)

Output for the second item in the batch:
tf.Tensor([-3.5 -4.  -4.5], shape=(3,), dtype=float32)

We will see later on that this ability to process multiple inputs at once is critical when training neural networks.

Also, because both the layer's inputs and the weights of the connections between the inputs and neurons are represented as matrices, the entire computation of the layer's output can be expressed as a single matrix multiplication between the inputs and the weights:

$$\begin{aligned} outputs &= Inputs \times Weights \\ &= \begin{bmatrix} 1.0 & 2.0 \\ 0.5 & -1.0 \end{bmatrix} \times \begin{bmatrix} 1.0 & 2.0 & 3.0 \\ 4.0 & 5.0 & 6.0 \end{bmatrix} \\ &= \begin{bmatrix} 9.0 & 12.0 & 15.0 \\ -3.5 & -4.0 & -4.5 \end{bmatrix} \end{aligned}$$
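We can confirm this in code with a minimal sketch that reuses the batched inputs and the layer from above (recall that layer.weights[0] holds the weight matrix, and that the layer was created without a bias):

# multiply the (2, 2) batch of inputs by the (2, 3) weight matrix
manual_output = tf.matmul(inputs, layer.weights[0])
print(manual_output)  # same values as layer(inputs)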

Bias

In the previous example, you might have noticed that we set the parameter use_bias to False when creating the Dense layer. By default, the Dense class adds a bias term to every neuron. A bias is simply a constant value that is added to the output of a neuron.
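With a bias term, the earlier two-input equation becomes:

$$output = ( input_{1} * w_{1} ) + ( input_{2} * w_{2} ) + bias$$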

Another way to think of the bias term is as a weighted edge between the neuron and an implicit input that always has the value 1. The diagram below shows a neuron with two inputs and a bias term:

If we create an instance of the Dense class without setting use_bias to False, then a bias term will be added to the layer. We can confirm this by inspecting the weights of the layer:

layer = tf.keras.layers.Dense(units=1, input_shape=[2])
print(f"Weights: {layer.weights[0]}")
print(f"\nBias: {layer.weights[1]}")
Weights: <tf.Variable 'kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[-0.1066072 ],
       [-0.01804018]], dtype=float32)>

Bias: <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=
array([0.], dtype=float32)>

The weights property of the layer contains two tensors: the first includes the weights associated with the two inputs to the layer, and the second contains the bias term, which is initialized to 0.
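Putting this together, the output of the layer is the matrix product of the inputs and the weight matrix, plus the bias term. Here is a minimal sketch verifying that against the layer itself:

inputs = tf.constant([[1.0, 2.0]])

# weighted sum of the inputs, plus the bias term
manual_output = tf.matmul(inputs, layer.weights[0]) + layer.weights[1]

print(manual_output)   # matches layer(inputs)
print(layer(inputs))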

Weight Initialization

When we create a new Dense layer, the weights are initialized randomly (by default, the Dense class uses the Glorot uniform initializer), and the bias is initialized to 0. This behavior can be configured by passing a kernel_initializer (for the weights) or bias_initializer (for the bias) parameter to the Dense constructor.

For example, the following code creates a Dense layer with each weight initialized to 1, and the bias term initialized with a uniformly random value between 0.1 and 0.2:

layer = tf.keras.layers.Dense(
    units=1,
    kernel_initializer=tf.keras.initializers.Ones(),
    bias_initializer=tf.keras.initializers.RandomUniform(minval=0.1, maxval=0.2)
)
layer.build(input_shape=(None, 2))  # build the layer's weights for inputs with 2 features

print(f"Weights: {layer.weights[0]}")
print(f"\nBias: {layer.weights[1]}")
Weights: <tf.Variable 'kernel:0' shape=(2, 1) dtype=float32, numpy=
array([[1.],
       [1.]], dtype=float32)>

Bias: <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=
array([0.10065456], dtype=float32)>

Activation Functions

So far, we have seen that the output of a neuron is the weighted sum of its inputs plus a bias term, and that layers of neurons can be chained together in a neural network. This brings up an important question: what is the benefit of chaining together multiple layers of neurons?

For example, consider the following two-layer neural network:

[Diagram: a two-layer network; layer 1 has weight 2.0 and bias 1.0, and layer 2 has weight 3.0 and bias 2.0]

If we represent each layer as a mathematical function, we get:

$$\text{Layer 1: } f_{1}(x) = 2x + 1$$
$$\text{Layer 2: } f_{2}(x) = 3x + 2$$

To represent the entire neural network, then, we can use the following function:

$$\begin{aligned} f(x) &= f_{2}(f_{1}(x)) \\ &= 3(2x + 1) + 2 \\ &= 6x + 5 \end{aligned}$$

In other words, this two-layer neural network is exactly equivalent to the following single-layer network:

[Diagram: an equivalent single-layer network with weight 6.0 and bias 5.0]

Because each neuron is performing a linear transformation of its inputs, chaining neurons together in multiple layers ultimately results in just another linear transformation. For this reason, the neurons in most neural networks include an additional non-linear function called an activation function, which is applied after computing the weighted sum of the inputs.

ReLU

One of the most common activation functions is the rectified linear unit (ReLU) function (equivalently, \( ReLU(x) = \max(0, x) \)), which is defined as:

$$ReLU(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$$

To use the ReLU activation function in TensorFlow, we pass in the activation parameter when creating a Dense layer:

# Create a dense layer with a ReLU activation function.
# The weight of the connection between the layer's single
# input and its single neuron is initialized to 1.0
# for demonstration purposes.
layer = tf.keras.layers.Dense(
    units=1,
    input_shape=[1],
    activation='relu',
    kernel_initializer='ones',
    use_bias=False
)

input_1 = tf.constant([[1.0]])
print(f"Layer output with input = 1.0: {layer(input_1)}")

input_negative_1 = tf.constant([[-1.0]])
print(f"\nLayer output with input = -1.0: {layer(input_negative_1)}")
Layer output with input = 1.0: [[1.]]

Layer output with input = -1.0: [[0.]]

Let’s revisit the two-layer network from earlier, now with a ReLU activation function applied to the output of the first layer:

[Diagram: the two-layer network from earlier (layer 1: weight 2.0, bias 1.0; layer 2: weight 3.0, bias 2.0), with a ReLU function applied to the output of the first layer]

Mathematically, the network now looks like this:

$$\begin{aligned} \text{Layer 1: } f_{1}(x) &= \begin{cases} 0 & \text{if } 2x + 1 < 0 \\ 2x + 1 & \text{if } 2x + 1 \geq 0 \end{cases} \\ \text{Layer 2: } f_{2}(x) &= 3x + 2 \end{aligned}$$
$$\begin{aligned} \text{Network: } f(x) &= f_{2}(f_{1}(x)) \\ &= 3f_{1}(x) + 2 \\ &= \begin{cases} 2 & \text{if } x < -\frac{1}{2} \\ 6x + 5 & \text{if } x \geq -\frac{1}{2} \end{cases} \end{aligned}$$
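As a quick check of this piecewise formula, here is a minimal sketch in plain Python (the sample x values are arbitrary):

def f1(x):
    return max(0.0, 2.0 * x + 1.0)  # layer 1: weight 2.0, bias 1.0, ReLU

def f2(x):
    return 3.0 * x + 2.0  # layer 2: weight 3.0, bias 2.0

for x in [-1.0, -0.5, 0.0, 1.0]:
    # prints 2.0 for x < -1/2, and 6x + 5 otherwise
    print(x, f2(f1(x)))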

We can plot this function to see how it compares visually to the function that described the original network. The plots below show the input to the network on the horizontal axis, and the output of the network on the vertical axis:

[Plots: “Neural Network with no Activation” and “Neural Network with ReLU Activation”; each shows the network input on the horizontal axis and the network output on the vertical axis]

As we can see, the ReLU activation function introduces non-linearity into the network, which allows the network to model more complex relationships between its inputs and its outputs.

And as more layers are added to the network, the non-linearities compound, allowing the network to model even more complex functions. For example, the next plot shows the inputs and outputs of a neural network with 4 layers, where each layer (except for the last layer) uses the ReLU activation function:

[Plot: “4-Layer Neural Network with ReLU Activation”; network input on the horizontal axis, network output on the vertical axis]

The Model Class

So far, we have seen how to create neural network layers using the Dense class. When building more complex, multi-layer networks, however, it is generally easier to implement a Model class that represents the entire neural network. Representing your neural network with a Model class also makes it much easier to train, evaluate, and save the network.

A Model class is a Python class that inherits from the tf.keras.Model class. Within the __init__() method of the class, we define the neural network layers as instance attributes:

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.layer_1 = tf.keras.layers.Dense(units=2, activation="relu")
        self.layer_2 = tf.keras.layers.Dense(units=2, activation="relu")
        self.layer_3 = tf.keras.layers.Dense(units=1)

We can define how the layers of the network are executed on input data by implementing the call() method:

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.layer_1 = tf.keras.layers.Dense(units=2, activation="relu")
        self.layer_2 = tf.keras.layers.Dense(units=2, activation="relu")
        self.layer_3 = tf.keras.layers.Dense(units=1)
    
    def call(self, inputs):
        layer_1_outputs = self.layer_1(inputs)
        layer_2_outputs = self.layer_2(layer_1_outputs)
        layer_3_outputs = self.layer_3(layer_2_outputs)
        return layer_3_outputs

To execute the neural network, we create an instance of the Model class and invoke it as a function:

model = MyModel()

inputs = tf.constant(
    [[1.0, 2.0]]
)
outputs = model(inputs)

print("Outputs:")
print(outputs)
Outputs:
tf.Tensor([[0.298588]], shape=(1, 1), dtype=float32)
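Because the model has now been called on an input (which builds its weights), we can also print a summary of its layers and parameter counts:

model.summary()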

Next Steps

This page described the basic building blocks of neural networks, including neurons, layers, and activation functions. It also demonstrated how to use the Dense class to create neural network layers, and how to use the Model class to represent an entire neural network. To learn about neural network training, and to see how to train a neural network with TensorFlow, see Neural Networks II - Training.