Gradients

This page explores the concept of gradients, and shows how gradients are computed using TensorFlow.

Derivatives

The idea of gradients in mathematics is based on the concept of derivatives. You may remember from calculus that the derivative of a function describes the rate of change of that function’s output with respect to its input. If you already feel comfortable with your understanding of derivatives, feel free to skip over this section.

This page does not describe the various techniques used to compute derivatives. There are innumerable resources available elsewhere that cover this topic, and, as you will see shortly, TensorFlow is able to perform these computations automatically.

Instead, the goal of this page is to provide a conceptual understanding of derivatives by describing a few different functions and their derivatives.

Example 1 - Linear Function

Consider the function \( f(x) = 2x + 500 \), and imagine that, in a hypothetical city called TensorTown, this function can be used to calculate the price of an apartment given its size in square feet.

The plot below illustrates this function, with \( x \), or square feet, on the horizontal axis, and \( f(x) \), or price, on the vertical axis, along with the derivative of \( f(x) \) (notated as \( f^{\prime}(x) \)) at different points along the line.

The derivative of this function, \( f^{\prime}(x) \), is equal to \( 2 \) at every point along the line, indicating that the rate of change of price with respect to square feet is \( 2 \).

To illustrate what this means, consider an 800 square foot apartment. In TensorTown, the price of this apartment would be \( 2 * 800 + 500 = 2100 \).

Now what about an apartment that is 801 square feet? Here the price would be \( 2 * 801 + 500 = 2102 \). The price of an 802 square foot apartment would be \( 2 * 802 + 500 = 2104 \), and so on.

Every time the size of the apartment increases by 1 square foot, the price of the apartment increases by $2. This is what is meant when we say that the rate of change of price with respect to square feet is \( 2 \).
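
To make this concrete, the arithmetic above is easy to verify in a few lines of plain Python (a quick sketch; the price helper below is simply \( f(x) \) from above, not part of the original example):

# f(x) = 2x + 500: apartment price in TensorTown
def price(square_feet):
  return 2 * square_feet + 500

for size in (800, 801, 802):
  print(size, price(size))
# 800 2100
# 801 2102
# 802 2104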

Example 2 - Quadratic Function

In TensorTown, every employee is paid weekly. The amount paid to each employee is computed using the function \( f(x) = 0.25x^{2} + 500 \), where \( x \) represents the number of hours that the employee worked.

The graph below shows hours worked on the horizontal axis, and employee earnings on the vertical axis.

The derivative of this function, \( f^{\prime}(x) \), varies depending on the value of \( x \). Specifically, the value of the derivative is one half the number of hours worked, or \( \frac{x}{2} \). This means that, as the number of hours worked increases, the rate of change of employee earnings with respect to hours worked increases as well.
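
For reference, this derivative follows directly from the power rule:

$$f^{\prime}(x) = \frac{d}{dx}\left( 0.25x^{2} + 500 \right) = 2 \cdot 0.25x = \frac{x}{2}$$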

Consider two different employees: Ted, who has worked 10 hours so far this week, and Todd, who has worked 40 hours. How much additional money will each employee earn by working one more hour?

Ted's Hours    Ted's Earnings
10             $525.00
11             $530.25 (+$5.25)

Todd's Hours   Todd's Earnings
40             $900.00
41             $920.25 (+$20.25)

If Ted works an 11th hour, he will make an additional $5.25; and if Todd works a 41st hour, he will make an additional $20.25. For each employee, the additional income they would receive by working one more hour is roughly equal to half the number of hours worked.
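
These figures can be reproduced with a few lines of plain Python (again a sketch; the earnings helper is simply \( f(x) \) from above):

# f(x) = 0.25x^2 + 500: weekly earnings in TensorTown
def earnings(hours):
  return 0.25 * hours**2 + 500

for hours in (10, 40):
  extra = earnings(hours + 1) - earnings(hours)
  print(hours, earnings(hours), extra)
# 10 525.0 5.25
# 40 900.0 20.25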

Gradients

Like derivatives, gradients describe the rate of change of a function with respect to its inputs. Gradients are most often used when working with a function that takes multiple inputs, unlike the two example functions shown earlier, which each took a single input. The gradient is a vector that collects the function’s partial derivatives, one for each input.

Previously, the function \( f(x) = 0.25x^{2} + 500 \) was used to compute an employee’s weekly earnings, with \( x \) representing the number of hours that the employee worked.

Now suppose that, in addition to their hourly wages, each employee also receives a weekly bonus from their boss. To determine the amount of bonus money an employee will receive, you can use the function \( f(y) = 50\ln(y + 1) \), where \( y \) represents the number of times that the employee complimented their boss during the week.
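
Because \( \ln(1) = 0 \), an employee who gives no compliments receives no bonus. And because the logarithm flattens out as \( y \) grows, each additional compliment is worth less than the one before.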

This means that to determine the total amount of money that an employee will earn during a week, you can use the following function:

$$f(x, y) = (0.25x^{2} + 500) + (50\ln(y + 1))$$

The graph below shows this function with \( x \), or hours worked, on the horizontal axis, and \( y \), or number of compliments given, on the vertical axis. The shade of color represents the total amount of money earned, with lighter colors indicating less money, and darker colors indicating more money.

The graph also shows the gradient vector of \( f(x, y) \) at different values of \( x \) and \( y \).

In this example, the gradient vector points in the direction that results in the most rapid increase in total earnings. Notice, however, that the direction of the gradient vector changes significantly depending on the current values of \( x \) and \( y \).
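
For this particular function, the gradient can be written out analytically: the partial derivative with respect to \( x \) is the same \( \frac{x}{2} \) seen in the earlier example, and the partial derivative with respect to \( y \) is \( \frac{50}{y + 1} \):

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = \left( \frac{x}{2}, \frac{50}{y + 1} \right)$$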

Derivatives & Gradients in TensorFlow

With TensorFlow, the gradients of any function can be computed automatically using the GradientTape class.

Single Variable Function

Earlier, the function \( f(x) = 0.25x^{2} + 500 \) was used to compute an employee’s weekly earnings based on the number of hours they worked.

TensorFlow’s GradientTape class can be used to compute the derivative of this function at any value of \( x \):

import tensorflow as tf

# f(x) = 0.25x^2 + 500: weekly earnings as a function of hours worked
f = lambda x: 0.25 * x**2 + 500

x = tf.Variable(20.0, name="hours_worked")
with tf.GradientTape() as tape:
  # operations executed inside this context are recorded by the tape
  result = f(x)

print("Watched variables:")
print(tape.watched_variables())

print("\nDerivative of f(x) at x = 20:")
print(tape.gradient(result, x))
Watched variables:
(<tf.Variable 'hours_worked:0' shape=() dtype=float32, numpy=20.0>,)

Derivative of f(x) at x = 20:
tf.Tensor(10.0, shape=(), dtype=float32)

The input to the function, x, is initialized as an instance of tf.Variable, which allows TensorFlow to compute derivatives and gradients with respect to it. The name argument provided when creating the tf.Variable is optional, but can be useful for debugging.

The GradientTape object watches every trainable variable used within its context (i.e. within the scope of the with statement) so that gradients can later be computed with respect to them. The watched_variables() method of a GradientTape object returns all of the variables that the tape is watching.
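
As an aside, a GradientTape only watches trainable tf.Variable objects automatically. To compute a gradient with respect to a plain tensor, the tape must be told to watch it explicitly using tape.watch. A minimal sketch, reusing the earnings function from above:

x = tf.constant(20.0)  # a plain tensor rather than a tf.Variable
with tf.GradientTape() as tape:
  tape.watch(x)  # constants are not watched automatically
  result = 0.25 * x**2 + 500

print(tape.gradient(result, x))  # tf.Tensor(10.0, shape=(), dtype=float32)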

As shown earlier, the derivative of \( f(x) = 0.25x^{2} + 500 \) is equal to \( \frac{x}{2} \). With x set to 20, calling tape.gradient(result, x) computes the gradient of the function’s output, result, with respect to the input x, and returns 10.0, or one half the value of x.
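
One caveat: the resources held by a GradientTape are released as soon as tape.gradient() is called, so by default a tape can only be used once. Passing persistent=True when creating the tape allows multiple gradient computations from the same recorded operations. A brief sketch:

x = tf.Variable(20.0, name="hours_worked")
with tf.GradientTape(persistent=True) as tape:
  result = 0.25 * x**2 + 500

print(tape.gradient(result, x))  # tf.Tensor(10.0, shape=(), dtype=float32)
print(tape.gradient(result, x))  # a second call works on a persistent tape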

Multi-Variable Function

The GradientTape can be used in the same way to compute the gradient of a function with multiple inputs, like the multi-variable function described in the example above:

$$f(x, y) = (0.25x^{2} + 500) + (50\ln(y + 1))$$

# total weekly pay: hourly earnings plus the compliment bonus
f = lambda x, y: (0.25 * x**2 + 500) + (50 * tf.math.log(y + 1))

x = tf.Variable(25.0, name="hours_worked")
y = tf.Variable(6.0, name="compliments_given")
with tf.GradientTape() as tape:
  result = f(x, y)

print("Watched variables:")
print(tape.watched_variables())

print("\nGradient of f(x, y) at x = 25 and y = 6:")
print(tape.gradient(result, (x, y)))
Watched variables:
(<tf.Variable 'hours_worked:0' shape=() dtype=float32, numpy=25.0>, <tf.Variable 'compliments_given:0' shape=() dtype=float32, numpy=6.0>)

Gradient of f(x, y) at x = 25 and y = 6:
(<tf.Tensor: shape=(), dtype=float32, numpy=12.5>, <tf.Tensor: shape=(), dtype=float32, numpy=7.1428576>)
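
These values agree with the analytic gradient derived earlier: at \( x = 25 \), \( \frac{x}{2} = 12.5 \), and at \( y = 6 \), \( \frac{50}{y + 1} = \frac{50}{7} \approx 7.1428571 \) (the final digits of the printed value differ slightly due to float32 precision).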