Gradient Descent


The gradient of a function points in the direction of steepest increase, or ascent.



The negative gradient is therefore a vector pointing downhill, and its magnitude measures how steep that descent is.

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.

In practice we usually search for minima. To find a maximum instead, multiply the objective function by -1 and minimize the result.
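The sign flip can be seen on a small example. The function \(g(x) = -(x-1)^2 + 3\) below is my own illustration (not from the text): it has a maximum at \(x = 1\), which we find by running gradient descent on \(-g\).

```python
# Maximize g(x) = -(x - 1)**2 + 3 by minimizing -g(x).
# g and the step size are illustrative choices, not from the article.
def g(x):
    return -(x - 1)**2 + 3

# Derivative of -g(x) is 2*(x - 1)
def d_neg_g(x):
    return 2 * (x - 1)

x = 5.0                      # arbitrary starting point
for _ in range(1000):
    x -= 0.1 * d_neg_g(x)    # descend on -g, i.e. ascend on g

print(x)  # converges to 1, the maximizer of g
```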


Example in Python

The objective function is \(f(x) = x^4 - 3x^3 + 2\) and its derivative is \(f'(x) = 4x^3 - 9x^2\). We want to find the roots of the gradient, i.e. solve \(f'(x) = 0\).

  • Solve \(4x^3 - 9x^2 = x^2(4x - 9) = 0\), giving \(x = 0\) and \(x = \frac{9}{4}\).
  • Evaluate the second derivative, \(f''(x) = 12x^2 - 18x\), at each root.
  • \(f''(0) = 0\), so the test is inconclusive at \(x = 0\) (the function has a plateau there); \(f''(\frac{9}{4}) = 20.25 > 0\), so the global minimum is at \(x = \frac{9}{4}\).
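The second-derivative check above can be confirmed numerically; this is just the algebra from the bullets restated as code:

```python
# f(x) = x**4 - 3*x**3 + 2 and its second derivative f''(x) = 12*x**2 - 18*x
def f(x):
    return x**4 - 3*x**3 + 2

def d2f(x):
    return 12*x**2 - 18*x

print(d2f(0))     # 0.0   -> test inconclusive at x = 0 (plateau)
print(d2f(9/4))   # 20.25 -> positive, so x = 9/4 is a minimum
print(f(9/4))     # the minimum value of f
```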

Gradient Descent Algorithm Steps

  • \(\text{minimize } f(x)\)
  • Calculate the derivative \(f'(x)\) with respect to the decision variable \(x\).
  • Set the initial estimate \(x_0\) to a random number.
  • \(x_i = x_0\)
  • Repeat
    • \(x_i \leftarrow x_i - \alpha \cdot f'(x_i)\)
  • Until convergence, e.g. \(|f'(x_i)| < \text{tolerance}\)

GD Code Example

next_x = 6 # Start the search at x=6
gamma = 0.01 # Step-size multiplier
precision = 0.00001 # Desired precision of result
max_iters = 10000 # Max number of iterations

# Derivative function
def df(x):
    return 4*x**3 - 9*x**2

for _i in range(max_iters):
    current_x = next_x
    next_x = current_x - gamma*df(current_x)

    step = next_x - current_x
    if abs(step) <= precision:
        break

print("Minimum at ", next_x)

# The output for the above will be something like
# "Minimum at 2.2499646074278457"
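The loop above can also be wrapped in a reusable function; the name `gradient_descent` and its parameters are my own sketch, not a library API:

```python
# A reusable sketch of the gradient-descent loop; names are illustrative.
def gradient_descent(df, x0, gamma=0.01, precision=1e-5, max_iters=10_000):
    x = x0
    for _ in range(max_iters):
        step = gamma * df(x)   # scaled gradient step
        x -= step
        if abs(step) <= precision:
            break              # converged: steps have become tiny
    return x

# Same objective as the example: f'(x) = 4x^3 - 9x^2
x_min = gradient_descent(lambda x: 4*x**3 - 9*x**2, x0=6)
print(x_min)  # close to the analytic minimizer 9/4 = 2.25
```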