Understanding Lagrange Multipliers

Lagrange multipliers are a method for locally minimizing or maximizing a function, subject to one or more constraints. For example, find the values of x and y that make x^2 + y^2 as small as possible, while satisfying the constraint xy = 1. What follows is an explanation of how to use Lagrange multipliers and why they work.  The explanation (especially of why they work) is much simpler with only one constraint, so we’ll start with an intuitive argument in that case and then move on to a slightly more rigorous argument in the case with multiple constraints.  Both cases require some familiarity with partial derivatives and vectors.

One Constraint

Say we want to minimize a function f(x, y), subject to the constraint g(x, y) = 0.  (For now we’ll minimize a function of two variables, but the method extends naturally to functions of more variables, and the procedure for maximization is exactly the same.)  For the example above, f(x, y) = x^2 + y^2 and g(x, y) = xy - 1 (becasue xy - 1 = 0 is equivalent to xy = 1).  To solve the problem, we create a new function, L(x, y, \lambda), called the Lagrangian.  It is defined as L(x, y, \lambda) = f(x, y) - \lambda g(x, y), where \lambda is a new variable called the Lagrange multiplier.  (Some people use a + instead of a -, but it doesn’t change anything except for the sign of \lambda.) Then, we take the partial derivatives of L with respect to the original variables, x and y, and also \lambda.  We set all those partials equal to zero, and solve for x and y. For the example given above, we have:

\begin{array}{rcl} L(x, y, \lambda) & = & x^2 + y^2 - \lambda (xy - 1) \\ \partial L / \partial x & = & 2x - \lambda y \\ \partial L / \partial y & = & 2y - \lambda x \\ \partial L / \partial \lambda & = & -(xy - 1) \end{array}

Setting all the partials equal to zero and solving, we get that \lambda = 2 and (x, y) is either (1, 1) or (-1, -1).  Indeed, these points are local minima of f(x, y), subject to the constraint that g(x, y) = 0.  (In this case they are also the global minima, although in general that is not guaranteed.)

Why do Lagrange multipliers work?  It is tempting to think that we are finding a constrained minimum of f by finding an unconstrained minimum of L; after all, we are setting all the partials of L equal to zero.  But this is wrong.  The solution (x, y, \lambda) turns out to actually never be a local minimum or maximum for L; it’s always a saddle point for L.  So why do Lagrange multipliers work?  The answer has to do with level curves and gradients. Lagrange

A level curve of a function is the set of points for which the function equals a constant value.  (In three dimensions they’re called level surfaces, and more generally, level sets.)  In the picture above, the red circles are some of the level curves of f(x, y) = x^2 + y^2 (larger circles correspond to higher values of f).  The blue curve is the set of points that satisfy the constraint g(x, y) = 0, making it a level curve of g(x, y).

Constrained minimization seeks the point on the blue curve that is the lowest in the red valley: the valley where each red curve represents a different height. It turns out that this point will always be one where the red and blue level curves are tangent to each other, such as the black point in the picture above.  To see why, it helps to look at a point where the curves are not tangent, such as the orange point.  Since the red and blue level curves aren’t tangent to each other there, they cross each other.  Since they cross, moving one direction along the blue curve moves up in the red valley, and moving the other direction moves down.  In other words, there is a local change to (x, y) that preserves the constraint g(x, y) = 0 and causes the objective function f(x, y) to decrease, and the opposite change causes it to increase.  There’s no way it can be a constrained minimum or maximum. Since the minimum can’t be at a point where the level curves cross, it has to be at a point where they are tangent.  There, as you move along the curve of g(x, y) = 0, f(x, y) holds constant to a first order approximation, so it could be a minimum or a maximum.  (It still might not be either, but at least it could be.)

How do we express mathematically the statement that the level curves of f and g are tangent?  This is where the gradients come in.  The gradient of a function f is written as \nabla f and defined as a vector where each component is the partial derivative of f with respect to the corresponding variable: \nabla f = (\partial f / \partial x, \partial f / \partial y).  The gradient has a property that will be very useful to us: the gradient of f is always perpendicular to the level curve of f at a given (x, y). To see why this is true, say that (\Delta x, \Delta y) is a small amount by which we wish to change (x, y).  How will it change f(x, y)?  To first order, the answer is that:

f(x + \Delta x, y + \Delta y) \approx f(x, y) + \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y

If (\Delta x, \Delta y) is in the direction of the level curve, we will have f(x + \Delta x, y + \Delta y) \approx f(x, y), which means that \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y} \Delta y = 0 .  In other words, the dot product of \nabla f and (\Delta x, \Delta y) is zero.  Since (\Delta x, \Delta y) is in the direction of the level curve, this means that the level curve and the gradient are perpendicular to each other.

Since we want the level curves of f and g to be tangent to each other, and each function’s gradient is perpendicular to its level curve, we want the gradients to point in the same direction (or in opposite directions).  Two vectors point in the same direction if one is a scalar multiple of the other.  This scalar will turn out to be the Lagrange multiplier, so we’ll call it \lambda: \nabla f = \lambda \nabla g.  This vector equation expands into one equation for each of x and y.  Adding the constraint itself to our set of equations, we get:

\boxed{\begin{array}{rcl}\frac{\partial f}{\partial x}&=&\lambda \frac{\partial g}{\partial x}\\ \frac{\partial f}{\partial y}&=&\lambda \frac{\partial g}{\partial y} \\ g(x, y) & = & 0 \end{array}}

The method of Lagrange multipliers is to produce these equations and solve them.  However, it is usually described in terms of the Lagrangian function L(x, y, \lambda) = f(x, y) - \lambda g(x, y).  Setting the partial derivatives of the Lagrangian with respect to x, y, and \lambda equal to zero reproduces exactly the boxed equations above. For instance, \partial L / \partial x = \partial f / \partial x - \lambda \partial g / \partial x.  Setting this equal to zero gives you the first equation in the box.  For the last equation, \partial L / \partial \lambda = -g(x, y).  If you prefer, you don’t need to worry about the Lagrangian at all; you just need to find a point (x, y) and a scalar \lambda, such that \nabla f = \lambda \nabla g and the constraint is satisfied.  Such points will be your candidates for the constrained minimum (or maximum).

To summarize, constrained minima and maxima occur at points where the level curve of the constraint function g is tangent to the level curve of the objective function f.  This happens when \nabla f = \lambda \nabla g, giving us the equations in the box above.  The Lagrangian function L unifies those equations together in terms of the partial derivatives of one function.

Multiple Constraints

Now we’ll look at the general case, where we are minimizing a function of n variables, f(x_1, ..., x_n), and we have k constraints: g_i(x_1, ..., x_n) = 0, where i goes from 1 to k.  The Lagrangian L is defined as above, except with a different \lambda for each constraint:

L(x_1, ..., x_n, \lambda_1, ..., \lambda_k) = f(x_1, ..., x_n) - \sum_{j=1}^k \lambda_j g_j(x_1, ..., x_n)

(We’ll use j for an index into the k constraints and i for an index into the n variables x_i.)  The constrained minimum occurs at a point where all n+k of the partial derivatives of L are zero.

We’ll use a relatively trivial example that nevertheless illustrates the method: we are trying to minimize x_1^2 + x_2^2 + x_3^2 subject to two constraints: x_1 = 1 and x_2 = 1.  The Lagrangian is x_1^2 + x_2^2 + x_3^2 - \lambda_1 (x_1 - 1) - \lambda_2 (x_2 - 1), and its partial derivatives are:

\begin{array}{rcl} \partial L / \partial x_1 &=& 2 x_1 - \lambda_1 \\ \partial L / \partial x_2 &=& 2 x_2 - \lambda_2 \\ \partial L / \partial x_3 &=& 2 x_3 \\ \partial L / \partial \lambda_1 &=& -(x_1 - 1) \\ \partial L / \partial \lambda_2 &=& -(x_2 - 1) \end{array}

Setting them all equal to zero yields the solution: (x_1, x_2, x_3) = (1, 1, 0) and \lambda_1 = \lambda_2 = 2.

In the one constraint case, the Lagrange multiplier equations are equivalent to the vector equation \nabla f = \lambda \nabla g.  With multiple constraints, the corresponding vector equation is:

\nabla f = \sum_{j = 1}^k \lambda_j \nabla g_j

This equivalence follows from the fact that component i of the above equation says that \partial f / \partial x_i = \sum_{j=1}^k \lambda_j \partial g_j / \partial x_i, while \partial L / \partial x_i = \partial f / \partial x_i - \sum_{j=1}^k \lambda_j \partial g_j / \partial x_i, so setting the partial \partial L / \partial x_i equal to zero for each i from 1 to n is equivalent to the vector equation above.  Note that setting the partials \partial L / \partial \lambda_j equal to zero simply enforces the constraints g_j(x_1, ..., x_n) = 0.

In the one constraint case, it is useful to find points where \nabla f = \lambda \nabla g because that implies that the level curves of f are tangent to the level curves of g.  In the multiple constraints case, it’s not so simple.  The level surfaces of the individual g_i are not necessarily tangent to the level surface of f at the constrained minimum.  (In the example above, the constraint plane x_1 = 1 intersects the level surface of f, the sphere x_1^2 + x_2^2 + x_3^2 = 2, at the optimal point, (1, 1, 0).)  Instead, it turns out that the intersection of all the level surfaces of the g_i is tangent to the level surface of f.

To understand this better, let’s look directly at the effect of a small perturbation to x on both f and the constraint functions g_j.  Writing x = (x_1, ..., x_n) and letting \Delta x = (\Delta x_1, ..., \Delta x_n), we have:

f(x + \Delta x) - f(x) \approx \sum_{i=1}^n \Delta x_i \frac{\partial f}{\partial x_i}

If there were a change to x, \Delta x, that kept all of the constraint functions g_j the same (to first order) and caused non-zero change to f(x), then x could not possibly be a constrained local minimum, because moving in some direction along the intersection of the level curves of all the g_j would cause f to decrease.  So the condition we want is:

1. For every possible \Delta x such that \nabla g_j(x) \perp \Delta x for every j from 1 to k, \nabla f(x) \perp \Delta x\textrm{.}

For each j such that \nabla g_j(x) \perp \Delta x, \Delta x is in a direction tangent to the level surface of g_j.  If it is true for every j, \Delta x is in a direction tangent to the intersection of the level surfaces of the all the g_j.  Condition 1 says that any such \Delta x is also tangent to the level surface of f.  On the other hand, the condition enforced by Lagrange multipliers is:

2. There exist \lambda_1 through \lambda_k such that \nabla f = \sum_{j = 1}^k \lambda_j \nabla g_j.

Happily, it turns out that conditions 1 and 2 are equivalent to each other.

Theorem: Conditions 1 and 2 are equivalent to each other.

Proof:  (Why do a rigorous proof now when we’ve been waving our hands all day?  Sometimes the most intuitive argument for something is a proof; perhaps this is such a case.)  First we’ll show that 2 implies 1.  Assume that \nabla f = \sum_{j = 1}^k \lambda_j \nabla g_j, and that \nabla g_j(x) \perp \Delta x for every j from 1 to k.  Since \nabla g_j(x) \perp \Delta x, \nabla g_j(x) \cdot \Delta x = 0.  Therefore \nabla f \cdot \Delta x = (\sum_{j = 1}^k \lambda_j \nabla g_j) \cdot \Delta x = \sum_{j = 1}^k \lambda_j \nabla g_j \cdot \Delta x = 0.  This implies that \nabla f \perp \Delta x.

Next we’ll show that 1 implies 2, or equivalently, that not 2 implies not 1.  The negation of 2 says that \nabla f is not in the span of the k vectors \nabla g_j.  Let’s denote by z the projection of \nabla f into the span of the \nabla g_j, and set \Delta x = \nabla f - z.  We will show that \Delta x is a counterexample establishing the negation of condition 1.

We know that \nabla f \neq z, so \Delta x \neq 0, and by the definition of projection, \Delta x \perp g_j for each j.  Additionally, \nabla f \cdot \Delta x = (z + \Delta x) \cdot \Delta x = z \cdot \Delta x + \Delta x \cdot \Delta x,  Since z is in the span of the g_j, all of which are perpendicular to \Delta x, z \perp \Delta x as well.  Adding z \cdot \Delta x = 0 and \Delta x \cdot \Delta x \neq 0, we get (z + \Delta x) \cdot \Delta x \neq 0.  That is, \nabla f \cdot \Delta x \neq 0, so \nabla f is not perpendicular to \Delta x.  Since \Delta x is perpendicular to all of the \nabla g_j, but not to \nabla f, condition 1 is violated.  We assumed the negation of condition 2, so this shows that not 2 implies not 1, or equivalently, that 1 implies 2.  \square

Now we’ve seen that Lagrange multipliers work by finding an x and a set of \lambda_j such that \nabla f = \sum_{j = 1}^k \lambda_j \nabla g_j.  This equation implies that there is no local change to x that, to first order, preserves all the constraints but changes the objective function f.  Therefore the values of x that satisfy the equation are candidates for a constrained local minimum or maximum.  Solving this vector equation is equivalent to setting the partial derivatives of the Lagrangian function to zero.

I hope that sheds some light on how and why Lagrange multipliers work, and makes them seem a bit less like a magic trick.


2 thoughts on “Understanding Lagrange Multipliers

  1. Thanks for taking the effort of writing this. I have always found lagrange multipliers hard to understand because most of the explanations are based on visualization which doesn’t work beyond 2 dimensions.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s