Lagrange multipliers are a method for locally minimizing or maximizing a function, subject to one or more constraints. For example, find the values of and that make as small as possible, while satisfying the constraint . What follows is an explanation of how to use Lagrange multipliers and why they work. The explanation (especially of why they work) is much simpler with only one constraint, so we’ll start with an intuitive argument in that case and then move on to a slightly more rigorous argument in the case with multiple constraints. Both cases require some familiarity with partial derivatives and vectors.

### One Constraint

Say we want to minimize a function , subject to the constraint . (For now we’ll minimize a function of two variables, but the method extends naturally to functions of more variables, and the procedure for maximization is exactly the same.) For the example above, and (becasue is equivalent to ). To solve the problem, we create a new function, , called the Lagrangian. It is defined as , where is a new variable called the Lagrange multiplier. (Some people use a instead of a , but it doesn’t change anything except for the sign of .) Then, we take the partial derivatives of with respect to the original variables, and , and also . We set all those partials equal to zero, and solve for and . For the example given above, we have:

Setting all the partials equal to zero and solving, we get that and is either or . Indeed, these points are local minima of , subject to the constraint that . (In this case they are also the global minima, although in general that is not guaranteed.)

Why do Lagrange multipliers work? It is tempting to think that we are finding a constrained minimum of by finding an unconstrained minimum of ; after all, we are setting all the partials of equal to zero. But this is wrong. The solution turns out to actually never be a local minimum or maximum for ; it’s always a saddle point for . So why do Lagrange multipliers work? The answer has to do with level curves and gradients.

A level curve of a function is the set of points for which the function equals a constant value. (In three dimensions they’re called level surfaces, and more generally, level sets.) In the picture above, the red circles are some of the level curves of (larger circles correspond to higher values of ). The blue curve is the set of points that satisfy the constraint , making it a level curve of .

Constrained minimization seeks the point on the blue curve that is the lowest in the red valley: the valley where each red curve represents a different height. It turns out that this point will always be one where the red and blue level curves are tangent to each other, such as the black point in the picture above. To see why, it helps to look at a point where the curves are not tangent, such as the orange point. Since the red and blue level curves aren’t tangent to each other there, they cross each other. Since they cross, moving one direction along the blue curve moves up in the red valley, and moving the other direction moves down. In other words, there is a local change to that preserves the constraint and causes the objective function to decrease, and the opposite change causes it to increase. There’s no way it can be a constrained minimum or maximum. Since the minimum can’t be at a point where the level curves cross, it has to be at a point where they are tangent. There, as you move along the curve of , holds constant to a first order approximation, so it could be a minimum or a maximum. (It still might not be either, but at least it could be.)

How do we express mathematically the statement that the level curves of and are tangent? This is where the gradients come in. The gradient of a function is written as and defined as a vector where each component is the partial derivative of with respect to the corresponding variable: . The gradient has a property that will be very useful to us: the gradient of is always perpendicular to the level curve of at a given . To see why this is true, say that is a small amount by which we wish to change . How will it change ? To first order, the answer is that:

If is in the direction of the level curve, we will have , which means that . In other words, the dot product of and is zero. Since is in the direction of the level curve, this means that the level curve and the gradient are perpendicular to each other.

Since we want the level curves of and to be tangent to each other, and each function’s gradient is perpendicular to its level curve, we want the gradients to point in the same direction (or in opposite directions). Two vectors point in the same direction if one is a scalar multiple of the other. This scalar will turn out to be the Lagrange multiplier, so we’ll call it : . This vector equation expands into one equation for each of and . Adding the constraint itself to our set of equations, we get:

The method of Lagrange multipliers is to produce these equations and solve them. However, it is usually described in terms of the Lagrangian function . Setting the partial derivatives of the Lagrangian with respect to , , and equal to zero reproduces exactly the boxed equations above. For instance, . Setting this equal to zero gives you the first equation in the box. For the last equation, . If you prefer, you don’t need to worry about the Lagrangian at all; you just need to find a point and a scalar , such that and the constraint is satisfied. Such points will be your candidates for the constrained minimum (or maximum).

To summarize, constrained minima and maxima occur at points where the level curve of the constraint function is tangent to the level curve of the objective function . This happens when , giving us the equations in the box above. The Lagrangian function unifies those equations together in terms of the partial derivatives of one function.

### Multiple Constraints

Now we’ll look at the general case, where we are minimizing a function of variables, , and we have constraints: , where goes from to . The Lagrangian is defined as above, except with a different for each constraint:

(We’ll use for an index into the constraints and for an index into the variables .) The constrained minimum occurs at a point where all of the partial derivatives of are zero.

We’ll use a relatively trivial example that nevertheless illustrates the method: we are trying to minimize subject to two constraints: and . The Lagrangian is , and its partial derivatives are:

Setting them all equal to zero yields the solution: and .

In the one constraint case, the Lagrange multiplier equations are equivalent to the vector equation . With multiple constraints, the corresponding vector equation is:

This equivalence follows from the fact that component of the above equation says that , while , so setting the partial equal to zero for each from to is equivalent to the vector equation above. Note that setting the partials equal to zero simply enforces the constraints .

In the one constraint case, it is useful to find points where because that implies that the level curves of are tangent to the level curves of . In the multiple constraints case, it’s not so simple. The level surfaces of the individual are not necessarily tangent to the level surface of at the constrained minimum. (In the example above, the constraint plane intersects the level surface of , the sphere , at the optimal point, .) Instead, it turns out that the *intersection* of all the level surfaces of the is tangent to the level surface of .

To understand this better, let’s look directly at the effect of a small perturbation to on both and the constraint functions . Writing and letting , we have:

If there were a change to , , that kept all of the constraint functions the same (to first order) and caused non-zero change to , then could not possibly be a constrained local minimum, because moving in some direction along the intersection of the level curves of all the would cause to decrease. So the condition we want is:

1. For every possible such that for every from to ,

For each such that , is in a direction tangent to the level surface of . If it is true for every , is in a direction tangent to the intersection of the level surfaces of the all the . Condition 1 says that any such is also tangent to the level surface of . On the other hand, the condition enforced by Lagrange multipliers is:

2. There exist through such that .

Happily, it turns out that conditions 1 and 2 are equivalent to each other.

**Theorem:** Conditions 1 and 2 are equivalent to each other.

Proof: (Why do a rigorous proof now when we’ve been waving our hands all day? Sometimes the most intuitive argument for something is a proof; perhaps this is such a case.) First we’ll show that 2 implies 1. Assume that , and that for every from to . Since , . Therefore . This implies that .

Next we’ll show that 1 implies 2, or equivalently, that not 2 implies not 1. The negation of 2 says that is not in the span of the vectors . Let’s denote by the projection of into the span of the , and set . We will show that is a counterexample establishing the negation of condition 1.

We know that , so , and by the definition of projection, for each . Additionally, , Since is in the span of the , all of which are perpendicular to , as well. Adding and , we get . That is, , so is not perpendicular to . Since is perpendicular to all of the , but not to , condition 1 is violated. We assumed the negation of condition 2, so this shows that not 2 implies not 1, or equivalently, that 1 implies 2.

Now we’ve seen that Lagrange multipliers work by finding an and a set of such that . This equation implies that there is no local change to that, to first order, preserves all the constraints but changes the objective function . Therefore the values of that satisfy the equation are candidates for a constrained local minimum or maximum. Solving this vector equation is equivalent to setting the partial derivatives of the Lagrangian function to zero.

I hope that sheds some light on how and why Lagrange multipliers work, and makes them seem a bit less like a magic trick.