David G. Luenberger, Yinyu Ye - Linear and Nonlinear Programming International Series Episode 2 Part
8.7 Applications of the Theory
Application 2 (Penalty methods). Let us briefly consider a problem with a single
constraint:
minimize f(x)                                      (41)
subject to h(x) = 0
One method for approaching this problem is to convert it (at least approximately)
to the unconstrained problem
minimize f(x) + (1/2) μ h(x)^2                     (42)
where μ is a (large) penalty coefficient. Because of the penalty, the solution to (42)
will tend to have a small h(x). Problem (42) can be solved as an unconstrained
problem by the method of steepest descent. How will this behave?
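As a minimal sketch of this conversion (not part of the text; the particular f, h, penalty coefficient, starting point, and fixed step size below are illustrative assumptions), one can apply steepest descent directly to the penalized objective of (42):

```python
# Steepest descent on the penalized objective q(x) = f(x) + 0.5*mu*h(x)**2.
# f, h, mu, the starting point, and the fixed step size are illustrative
# assumptions, not taken from the text.
import numpy as np

def f(x):                      # a simple quadratic objective
    return 0.5 * (x[0]**2 + 4.0 * x[1]**2)

def grad_f(x):
    return np.array([x[0], 4.0 * x[1]])

def h(x):                      # one linear constraint, h(x) = x_1 + x_2 - 1
    return x[0] + x[1] - 1.0

def grad_h(x):
    return np.array([1.0, 1.0])

mu = 100.0                     # (large) penalty coefficient
x = np.zeros(2)
step = 1.0e-3                  # conservative fixed step size
for _ in range(20000):
    g = grad_f(x) + mu * h(x) * grad_h(x)   # gradient of the penalized objective
    x -= step * g
print(x, h(x))                 # x is close to the constrained minimizer (0.8, 0.2),
                               # and h(x) is small but not exactly zero
```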
For simplicity let us consider the case where f is quadratic and h is linear.
Specifically, we consider the problem
minimize (1/2) x^T Q x − b^T x                     (43)
subject to c^T x = 0
The objective of the associated penalty problem is (1/2) x^T Q x + (1/2) μ x^T cc^T x − b^T x.
The quadratic form associated with this objective is defined by the matrix Q + μ cc^T
and, accordingly, the convergence rate of steepest descent will be governed by
the condition number of this matrix. This matrix is the original matrix Q with a
large rank-one matrix added. It should be fairly clear† that this addition will cause
one eigenvalue of the matrix to be large (on the order of μ). Thus the condition
number is roughly proportional to μ. Therefore, as one increases μ in order to get
an accurate solution to the original constrained problem, the rate of convergence
becomes extremely poor. We conclude that the penalty function method used in this
simplistic way with steepest descent will not be very effective. (Penalty functions,
and how to minimize them more rapidly, are considered in detail in Chapter 11.)

†See the Interlocking Eigenvalues Lemma in Section 10.6 for a proof that only one eigenvalue
becomes large.
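This growth of the condition number can be checked numerically. The following is a minimal sketch (not part of the text); the positive definite Q, the vector c, and the sampled values of μ are arbitrary illustrative choices. It reports the condition number κ of Q + μ cc^T and the corresponding worst-case steepest-descent ratio ((κ − 1)/(κ + 1))^2, which approaches 1 as μ grows.

```python
# Sketch: condition number of the penalty matrix Q + mu*c*c^T as mu grows,
# together with the worst-case per-step error ratio ((kappa-1)/(kappa+1))^2
# of steepest descent on the corresponding quadratic.  Q and c are arbitrary.
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)              # a symmetric positive definite Q
c = rng.standard_normal(n)               # constraint vector from c^T x = 0

for mu in [1.0, 1.0e2, 1.0e4, 1.0e6]:
    P = Q + mu * np.outer(c, c)          # quadratic form of the penalty objective
    eigs = np.linalg.eigvalsh(P)         # eigenvalues in ascending order
    kappa = eigs[-1] / eigs[0]           # condition number
    ratio = ((kappa - 1.0) / (kappa + 1.0)) ** 2
    print(f"mu = {mu:8.1e}  kappa = {kappa:10.3e}  ratio = {ratio:.6f}")
```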
Scaling
The performance of the method of steepest descent is dependent on the particular
choice of variables x used to define the problem. A new choice may substantially
alter the convergence characteristics.
Suppose that T is an invertible n×n matrix. We can then represent points in
E^n either by the standard vector x or by y where Ty = x. The problem of finding
x to minimize f(x) is equivalent to that of finding y to minimize h(y) = f(Ty).
Using y as the underlying set of variables, we then have
∇h = ∇f T                                          (44)
where ∇f is the gradient of f with respect to x. Thus, using steepest descent, the
direction of search will be
Δy = −T^T ∇f^T                                     (45)
which, since Δx = T Δy, in the original variables is
Δx = −T T^T ∇f^T                                   (46)
Thus we see that the change of variables changes the direction of search.
The rate of convergence of steepest descent with respect to y will be determined
by the eigenvalues of the Hessian of the objective, taken with respect to y. That
Hessian is
∇^2 h(y) ≡ H(y) = T^T F(Ty) T
Thus, if x∗ = Ty∗ is the solution point, the rate of convergence is governed by the
matrix
H(y∗) = T^T F(x∗) T                                (47)
Very little can be said in comparison of the convergence ratio associated with
H and that of F. If T is an orthonormal matrix, corresponding to y being defined
from x by a simple rotation of coordinates, then T T^T = I, and we see from (46) that
the directions remain unchanged and the eigenvalues of H are the same as those
of F.
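This invariance is easy to verify numerically. The sketch below (not from the text) uses an arbitrary symmetric F and a plane rotation for T, and compares the eigenvalues of F with those of H = T^T F T.

```python
# Check: for an orthonormal T (here a rotation, so T T^T = T^T T = I),
# H = T^T F T has the same eigenvalues as F.  F and the angle are arbitrary.
import numpy as np

F = np.array([[4.0, 1.0],
              [1.0, 3.0]])                        # a symmetric "Hessian"
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrix

H = T.T @ F @ T
print(np.linalg.eigvalsh(F))   # approximately [2.382, 4.618]
print(np.linalg.eigvalsh(H))   # the same eigenvalues
```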
In general, before attacking a problem with steepest descent, it is desirable,
if it is feasible, to introduce a change of variables that leads to a more favorable
eigenvalue structure. Usually the only kind of transformation that is at all practical
is one having T equal to a diagonal matrix, corresponding to the introduction
of scale factors on each of the variables. One should strive, in doing this, to
make the second derivatives with respect to each variable roughly the same.
Although appropriate scaling can potentially lead to substantial payoff in terms of
enhanced convergence rate, we largely ignore this possibility in our discussions of
steepest descent. However, see the next application for a situation that frequently
occurs.
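As a concrete illustration of such diagonal scaling, the sketch below (not from the text) takes an arbitrary badly scaled Hessian F, chooses the scale factor for each variable so that the second derivative with respect to each new variable equals one, and compares the condition numbers of F and T^T F T.

```python
# Sketch of diagonal scaling: pick T = diag(1/sqrt(F_ii)) so that the diagonal
# of the transformed Hessian T^T F T is all ones, then compare condition numbers.
# The badly scaled F below is an arbitrary illustrative example.
import numpy as np

F = np.diag([1.0, 100.0, 10000.0])        # second derivatives of very different sizes
F[0, 1] = F[1, 0] = 5.0                   # a little coupling between variables

T = np.diag(1.0 / np.sqrt(np.diag(F)))    # scale factors for each variable
H = T.T @ F @ T                           # Hessian with respect to the scaled variables

print(f"condition number of F:       {np.linalg.cond(F):.1f}")
print(f"condition number of T^T F T: {np.linalg.cond(H):.2f}")
```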
Application 3 (Program design). In applied work it is extremely rare that one
solves just a single optimization problem of a given type. It is far more usual that
once a problem is coded for computer solution, it will be solved repeatedly for
various parameter values. Thus, for example, if one is seeking to find the optimal