The early history of min-e problem was begun from the problem of L2-optimal approximation for continuous one-variable function. In 1961 Stone [257] considered piecewise-linear curve fitting as a formal optimization problem. The objective was to minimize the squared approximation error subject to a constraint on the number of linear segments. Bellman followed with a solution [27] based on his principle of optimality [26]. Later Gluss [92] expanded upon Bellman’s work. Lawson [151] used dynamic programming (DP) to establish the existence of a balanced error property. Cox discussed a similar approach in his paper [57]. Cantoni [34] determined the optimal polygon of a known nonlinear function by minimizing the weighted integral square errors. The works performed by Bellman, Gluss, Stone, Cantoni and Cox hold only for a 1-D signal whose analytic forms are known. Nevertheless, it inspired other researchers to use the dynamic programming approach for optimal approximation of digital curves [194, 44].
In 1996, Chen et al. [44] studied min-e problem with measures L2 and L¥ for approximation of 2-D planar curves by circular arcs and line segments they proposed algorithm, which was also inspired by DP method of Bellman [27, 92] for 1-D continuous functions.
Dahl and Realfsen [58] presented min-e problem as searching of shortest path in a directed acyclic graph containing at most M arcs. Regrettably, they restricted consideration by the case M>N/2 only. Moreover, the DP search of the approximation error minimum is limited to three neighbouring vertices only. Generally speaking, it makes the algorithm sub-optimal. Anyway, for the case under consideration (large number of segments relatively the number of vertices) the vertex adjustment-based approach can provide results that are quite close to the optimal approximation.
Heckbert and Garland published a survey in 1997 [101], and they wrote about optimal approximation of digital 1-D digital waveform f(x) with minimum error: "the L2-optimal approximation to a function f(x) can be found in O(MN2) time, worst case, using dynamic programming", but they did not provide any details of the algorithm or references.
Haugland et al. [97, 98, 100] represented optimal algorithm for min-e problem for 1-D waveforms (ECG) with error measure L2. In contrast to Perez-Vidal algorithm, the proposed solution was based on algorithm for resource-constrained shortest path in graph, introduced by Saigal [228] and later corrected by Rosseel [226]. Nevertheless, from computational point of view the suggested DP-based algorithm for cardinality constrained shortest path (CCSP) is equivalent to Perez-Vidal algorithm for 1-D signals. The CCSP approximation algorithm was used for lossy compression of ECG signals with a given compression ratio [180, 178, 181]. Furthermore, the CCSP algorithm was extended to the case of planar curves [182, 178]. This DP algorithm is also equivalent to solution originally given by Perez and Vidal [194] for 2-D planar curves.
Tseng et al. [274] presented DP algorithm for optimal approximation with a given error tolerance (min-# problem). Three error measures were used, including L1, L¥ and a length cost function. The essence of the algorithm is the same as that of Perez and Vidal. The only difference is the stop rule: in Perez-Vidal algorithm the DP search in the state space is continued until a given number M of segments is reached, in the algorithm of Tseng et al. the search is performed until the current approximation error is less than a given error tolerance e.
Mori et al. [171] proposed DP algorithm for optimal approximation of curves by linear segments, circular arcs and splines. The approach is the same as that of Perez and Vidal. The DP approach of Perez and Vidal was used for approximation of input curve using circular arcs [193], circular arcs and line segments [111]. In [52] the Perez-Vidal algorithm was extended to the case of polygonal approximation in 3-D space with local integral square error (LISE) measure.
The main contribution to the reduction of the complexity of Perez-Vidal algorithm has been done by Salotti [230, 231]. The main idea behind the algorithm is to stop the search as soon as possible using heuristic functions to estimate the cost function. The algorithm has been implemented in two versions: A*-search [230, 231] and dynamic programming [232]. At first, the rough approximation with Pavlidis algorithm [190] is performed to find approximation error to be used further as upper bound for approximation error g . Then A* or DP search is performed. If the estimated cost function for the current vertex is bigger than the upper bound g, the next vertices located further along the curve P are not examined as possible candidates as approximation points of Q. Salotti offered two methods for estimation of the remaining cost function from the current vertex to the goal vertex. The use of heuristics in the algorithms makes it difficult to estimate the complexity. Experiments provided with test shapes have shown that the complexity of the algorithm is close to O(N2).
The paper of Perez-Vidal [194] is the key publication for min-e problem for digital curves because it was the first publication where the optimal algorithm for the problem has been proposed. Moreover, it was the first paper, which contained deep analysis of all the questions concerning the min-e problem. In more recent publications [97, 98, 100, 182, 177, 274, 171] the algorithm of Perez and Vidal has been rediscovered.
Min- e problem for concave/convex curves
Chan and Chin [37] proposed L¥ -optimal solution for min-e problem for the convex curves of complexity O(N2) with algorithm, based on the search of the shortest path in directed graph. They offered to take advantage of the convexity of the input curve P to construct the graph G(P) on the vertices of the P in O(N2) time, and to find the min-e approximation with an additional O(MN) time.
Aggarwal et al. used matrix search algorithm for cost functions with Monge property to solve optimization tasks: min-e problem for the closed convex/concave curves with maximum area or perimeter cost function [7] and finding a minimum-weight k-link path in graphs [8]. For monotonic cost function with Monge property the optimization problem can be solved by DP algorithm with matrix search in O(MN) time [7, 297, 298]. Thus, in the case of convex/concave curves, the min-e problem with L2 error metrics can be solved with matrix search algorithm in O(MN) time [7]. Regrettably, in most cases the realistic digital curves are not globally concave or convex. Probably, this approach can be applied for locally concave/convex smooth curves. This is subject for the future consideration.
(3.3.1)
Figure 3.3.1: Scheme of computation of the cost function D(n, m) in the state space W by dynamic programming algorithm of Perez and Vidal.
// Initialization
D(1,0)¬0
FOR n = 2 TO N DO
D(n,0)´
END
// Minimum search
FOR m = 1 TO M DO
FOR n = m TO N DO
dmin ¬ ¥
FOR j= m-1 TO n-1 DO
d¬ D(j, m-1) + e2(i,j)
IF(d < dmin)
dmin ¬ d;
jmin ¬ j;
ENDIF
ENDFOR
D(n,m)¬dmin
A(n,m)¬jmin
ENDFOR
ENDFOR
// Backtracking for the solution H(m)
H(M)= N
FOR m = M TO 1 DO
H(m-1) = A(H(m), m))
ENDFOR
E2(P)¬D(N,M)
Figure 3.3.2: General scheme of full search dynamic programming algorithm of Perez and Vidal.
The optimization problem can be solved by the dynamic programming algorithm as proposed by Perez and Vidal [194] with the following recursive expressions (see Fig. 3.3.1):
(3.3.2)
Here A(n, m) is the parent state that provides the minimum value for the cost function D(n, m) at the state (n, m) of the state space W(see Fig. 3.3.1).
Approximation vertices qm and qm+1 of Q are vertices pi and pj of the correspondent line segment. The general scheme of the full search Perez and Vidal’s algorithm is presented on Fig. 3.3.2.
a)
b)In fact, to obtain approximation with N-2 line segments we have to eliminate only one vertex from the input curve P (for example, with one step of Merge algorithm of Pikaz and Dinstein [198]). So, the best solution can be found in linear time by checking approximation error for every vertex as the eliminated one. On the other hand, complexity of the optimal algorithm for M = N-2 is given as O((N-2)N2))= O(N3) time. The reason for the high complexity of the algorithm is the redundancy of the search: in the case under consideration we are constructing all solutions for m=1, 2, .., M-2, although in fact we need to know solution for single goal state W(N, M-2).
To solve the paradox, we proposed in [P4] a modification of the state space. The state space has to be bounded to eliminate states that are not necessary for the construction of the goal state (see Fig. 3.3.3, right). The complexity of the dynamic programming algorithm with the modified state space is O(M(N-M)2). As we can see, the time complexity of the modified DP algorithm for the trivial case in question is O(N), as it should be (see Fig. 3.3.4) .
In same cases, the input curve can be approximated with zero error by less number linear segments as a given M. In algorithm of Perez and Vidal such situation can be detected and the computation can be stopped even if the current number of segments m is less than M. In full search algorithm in modified state space algorithm, to be sure that a given number corresponds non-zero approximation error, we can check it with a simple procedure in O(N) time by elimination of those points, whose absence does not affect on the total approximation error. If the found number Mmin is bigger than a given number of linear segments M we can find approximation solution for this M, otherwise we can approximate the input curve by smaller number of segments Mmin with zero approximation error.
a)
b)
Figure 3.3.42: State space W for the case
N = M- 2:
in Perez-Vidal algorithm (a), and in the proposed algorithm
[P4]
(b).
Haugland et al. [97, 98, 100] and Horng and Li in [111] offered to calculate approximation errors for all pairs of vertices in advance and store it in a 2-D array of N´ N size. The reasons of this technique were different: Haugland et al. treat the min-e problem as the search of the cardinality constrained shortest path in graph and all the weights in graph should be known prior the computations. The purpose of Horng and Li was to reduce processing time by performing all computations only one time and to use the stored values later.
The last idea seems to be reasonable, because it really permits to avoid recalculation of the values and reduce processing time. In practice, however, this method in the form suggested by Horng and Li works for relatively small N only because of high space-complexity of the method. Let us consider the following example of DP approximation: the 5004-vertex test shape #3 [231] is to be approximated by 50 linear segments. With algorithm of Perez and Vidal the approximation can be computed in 710 s [231]. According to the mentioned above approaches [97, 98, 111] the error values have to be stored in 2-D array of N´ N size; it gives 200 Mb for the N=5000. Moreover, for approximation of closed contours Horng and Lee proposed to use 2-D array of 2N´ 2N size that means allocation of 800 Mb in the case under question. When RAM size is limiting resource, demands for allocation of 200 Mb (or even 800 Mb) can be a problematic way to reduce processing time.
Thus, for small N the precalculation method [97, 98, 111] does work, but only in the case when the processing time is small even without using the technique. For large N, however, the precalculation method cannot work efficiently because of high space complexity O(N2).
In [P4] , we have proposed modification of DP algorithm of Perez and Vidal with precalculation by total cost of O(MN) space instead of O(N2) as in [97, 98, 111]. Actually this space complexity is the same as that of the original algorithm, namely O(MN).
Let us consider memory demands of the original Perez-Vidal algorithm in details. According to the scheme represented in the algorithm, the DP calculations are performed sequentially for all m starting from 1 to M. To support the calculation we need 2-D array of M´ N size to store parent states A(m, n) for the optimal sub-paths. For storing the cost function D(m, n) it is enough to store 2-D table of 2´ N size. This is because we need to know only the previous row to calculate the current one. We also need 2-D table of 5´N size for five arrays of cumulatives of coordinates to calculate approximation error on-fly. The total space complexity of Perez-Vidal algorithm is O(MN).
According to the approach of Horng and Li [111] with storing of precalculated errors we need an additional 2D array of N ´ N size for the errors, which increases the total space complexity to O(N2). In addition, in [97, 98, 111] for the cost function D(m, n) instead of 2´N array was used 2-D array of M´N size.
Table 3.3.1. Space demands for DP algorithms for min-e problem with precalculation.|
|
Full search 1) |
Full search [P4] |
Reduced search [P4] |
|
Cost function D |
O(MN) |
O(MN) |
O(WN) |
|
Parent states A |
O(MN) |
O(MN) |
O(WN) |
|
Cumulatives |
O(N) |
O(N) |
O(N) |
|
Errors e2(pi,pj) |
O(N2) |
O(N) |
O(N) |
|
Total space |
O(N2) |
O(MN) |
O(WN) |
Now let us change the order of processing from row-by-row to column-by-column. We will fill the state space with solutions of sub-problem column-by-column, at first for n=2, then for n=3, until we reach the last vertex n=N. For every current vertex n, we have to find solutions of the sub-problems for all allowable values of m in the state space. For this computational scheme we need 2-D array for A(m, n) of size M´ N and five arrays of cumulatives of coordinates as in the original algorithm. Now we need 2-D array of size M´ N for the cost function D(m, n), not 2´N as in the previous case. For the processing of the current vertex, we need approximation errors for linear segments from the current vertex to the vertices already passed. We do not need anymore the approximation errors from all vertices to all ones for processing the current state. For this purpose it is enough 1-D array of size 1´N. To support the storing pre-calculated cost function values, we have to use O(MN) memory for the cost function D(m, n).
Comparing space demands for two DP algorithms with full search (see Table 3.3.1), we can use precalculation method at the cost of O(MN) additional memory instead of O(N2) as suggested in [97, 98, 111]. For the considered above example of approximation of 5000-vertex curve by M=50 segments with the proposed scheme in [P4] we need additionally about 1 Mb, that is only 1% of amount we must allocate with the approach of Horng and Li.
Perez and Vidal have proposed optimal DP-based algorithm which can solve the min-eproblem with L2 error metrics in O(MN2) time. We introduced two improvements to the core algorithm, including modification of the state space to reduce complexity of the algorithm to O(M(N-M)2), and a scheme for the reducing processing time by storing of pre-calculated approximation errors. In the case of min-e approximation of locally concave/convex smooth curves, it is worth further studies, if we can apply the matrix search algorithm to reduce processing time.