Subquadratic Algorithms for Longest Common Subsequence

Hunt and Szymanski [16] introduced a fast algorithm for computing longest common subsequences. They provided a running time of 0(r+n)lg(n), where r is the total number of ordered pairs of positions at which the two sequences match. In the worst case, the algorithm has an 0( lg(n)) running time. However, for those applications where most positions of one sequence have few matches in the other sequence, a running time of 0( lg(n)) can be expected. [Pg.266]

Let S be a finite sequence of elements chosen from some alphabet I. The length of the sequence S is s. S[i] is the ith element of S and S[t j] is the sequence S[t], S[/+l], S[/+2]...S[y]. If U and V are finite sequences, then U is said to be a subsequence of V if there exists a monotonically increasing sequence of integers r, r2.r u such that U[i] = V[rJ for 1 / u. U is a common subsequence of S and T if U is a subsequence of both S and T. A longest common subsequence is a common subsequence of greatest possible length. Both sequences are assumed to have the same length n. The munber of elements in set (ij) such that S[t] = T[/] is denoted by r. [Pg.266]

The data structure used in the algorithm of Hunt and Szymanski is G . is an array of threshold grades defined by the smallest j such that S[l t] and T[l 7] contain a common subsequence of length k. Each G,may be considered as a pointer that [Pg.266]

Lemmas 1, 2, and 3 are stated and the proofs are available in [17], This algorithm can be completed with an 0( lg(n)) time efficiency to determine the length of the common subsequence. This can be refined to improve the running time to 0(r+n) lg(n), and the longest common subsequence can be recovered. [Pg.267]

A small amount of preprocessing will improve the performance of Algorithm 2 in great measure. The main source of inefficiency in Algorithm 2 is the inner loop j in which the elements that match S[i] are searched for repeatedly in the T sequence. A linked list can be used to eliminate this search step. For each i a list of corresponding j positions is needed such that S[i] = T[j]. These lists must be retained in decreasing order of j. All positions of the S sequence that contain the same element may be set up to use the same physical list of matching /s. [Pg.267]

Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...