Longest Common Subsequence
...with an unconventional approach
The longest common subsequence (LCS) problem is typically solved using the classic dynamic programming approach: bottom-up with memo. Here, we attempt an atypical implementation which solves the LCS problem across three sequences using a sparse memo.
Problems amenable to dynamic programming have to meet two important criteria:
-
Optimal substructure: Solving a subproblem optimally, contributes to the optimal solution of the larger problem. The longest subsequence terminating at a particular point in the string, is made up of ‘smaller’ longest subsequences terminating at successively preceding points.
-
Overlapping subproblems: The longest subsequence is made up of constituent subsequences which are the longest at their respective points of termination. As the subsequence grows, we recursively or iteratively search for the longest preceding subsequence and end up solving the same subproblems over and over.
The Implementation
We share the implementation in four parts. The first part below is the function definition of LCS and reading in the arguments.
|
|
Populating the solution space
The second part below implements the sparse data structure (memo
), the solution
search (max3d
) and the nested loop that implements the recurrence relation for LCS.
Our memo
is sparse in the sense we keep track of only the intersections and the
length of the longest subsequence terminating at that point as a 4-tuple. Each
4-tuple: (l,i,j,k)
represents the longest subsequence terminating at the
intersection (i,j,k)
with its length l
. These are the solutions to our subproblems.
A subsequence has to necessarily end at the matching intersection of strings
(3 in our case). Thus, finding the longest subsequence at an intersection (i,j,k)
involves finding the longest subsequence among all preceding/prior intersections
and adding 1 to it.
Intuition: Finding this longest preceding subsequence involves locating the 4-tuple in the cuboid space (because we have 3 strings) enclosed by the planes
x=0,y=0,z=0
andx=i,y=j,z=k
, that corresponds to the longest preceding subsequence. This is an implementation of our recurrence relation.
Note that x,y,z
axes represent the 1-based indices of the three respective strings
layed out along these axes. It is also important to have a base-case tuple (0,0,0,0)
to initialize our sparse memo.
|
|
Identifying the solution
We have the memo filled up now, with tuples that have all longest subsequence
terminations at each intersection (i,j,k)
. All we have to do is find the
tuple with the maximum LCS length. However, we do more: we identify the indices
of the LCS in each of the three strings and thus the literal LCS as well.
The while
loop starts with the maximum value of the LCS and works its way back
to the previous termination iteratively until it reaches the first element in the
sequence using the same solution search approach (max3d
).
|
|
To exercise our code, we create three random substrings and get the LCS common to them.
|
|
The output of the program is:
The three strings:
aYM7zR76GWMyWmdwpb98syIZ3UgDP9OUKdgQuqM4JqGNpAdnpe9qtvZX3gidinQqxqa7vC
cVTJwuwMXl4sO1AopWUdjlyhrsdztMRhsSnjmEe5bFce9VjxIqLg2zagKNcnkcZHlqdsFPMM8aT
r4WTlMZnA2MRIydpRgfh5oxnBWrHxhsi20IPhAzXDjBN9k41rPNGXHum2AvRvUtZgVOT8LjUYZLmVAci
Length of longest subsequence, its indices in s1,s2,s3
(7, [10, 11, 14, 20, 29, 43, 54], [7, 22, 26, 32, 44, 57, 62], [10, 13, 14, 30, 44, 50, 73])
Longest subsequence in s1,s2,s3: ['M', 'y', 'd', 's', '9', 'N', 'Z']
Takeaways
We have here, an implementation of LCS that is frugal on memory and extensible across two or more strings. The bottleneck is the nested loop that checks for intersection across the lists. We have also illustrated an intuitive conceptualization of the LCS problem and the exploration of its solution space.