Matrix chain multiplication is an optimization problem that can be solved using dynamic programming. Given a sequence of matrices, the goal is to find the most efficient way to multiply these matrices. The problem is not actually to perform the multiplications, but merely to decide the sequence of the matrix multiplications involved.
Let's say we have two matrices A1 and A2 of dimension m * n and p * q. From the rules of matrix multiplication, we know that,
If we have three matrices A1, A2 and A3 having dimension m * n, n * p and p * q respectively, then A4 = A1 * A2 * A3 will have dimension of m * q. Now we can perform this matrix multiplication in two ways:
You can notice the order of the multiplication remains the same, i.e. we don't multiply (A1 * A3) * A2 because it might not be valid. We only change the parenthesis to multiply a set before multiplying it with the remaining. How we place these parenthesis are important. Why? Let's say, the dimension of 3 matrices A1, A2 and A3 are 10 * 100, 100 * 5, 5 * 50. Then,
For the 2nd type the number of scaler multiplication is 10 times the number of 1st type! So if you can devise a way to find out the correct orientation of parenthesis needed to minimize the total scaler multiplication, it would reduce both time and memory needed for matrix multiplication. This is where matrix chain multiplication comes in handy. Here, we'll not be concerned with the actual multiplication of matrices, we'll only find out the correct parenthesis order so that the number of scaler multiplication is minimized. We'll have matrices A1, A2, A3 ........ An and we'll find out the the minimum number of scaler multiplications needed to multiply these. We'll assume that the given dimensions are valid, i.e. it satisfies our first requirement for matrix multiplication.
We'll use divide-and-conquer method to solve this problem. Dynamic programming is needed because of common subproblems. For example: for n = 5, we have 5 matrices A1, A2, A3, A4 and A5. We want to find out the minimum number of multiplication needed to perform this matrix multiplication A1 * A2 * A3 * A4 * A5. Of the many ways, let's concentrate on one: (A1 * A2) * (A3 * A4 * A5).
For this one, we'll find out Aleft = A1 * A2. Aright = A3 * A4 * A5. Then we'll find out Aanswer = Aleft * Aright. The total number of scaler multiplications needed to find out Aanswer: = The total number of scaler multiplications needed to determine Aleft + The total number of scaler multiplications needed to determine Aright + The total number of scaler multiplications needed to determine Aleft * Aright.
The last term, The total number of scaler multiplications needed to determine Aleft * Aright can be written as: The number of rows in Aleft * the number of columns in Aleft * the number of columns in Aright. (According to the 2nd rule of matrix multiplication)
But we could set the parenthesis in other ways too. For example:
For each and every possible cases, we'll determine the number of scaler multiplication needed to find out Aleft and Aright, then for Aleft * Aright. If you have a general idea about recursion, you've already understood how to perform this task. Our algorithm will be:
- Set parenthesis in all possible ways.
- Recursively solve for the smaller parts.
- Find out the total number of scaler multiplication by merging left and right.
- Of all possible ways, choose the best one.
Why this is dynamic programming problem? To determine (A1 * A2 * A3), if you've already calculated (A1 * A2), it'll be helpful in this case.
To determine the state of this recursion, we can see that to solve for each case, we'll need to know the range of matrices we're working with. So we'll need a begin and end. As we're using divide and conquer, our base case will be having less than 2 matrices (begin >= end), where we don't need to multiply at all. We'll have 2 arrays: row and column. row[i] and column[i] will store the number of rows and columns for matrix Ai. We'll have a dp[n][n] array to store the already calculated values and initialize it with -1, where -1 represents the value has not been calculated yet. dp[i][j] represents the number of scaler multiplications needed to multiply Ai, Ai+1, .....,Aj inclusive. The pseudo-code will look like:
Procedure matrixChain(begin, end):
if begin >= end
Return 0
else if dp[begin][end] is not equal to -1
Return dp[begin][end]
end if
answer := infinity
for mid from begin to end
operation_for_left := matrixChain(begin, mid)
operation_for_right := matrixChain(mid+1, right)
operation_for_left_and_right := row[begin] * column[mid] * column[end]
total := operation_for_left + operation_for_right + operation_for_left_and_right
answer := min(total, answer)
end for
dp[begin][end] := answer
Return dp[begin][end]
Complexity:
The value of begin and end can range from 1 to n. There are n2 different states. For each states, the loop inside will run n times. Total time complexity: O(n³)
and memory complexity: O(n²)
.