# A Simple SQL Solution for the Knapsack Problem (SKP-1)

A poster on OTN (Combination using pl/sql) recently asked for an SQL solution to a problem that turned out to be an example of the well known Knapsack Problem, for the case of a single knapsack. I posted an SQL query as a solution, and also a solution in PL/SQL because the SQL solution uses a feature only available in Oracle v11.2. In this article I explain how the solutions work and provide the results of a performance analysis that involved randomised test problems of varying computational difficulty. I have taken a more general form of problem than the original poster described, and the solutions here have been improved.

Update, 14 July 2013: I used the technique in response to another OTN post here, SQL for the Fantasy Football Knapsack Problem. I have extended the idea there to allow for fast approximate solutions making it viable for larger problems, and have also used a similar idea here, SQL for the Travelling Salesman Problem.

Knapsack Problem (1-Knapsack)
The various forms of knapsack problem have been studied extensively. The problems are known to be computationally difficult and many algorithms have been proposed for both exact and approximate solutions (see reference above). The SQL solution in this article is quite simple and will not be competitive in performance for larger problems in the form described here, but may be interesting for being implemented in pure SQL (and without using Oracle's Model clause, or a purely brute force approach). However, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems (see links at the top).

The problem can be stated informally, as follows: Given a set of items, each having positive weight and profit attributes, and a weight limit, find the combinations of items that maximise profit within the weight limit. Variant versions include the addition of multiple constraints (easy to handle), and inclusion of multiple knapsacks (more difficult). I also have a solution for the multiple knapsacks version described here (An SQL Solution for the Multiple Knapsack Problem (SKP-m)).

The difficulty of the problem arises from the number of possible combinations increasing exponentially with problem size. The number of these (not necessarily feasible) combinations, N(n,1), can be expressed in terms of the number of items, n, in two ways. First, we can use the well known binomial expression for the number of combinations of r items, summed from  to :

$latex N(n,1) = \sum_{r=0}^n \binom{n}{r}$

where 

Second, and more simply, we can observe that including an item in the combination, or not, is a binary choice, leading to:

\$latex N(n,1) = 

This generalises easily to the expression for the multiple knapsack problem, with m knapsacks:

N(n,m) = 

This can also be expressed using a binomial series as

N(n,m) = 

Here,  represents the number of combinations of r items from n, with  being the number of assignments of the r items to m containers.

Let's look at a simple example problem having four items, with a weight limit of 9, as shown below:

There are 24 possible combinations of these items, having from 0 to 4 items. These are depicted below:

We can see that there are two optimal solutions in this case. How to find them using SQL?

SQL Solution

Oracle's v11.2 implementation of the Ansii standard Recursive Subquery Factoring can be used as the basis for an SQL solution. This would works as follows: Starting from each item in turn, add items recursively while remaining within the weight limit, and considering only items of id greater than the current id. The SQL looks like this, where a marker is added for leaf nodes, following an approach from the Amis technology blog:

WITH rsf (nxt_id, lev, tot_weight, tot_profit, path) AS (
SELECT id nxt_id, 0 lev, item_weight tot_weight, item_profit tot_profit, To_Char (id) path
FROM items
UNION ALL
SELECT n.id,
r.lev + 1,
r.tot_weight + n.item_weight,
r.tot_profit + n.item_profit,
r.path || ',' || To_Char (n.id)
FROM rsf r
JOIN items n
ON n.id > r.nxt_id
AND r.tot_weight + n.item_weight <= 9
) SEARCH DEPTH FIRST BY nxt_id SET line_no
SELECT LPad (To_Char(nxt_id), lev + 1, '*') node,tot_weight, tot_profit,
CASE WHEN lev >= Lead (lev, 1, lev) OVER (ORDER BY line_no) THEN 'Y' END is_leaf,
path
FROM rsf
ORDER BY line_no


and the solution like this:

NODE       TOT_WEIGHT TOT_PROFIT I PATH
---------- ---------- ---------- - ------------------------------
1                   3         10   1
*2                  7         30 Y 1,2
*3                  8         40 Y 1,3
*4                  9         50 Y 1,4
2                   4         20   2
*3                  9         50 Y 2,3
3                   5         30 Y 3
4                   6         40 Y 4

8 rows selected.

The output contains 8 records, as opposed to the total of 23 non-null combinations, because only feasible items are joined, and permutations are avoided by the constraint that item ids increase along the path. Given positivity of weight and profit, we know that all solutions must be leaves, and we can represent the tree structure above in the following diagram:

We can now use the recursive subquery factor as an input to a main query that selects one of the most profitable solutions, or alternatively to a further subquery factor that ranks the solutions in order of descending profit. In the latter case, the main query can select all the most profitable solutions.

In the solution I posted on the OTN thread, I included a subquery factor to restrict the final query section to leaf nodes only. This was because we know that the solutions must be leaf nodes, and usually it is more efficient to filter out non-solution records as early as possible. However, I later realised that the work involved in the filtering might outweigh the saving for the final section, and this turned out to be the case here, as shown in the performance analysis section below. Here are the two queries, without the leaf node filtering:

Query - KEEP

WITH rsf (id, lev, tot_weight, tot_profit, path) AS (
SELECT id, 0, item_weight, item_profit, To_Char (id)
FROM items
UNION ALL
SELECT n.id,
r.lev + 1,
r.tot_weight + n.item_weight,
r.tot_profit + n.item_profit,
r.path || ',' || To_Char (n.id)
FROM rsf r
JOIN items n
ON n.id > r.id
AND r.tot_weight + n.item_weight <= 100
)
SELECT Max (tot_weight) KEEP (DENSE_RANK LAST ORDER BY tot_profit) tot_weight,
Max (tot_profit) KEEP (DENSE_RANK LAST ORDER BY tot_profit) tot_profit,
Max (path) KEEP (DENSE_RANK LAST ORDER BY tot_profit) path,
(Max (lev) KEEP (DENSE_RANK LAST ORDER BY tot_profit) + 1) n_items
FROM rsf

Query - RANK

WITH rsf (id, lev, tot_weight, tot_profit, path) AS (
SELECT id, 0, item_weight, item_profit, To_Char (id)
FROM items
UNION ALL
SELECT n.id,
r.lev + 1,
r.tot_weight + n.item_weight,
r.tot_profit + n.item_profit,
r.path || ',' || To_Char (n.id)
FROM rsf r
JOIN items n
ON n.id > r.id
AND r.tot_weight + n.item_weight <= 100
)
, paths_ranked AS (
SELECT tot_weight, tot_profit, path,
Dense_Rank () OVER (ORDER BY tot_profit DESC) rnk_profit,
lev
FROM rsf
)
SELECT tot_weight tot_weight,
tot_profit tot_profit,
path path,
(lev + 1) n_items
FROM paths_ranked
WHERE rnk_profit = 1
ORDER BY tot_weight DESC

Query Structure Diagram

It's worth noting that Oracle's proprietary recursive syntax, Connect By, cannot be used in this way because of the need to accumulate weights forward through the recursion. The new Ansii syntax is only available from v11.2 though, and I thought it might be interesting to implement a solution in PL/SQL that would work in earlier versions, following a similar algorithm, again with recursion.

PL/SQL Recursive Solution

This is a version in the form of a pipelined function, as I wanted to compare it with the SQL solutions, and be callable from SQL.
SQL

SELECT COLUMN_VALUE sol
FROM TABLE (Packing_PLF.Best_Fits (100))
ORDER BY COLUMN_VALUE

Package

CREATE OR REPLACE PACKAGE BODY Packing_PLF IS

PROCEDURE Write_Log (p_line VARCHAR2) IS
BEGIN
NULL;
END Write_Log;

FUNCTION Best_Fits (p_weight_limit NUMBER) RETURN SYS.ODCIVarchar2List PIPELINED IS

TYPE item_type IS RECORD (
item_id                 PLS_INTEGER,
item_index_parent       PLS_INTEGER,
weight_to_node          NUMBER);
TYPE item_tree_type IS        TABLE OF item_type;
g_solution_list               SYS.ODCINumberList;

g_timer                       PLS_INTEGER := Timer_Set.Construct ('Pipelined Recursion');

i                             PLS_INTEGER := 0;
j                             PLS_INTEGER := 0;
g_item_tree                   item_tree_type;
g_item                        item_type;
l_weight                      PLS_INTEGER;
l_weight_new                  PLS_INTEGER;
l_best_profit                 PLS_INTEGER := -1;
l_sol                         VARCHAR2(4000);
l_sol_cnt                     PLS_INTEGER := 0;

p_item_index_parent     PLS_INTEGER,
p_weight_to_node        NUMBER) RETURN PLS_INTEGER IS
BEGIN

g_item.item_id := p_item_id;
g_item.item_index_parent := p_item_index_parent;
g_item.weight_to_node := p_weight_to_node;
IF g_item_tree IS NULL THEN

g_item_tree := item_tree_type (g_item);

ELSE

g_item_tree.Extend;
g_item_tree (g_item_tree.COUNT) := g_item;

END IF;
RETURN g_item_tree.COUNT;

PROCEDURE Do_One_Level (p_tree_index PLS_INTEGER, p_item_id PLS_INTEGER, p_tot_weight PLS_INTEGER, p_tot_profit PLS_INTEGER) IS

CURSOR c_nxt IS
SELECT id, item_weight, item_profit
FROM items
WHERE id > p_item_id
AND item_weight + p_tot_weight <= p_weight_limit;
l_is_leaf           BOOLEAN := TRUE;
l_index_list        SYS.ODCINumberList;

BEGIN

FOR r_nxt IN c_nxt LOOP
Timer_Set.Increment_Time (g_timer,  'Do_One_Level/r_nxt');

l_is_leaf := FALSE;
Do_One_Level (Add_Node (r_nxt.id, p_tree_index, r_nxt.item_weight + p_tot_weight), r_nxt.id, p_tot_weight + r_nxt.item_weight, p_tot_profit + r_nxt.item_profit);
Timer_Set.Increment_Time (g_timer,  'Do_One_Level/Do_One_Level');

END LOOP;

IF l_is_leaf THEN

IF p_tot_profit > l_best_profit THEN

g_solution_list := SYS.ODCINumberList (p_tree_index);
l_best_profit := p_tot_profit;

ELSIF p_tot_profit = l_best_profit THEN

g_solution_list.Extend;
g_solution_list (g_solution_list.COUNT) := p_tree_index;

END IF;

END IF;
Timer_Set.Increment_Time (g_timer,  'Do_One_Level/leaves');

END Do_One_Level;

BEGIN

FOR r_itm IN (SELECT id, item_weight, item_profit FROM items) LOOP

Timer_Set.Increment_Time (g_timer,  'Root fetches');
Do_One_Level (Add_Node (r_itm.id, 0, r_itm.item_weight), r_itm.id, r_itm.item_weight, r_itm.item_profit);

END LOOP;

FOR i IN 1..g_solution_list.COUNT LOOP

j := g_solution_list(i);
l_sol := NULL;
l_weight := g_item_tree (j).weight_to_node;
WHILE j != 0 LOOP

l_sol := l_sol || g_item_tree (j).item_id || ', ';
j :=  g_item_tree (j).item_index_parent;

END LOOP;
l_sol_cnt := l_sol_cnt + 1;
PIPE ROW ('Solution ' || l_sol_cnt || ' (profit ' || l_best_profit || ', weight ' || l_weight || ') : ' || RTrim (l_sol, ', '));

END LOOP;

Timer_Set.Increment_Time (g_timer,  'Write output');
Write_Log ('Profit ' || l_best_profit || ' has ' || l_sol_cnt || ' solutions...');
Timer_Set.Write_Times (g_timer);

EXCEPTION
WHEN OTHERS THEN
Timer_Set.Write_Times (g_timer);
RAISE;

END Best_Fits;

END Packing_PLF;


Performance Analysis

It will be interesting to see how the solution methods perform as problem size varies, and we will use my own performance benchmarking framework to do this. As the framework is designed to compare performance of SQL queries, I have converted the PL/SQL solution to operate as a pipelined function, and thus be callable from SQL, as noted above. I included a version of the SQL solution, with the leaf filtering mentioned above, XKPLV - this was based on XKEEP, with filtering as in the OTN thread.

Test Data Sets

Test data sets are generated as follows, in terms of two integer parameters, w and d:

Insert w items with sequential ids, and random weights and profits in the ranges 0-d and 0-1000, respectively, via Oracle's function DBMS_Random.Value. The maximum weight is fixed at 100.

Test Results

The embedded Excel file below summarises the results obtained over a grid of data points, with w in (12, 14, 16, 18, 20) and d in (16, 18, 20).

Notes

• The two versions of the non-leaf SQL solution take pretty much the same time to execute, and are faster than the others
• The leaf version of the SQL solution (XKPLV) is slower than the non-leaf versions, and becomes much worse in terms of elapsed time for the more difficult problems; the step-change in performance can be seen to be due to its greater memory usage, which spills to disk above a certain level
• The pipelined function solution is significantly slower than the other solutions in terms of CPU time, and elapsed time, except in the case of the leaf SQL solution when that solution's memory usage spills to disk. The pipelined function continues to use more memory as the problem difficulty rises, until all available memory is consumed, when it throws an error (but this case is not included in the result set above)

Conclusions

Oracle's v11.2 implementation of the Ansii SQL feature recursive subquery factoring provides a simple solution for the knapsack problem, that cannot be achieved with Oracle's older Connect By syntax alone.

The method has been described here in its exact form that is viable only for small problems; however, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems.