# SQL for the Balanced Number Partitioning Problem

I noticed a post on AskTom recently that referred to an SQL solution to a version of the so-called Bin Fitting problem, where even distribution is the aim. The solution, How do I solve a Bin Fitting problem in an SQL statement?, uses Oracle’s Model clause, and, as the poster of the link observed, has the drawback that the number of bins is embedded in the query structure. I thought it might be interesting to find solutions without that drawback, so that the number of bins could be passed to the query as a bind variable. I came up with three solutions using different techniques, starting here.

An interesting article in American Scientist, The Easiest Hard Problem, notes that the problem is NP-complete, or certifiably hard, but that simple greedy heuristics often produce a good solution, including one used by schoolboys to pick football teams. The article uses the more descriptive term ‘balanced number partitioning’ for the problem, and notes some practical applications. The Model clause solution implements a multiple-bin version of the main Greedy Algorithm, while my non-Model SQL solutions implement variants of it that allow other techniques to be used, one of which is very simple and fast: it implements the team-picking heuristic for multiple teams.

Another poster, Stew Ashton, suggested a simple change to my Model solution that improved performance, and I use this modified version here. He also suggested that using PL/SQL might be faster, and I have added my own simple PL/SQL implementation of the Greedy Algorithm, as well as a second version of the recursive subquery factoring solution that performs better than the first.

This article explains the solutions, considers two simple examples to illustrate them, and reports on performance testing across the dimensions of number of items and number of bins. The tests show that the solutions exhibit either linear or quadratic variation in execution time with the number of items, and that some methods are sensitive to the number of bins while others are not.

After I had posted my solutions on the AskTom thread, I came across a thread on OTN, need help to resolve this issue, that requested a solution to a form of bin fitting problem where the bins have fixed capacity and the number of bins required must be determined. I realised that my solutions could easily be extended to add that feature, and posted extended versions of two of the solutions there. I have added a section here for this.

Update, 5 June 2013: added Model and RSF diagrams

Update, 18 November 2017: I have now put scripts for setting up data and running the queries in a new schema onto my GitHub project: Brendan’s repo for interesting SQL. Note that I have not included the minor changes needed for the extended problem where finding the number of bins is part of the problem.

Greedy Algorithm Variants

Say there are N bins and M items.

Greedy Algorithm (GDY)
Set bin sizes to zero
Loop over items in descending order of size

• Add item to current smallest bin
• Calculate new bin size

End Loop

Greedy Algorithm with Batched Rebalancing (GBR)
Set bin sizes to zero
Loop over items in descending order of size in batches of N items

• Assign batch to N bins, with bins in ascending order of size
• Calculate new bin sizes

End Loop

Greedy Algorithm with No Rebalancing – or, Team Picking Algorithm (TPA)
Assign items to bins cyclically by bin sequence in descending order of item size
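To make the three variants concrete, here is a minimal Python sketch of each. This is illustrative only — the article's actual implementations are the SQL and PL/SQL solutions below — and the function and variable names are my own.

```python
def gdy(items, n_bins):
    """Greedy Algorithm (GDY): each item goes to the currently smallest bin."""
    bins = [0] * n_bins
    for item in sorted(items, reverse=True):
        bins[bins.index(min(bins))] += item  # add item to current smallest bin
    return bins

def gbr(items, n_bins):
    """Greedy with Batched Rebalancing (GBR): bins re-ranked once per batch of N items."""
    bins = [0] * n_bins
    items_desc = sorted(items, reverse=True)
    for start in range(0, len(items_desc), n_bins):
        batch = items_desc[start:start + n_bins]
        order = sorted(range(n_bins), key=lambda b: bins[b])  # bins in ascending order of size
        for item, b in zip(batch, order):  # largest item to smallest bin, etc.
            bins[b] += item
    return bins

def tpa(items, n_bins):
    """Team Picking Algorithm (TPA): cyclic assignment, no rebalancing."""
    bins = [0] * n_bins
    for i, item in enumerate(sorted(items, reverse=True)):
        bins[i % n_bins] += item
    return bins
```

For example, with two bins and the hypothetical items [4, 2, 1, 1], gdy balances perfectly at [4, 4], while gbr and tpa both give bins of sizes 5 and 3.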

Two Examples

Example: Four Items

Here we see that the Greedy Algorithm finds the perfect solution, with no difference in bin size, but the two variants have a difference of two.

Example: Six Items

Here we see that none of the algorithms finds the perfect solution. Both the standard Greedy Algorithm and its batched variant give a difference of two, while the variant without rebalancing gives a difference of four.

SQL Solutions

Original Model for GDY
See the link above for the SQL for the problem with three bins only.

The author has two measures for each bin and implements the GDY algorithm using CASE expressions and aggregation within the rules. The idea is to iterate over the items in descending order of size, setting the item bin to the bin with current smallest value. I use the word ‘bin’ for his ‘bucket’. Some notes:

• Dimension by row number, ordered by item value
• Add measures for the iteration, it, and number of iterations required, counter
• Add measures for the bin name, bucket_name, and current minimum bin value, min_tmp (only first entry used)
• Add measures for each item bin value, bucket_1-3, being the item value if it’s in that bin, else zero
• Add measures for each bin running sum, pbucket_1-3, being the current value of each bin (only first two entries used)
• The current minimum bin value, min_tmp[1], is computed as the least of the running sums
• The current item bin value is set to the item value for the bin whose value matches the minimum just computed, and null for the others
• The current bin name is set similarly to be the bin matching the minimum
• The new running sums are computed for each bin

Brendan’s Generic Model for GDY

```sql
SELECT item_name, bin, item_value, Max (bin_value) OVER (PARTITION BY bin) bin_value
  FROM (
SELECT * FROM items
 MODEL
    DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
    MEASURES (item_name,
              item_value,
              Row_Number() OVER (ORDER BY item_value DESC) bin,
              item_value bin_value,
              Row_Number() OVER (ORDER BY item_value DESC) rn_m,
              0 min_bin,
              Count(*) OVER () - :N_BINS - 1 n_iters
    )
    RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
      min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
      bin[ITERATION_NUMBER + :N_BINS + 1] = min_bin[1],
      bin_value[min_bin[1]] = bin_value[CV()] + Nvl (item_value[ITERATION_NUMBER + :N_BINS + 1], 0)
    )
)
 WHERE item_name IS NOT NULL
 ORDER BY item_value DESC
```

My Model solution works for any number of bins, passing the number of bins as a bind variable. The key idea here is to use values in the first N rows of a generic bin value measure to store all the running bin values, rather than as individual measures. I have included two modifications suggested by Stew in the AskTom thread.

• Dimension by row number, ordered by item value
• Initialise a bin measure to the row number (the first N items will remain fixed)
• Initialise a bin value measure to item value (only first N entries used)
• Add the row number as a measure, rn_m, in addition to a dimension, for referencing purposes
• Add a min_bin measure for current minimum bin index (first entry only)
• Add a measure for the number of iterations required, n_iters
• The first N items are correctly binned in the measure initialisation
• Set the minimum bin index using analytic Min function with KEEP clause over the first N rows of bin value
• Set the bin for the current item to this index
• Update the bin value for the corresponding bin only

Recursive Subquery Factor for GBR

```sql
WITH bins AS (
  SELECT LEVEL bin, :N_BINS n_bins FROM DUAL CONNECT BY LEVEL <= :N_BINS
), items_desc AS (
  SELECT item_name, item_value, Row_Number () OVER (ORDER BY item_value DESC) rn
    FROM items
), rsf (bin, item_name, item_value, bin_value, lev, bin_rank, n_bins) AS (
  SELECT b.bin,
         i.item_name,
         i.item_value,
         i.item_value,
         1,
         b.n_bins - i.rn + 1,
         b.n_bins
    FROM bins b
    JOIN items_desc i
      ON i.rn = b.bin
   UNION ALL
  SELECT r.bin,
         i.item_name,
         i.item_value,
         r.bin_value + i.item_value,
         r.lev + 1,
         Row_Number () OVER (ORDER BY r.bin_value + i.item_value),
         r.n_bins
    FROM rsf r
    JOIN items_desc i
      ON i.rn = r.bin_rank + r.lev * r.n_bins
)
SELECT r.item_name,
       r.bin, r.item_value, r.bin_value
  FROM rsf r
 ORDER BY item_value DESC
```

The idea here is to use recursive subquery factors to iterate through the items in batches of N items, assigning each item to a bin according to the rank of the bin on the previous iteration.

• Initial subquery factors form record sets for the bins and for the items with their ranks in descending order of value
• The anchor branch assigns bins to the first N items, assigning the item values to a bin value field, and setting the bin rank in ascending order of this bin value
• The recursive branch joins the batch of items to the record in the previous batch whose bin rank matches that of the item in the reverse sense (so largest item goes to smallest bin etc.)
• The analytic Row_Number function computes the updated bin ranks, and the bin values are updated by simple addition

Recursive Subquery Factor for GBR with Temporary Table
Create Table and Index

```sql
DROP TABLE items_desc_temp
/
CREATE GLOBAL TEMPORARY TABLE items_desc_temp (
item_name  VARCHAR2(30) NOT NULL,
item_value NUMBER(8) NOT NULL,
rn         NUMBER
)
ON COMMIT DELETE ROWS
/
CREATE INDEX items_desc_temp_N1 ON items_desc_temp (rn)
/
```

Insert into Temporary Table

```sql
INSERT INTO items_desc_temp
SELECT item_name, item_value, Row_Number () OVER (ORDER BY item_value DESC) rn
  FROM items;
```

RSF Query with Temporary Table

```sql
WITH bins AS (
  SELECT LEVEL bin, :N_BINS n_bins FROM DUAL CONNECT BY LEVEL <= :N_BINS
), rsf (bin, item_name, item_value, bin_value, lev, bin_rank, n_bins) AS (
  SELECT b.bin,
         i.item_name,
         i.item_value,
         i.item_value,
         1,
         b.n_bins - i.rn + 1,
         b.n_bins
    FROM bins b
    JOIN items_desc_temp i
      ON i.rn = b.bin
   UNION ALL
  SELECT r.bin,
         i.item_name,
         i.item_value,
         r.bin_value + i.item_value,
         r.lev + 1,
         Row_Number () OVER (ORDER BY r.bin_value + i.item_value),
         r.n_bins
    FROM rsf r
    JOIN items_desc_temp i
      ON i.rn = r.bin_rank + r.lev * r.n_bins
)
SELECT item_name, bin, item_value, bin_value
  FROM rsf
 ORDER BY item_value DESC
```

The idea here is that, in the initial RSF query, a subquery factor of items was joined on a calculated field, so the whole record set had to be read; performance could be improved by putting that initial record set into an indexed temporary table ahead of the main query. We'll see in the performance testing section that this changes quadratic variation of execution time with problem size into linear variation.

Plain Old SQL Solution for TPA

```sql
WITH items_desc AS (
  SELECT item_name, item_value,
         Mod (Row_Number () OVER (ORDER BY item_value DESC), :N_BINS) + 1 bin
    FROM items
)
SELECT item_name, bin, item_value, Sum (item_value) OVER (PARTITION BY bin) bin_total
  FROM items_desc
 ORDER BY item_value DESC
```

The idea here is that the TPA algorithm can be implemented in simple SQL using analytic functions.

• The subquery factor assigns the bins by taking the item rank in descending order of value and applying the modulo (N) function
• The main query returns the bin totals in addition by analytic summing by bin
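The modulo mapping can be checked outside SQL; here is a hypothetical Python equivalent of the bin expression (my own naming), showing how the ranks cycle through the bins.

```python
# Hypothetical equivalent of the POSS bin assignment: bin = Mod(rank, N) + 1,
# where rank is the item's Row_Number in descending order of value.
def poss_bin(rank, n_bins):
    return rank % n_bins + 1

# Ranks 1..6 with 3 bins map cyclically across bins 1..3:
print([poss_bin(rn, 3) for rn in range(1, 7)])  # [2, 3, 1, 2, 3, 1]
```

Note that the cycle starts at bin 2 rather than bin 1, which makes no difference to the balance of the result.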

Pipelined Function for GDY
Package

```sql
CREATE OR REPLACE PACKAGE Bin_Fit AS

  TYPE bin_fit_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, bin NUMBER);
  TYPE bin_fit_list_type IS VARRAY(1000) OF bin_fit_rec_type;

  TYPE bin_fit_cur_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER);
  TYPE bin_fit_cur_type IS REF CURSOR RETURN bin_fit_cur_rec_type;

  FUNCTION Items_Binned (p_items_cur bin_fit_cur_type, p_n_bins PLS_INTEGER) RETURN bin_fit_list_type PIPELINED;

END Bin_Fit;
/
CREATE OR REPLACE PACKAGE BODY Bin_Fit AS

  c_big_value                 CONSTANT NUMBER := 100000000;
  TYPE bin_fit_cur_list_type  IS VARRAY(100) OF bin_fit_cur_rec_type;

  FUNCTION Items_Binned (p_items_cur bin_fit_cur_type, p_n_bins PLS_INTEGER) RETURN bin_fit_list_type PIPELINED IS

    l_min_bin                 PLS_INTEGER := 1;
    l_min_bin_val             NUMBER;
    l_bins                    SYS.ODCINumberList := SYS.ODCINumberList();
    l_bin_fit_cur_rec         bin_fit_cur_rec_type;
    l_bin_fit_rec             bin_fit_rec_type;
    l_bin_fit_cur_list        bin_fit_cur_list_type;

  BEGIN

    l_bins.Extend (p_n_bins);
    FOR i IN 1..p_n_bins LOOP
      l_bins(i) := 0;
    END LOOP;

    LOOP

      FETCH p_items_cur BULK COLLECT INTO l_bin_fit_cur_list LIMIT 100;
      EXIT WHEN l_bin_fit_cur_list.COUNT = 0;

      FOR j IN 1..l_bin_fit_cur_list.COUNT LOOP

        l_bin_fit_rec.item_name := l_bin_fit_cur_list(j).item_name;
        l_bin_fit_rec.item_value := l_bin_fit_cur_list(j).item_value;
        l_bin_fit_rec.bin := l_min_bin;

        PIPE ROW (l_bin_fit_rec);
        l_bins(l_min_bin) := l_bins(l_min_bin) + l_bin_fit_cur_list(j).item_value;

        l_min_bin_val := c_big_value;
        FOR i IN 1..p_n_bins LOOP

          IF l_bins(i) < l_min_bin_val THEN
            l_min_bin := i;
            l_min_bin_val := l_bins(i);
          END IF;

        END LOOP;

      END LOOP;

    END LOOP;

  END Items_Binned;

END Bin_Fit;
```

SQL Query

```sql
SELECT item_name, bin, item_value, Sum (item_value) OVER (PARTITION BY bin) bin_value
  FROM TABLE (Bin_Fit.Items_Binned (
         CURSOR (SELECT item_name, item_value FROM items ORDER BY item_value DESC),
         :N_BINS))
 ORDER BY item_value DESC
```

The idea here is that procedural algorithms can often be implemented more efficiently in PL/SQL than in SQL.

• The first parameter to the function is a strongly-typed reference cursor
• The SQL call passes in a SELECT statement wrapped in the CURSOR keyword, so the function can be used for any set of records that returns name and numeric value pairs
• The item records are fetched in batches of 100, using the LIMIT clause, to improve efficiency

Performance Testing
I tested the performance of the various queries using my own benchmarking framework across grids of data points, using two data sets: the queries are split into two sets based on performance, with the slower set run against the smaller data set.

I presented on this approach to benchmarking SQL at the Ireland Oracle User Group conference in March 2017, Dimensional Performance Benchmarking of SQL – IOUG Presentation.

Query Modifications for Performance Testing

• The RSF query with staging table was run within a pipelined function in order to easily include the insert in the timings
• A system context was used to pass the bind variables as the framework runs the queries from PL/SQL, not from SQL*Plus
• I found that calculating the bin values using analytic sums, as in the code above, affected performance, so I removed this for clarity of results, outputting only item name, value and bin

Test Data Sets
For a given depth parameter, d, d-1 records were inserted with random values in the range 0-d. The insert was:

```sql
INSERT INTO items
SELECT 'item-' || n, DBMS_Random.Value (0, p_point_deep)
  FROM (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL < p_point_deep);
```

The number of bins was passed as a width parameter, but note that the original, linked Model solution, MODO, hard-codes the number of bins to 3.

Test Results

Data Set 1 - Small
This was used for the following queries:

• MODO - Original Model for GDY
• MODB - Brendan's Generic Model for GDY
• RSFQ - Recursive Subquery Factor for GBR
```
Run Type=MODO
Depth         W3         W3         W3
D1000       1.03       1.77       1.05
D2000       3.98       6.46       5.38
D4000      15.79       20.7      25.58
D8000      63.18      88.75      92.27
D16000      364.2     347.74     351.99
Run Type=MODB
Depth         W3         W6        W12
D1000        .27        .42        .27
D2000          1       1.58       1.59
D4000       3.86        3.8       6.19
D8000      23.26      24.57      17.19
D16000      82.29      92.04      96.02
Run Type=RSFQ
Depth         W3         W6        W12
D1000       3.24       3.17       1.53
D2000       8.58       9.68       8.02
D4000      25.65      24.07      23.17
D8000      111.3     108.25      98.33
D16000     471.17     407.65     399.99
```

The results show:

• Quadratic variation of CPU time with number of items
• Little variation of CPU time with number of bins, although RSFQ seems to show some decline
• RSFQ is slightly slower than MODO, while my version of Model, MODB, is about 4 times faster than MODO

Data Set 2 - Large
This was used for the following queries:

• RSFT - Recursive Subquery Factor for GBR with Temporary Table
• POSS - Plain Old SQL Solution for TPA
• PLFN - Pipelined Function for GDY

This table gives the CPU times in seconds across the data set:

```
  Depth       W100      W1000     W10000
Run Type=PLFN
D20000        .31       1.92      19.25
D40000        .65       3.87      55.78
D80000       1.28       7.72      92.83
D160000       2.67      16.59     214.96
D320000       5.29      38.68      418.7
D640000      11.61      84.57      823.9
Run Type=POSS
D20000        .09        .13        .13
D40000        .18        .21        .18
D80000        .27        .36         .6
D160000        .74       1.07        .83
D320000       1.36       1.58       1.58
D640000       3.13       3.97       4.04
Run Type=RSFT
D20000        .78        .78        .84
D40000       1.41       1.54        1.7
D80000       3.02       3.39       4.88
D160000       6.11       9.56       8.42
D320000      13.05      18.93      20.84
D640000      41.62      40.98      41.09
```

The results show:

• Linear variation of CPU time with number of items
• Little variation of CPU time with number of bins for POSS and RSFT, but roughly linear variation for PLFN
• These linear methods are much faster than the earlier quadratic ones for larger numbers of items
• The approximate proportionality of PLFN's time to the number of bins means that, while it is faster than RSFT for small numbers of bins, it becomes slower beyond around 50 bins for our problem
• The proportionality to number of bins for PLFN presumably arises from the step to find the bin of minimum value
• The lack of proportionality to number of bins for RSFT may appear surprising, since it sorts the bins iteratively. However, while the work for each sort is likely to be proportional to the number of bins, the number of iterations is inversely proportional to it, so the two effects cancel out

Solution Quality

The methods reported above implement three underlying algorithms, none of which guarantees an optimal solution. In order to get an idea of how the quality compares, I created new versions of the second set of queries using analytic functions to output the difference between minimum and maximum bin values, with percentage of the maximum also output. I ran these on the same grid, and report below the results for the four corners.

```
Method:              PLFN         RSFT         POSS
Point: W100/D20000
Diff/%:              72/.004%     72/.004%     19,825/1%
Point: W100/D640000
Diff/%:              60/.000003%  60/.000003%  633,499/.03%
Point: W10000/D20000
Diff/%:              189/.9%      180/.9%      19,995/67%
Point: W10000/D640000
Diff/%:              695/.003%    695/.003%    639,933/3%
```

The results indicate that GDY (Greedy Algorithm) and GBR (Greedy Algorithm with Batched Rebalancing) generally give very similar quality results, while TPA (Team Picking Algorithm) tends to be quite a lot worse.

Extended Problem: Finding the Number of Bins Required

An important extension to the problem is when the bins have fixed capacity, and it is desired to find the minimum number of bins, then spread the items evenly between them. As mentioned at the start, I posted extensions to two of my solutions on an OTN thread, and I reproduce them here. It turns out to be quite easy to make the extension. The remainder of this section is just lifted from my OTN post and refers to the table of the original poster.

Start OTN Extract
So how do we determine the number of bins? The total quantity divided by bin capacity, rounded up, gives a lower bound on the number of bins needed. The actual number required may be larger, but mostly it will be within a very small range from the lower bound, I believe (I suspect it will nearly always be the lower bound). A good practical solution, therefore, would be to compute the solutions for a base number, plus one or more increments, and this can be done with negligible extra work (although Model might be an exception, I haven't tried it). Then the bin totals can be computed, and the first solution that meets the constraints can be used. I took two bin sets here.
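The approach described above can be sketched in Python. This is a minimal illustration under my own naming, not the article's code: compute the lower bound on the number of bins, then solve for successive candidate bin counts (here with the plain Greedy Algorithm) and take the first whose largest bin fits the capacity.

```python
import math

def greedy_bins(items, n_bins):
    """Greedy Algorithm: each item goes to the currently smallest bin."""
    bins = [0] * n_bins
    for item in sorted(items, reverse=True):
        bins[bins.index(min(bins))] += item
    return bins

def fit_bins(items, capacity, n_extra=1):
    """Try bin counts from the lower bound upwards; return the first feasible solution."""
    base = math.ceil(sum(items) / capacity)   # lower bound on bins needed
    for n in range(base, base + n_extra + 1):
        bins = greedy_bins(items, n)
        if max(bins) <= capacity:             # all bins within fixed capacity?
            return n, bins
    return None                               # none feasible within the tried range

# e.g. fit_bins([5, 4, 3, 2, 1, 1], capacity=8) -> (2, [8, 8])
```

In the SQL solutions the candidate solutions are computed together in one pass, rather than sequentially as here, which is why the extra work is negligible.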

SQL POS

```sql
WITH items AS (
  SELECT sl_pm_code item_name, sl_wt item_amt, sl_qty item_qty,
         Ceil (Sum(sl_qty) OVER () / :MAX_QTY) n_bins
    FROM ow_ship_det
), items_desc AS (
  SELECT item_name, item_amt, item_qty, n_bins,
         Mod (Row_Number () OVER (ORDER BY item_qty DESC), n_bins) bin_1,
         Mod (Row_Number () OVER (ORDER BY item_qty DESC), n_bins + 1) bin_2
    FROM items
)
SELECT item_name, item_amt, item_qty,
       CASE bin_1 WHEN 0 THEN n_bins ELSE bin_1 END bin_1,
       CASE bin_2 WHEN 0 THEN n_bins + 1 ELSE bin_2 END bin_2,
       Sum (item_amt) OVER (PARTITION BY bin_1) bin_1_amt,
       Sum (item_qty) OVER (PARTITION BY bin_1) bin_1_qty,
       Sum (item_amt) OVER (PARTITION BY bin_2) bin_2_amt,
       Sum (item_qty) OVER (PARTITION BY bin_2) bin_2_qty
  FROM items_desc
 ORDER BY item_qty DESC, bin_1, bin_2
```

SQL Pipelined

```sql
SELECT osd.sl_pm_code item_name, osd.sl_wt item_amt, osd.sl_qty item_qty,
       tab.bin_1, tab.bin_2,
       Sum (osd.sl_wt) OVER (PARTITION BY tab.bin_1) bin_1_amt,
       Sum (osd.sl_qty) OVER (PARTITION BY tab.bin_1) bin_1_qty,
       Sum (osd.sl_wt) OVER (PARTITION BY tab.bin_2) bin_2_amt,
       Sum (osd.sl_qty) OVER (PARTITION BY tab.bin_2) bin_2_qty
  FROM ow_ship_det osd
  JOIN TABLE (Bin_Even.Items_Binned (
         CURSOR (SELECT sl_pm_code item_name, sl_qty item_value,
                        Sum(sl_qty) OVER () item_total
                   FROM ow_ship_det
                  ORDER BY sl_qty DESC, sl_wt DESC),
         :MAX_QTY)) tab
    ON tab.item_name = osd.sl_pm_code
 ORDER BY osd.sl_qty DESC, tab.bin_1
```

Pipelined Function

```sql
CREATE OR REPLACE PACKAGE Bin_Even AS

  TYPE bin_even_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, bin_1 NUMBER, bin_2 NUMBER);
  TYPE bin_even_list_type IS VARRAY(1000) OF bin_even_rec_type;

  TYPE bin_even_cur_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, item_total NUMBER);
  TYPE bin_even_cur_type IS REF CURSOR RETURN bin_even_cur_rec_type;

  FUNCTION Items_Binned (p_items_cur bin_even_cur_type, p_bin_max NUMBER) RETURN bin_even_list_type PIPELINED;

END Bin_Even;
/
SHOW ERRORS
CREATE OR REPLACE PACKAGE BODY Bin_Even AS

  c_big_value                 CONSTANT NUMBER := 100000000;
  c_n_bin_sets                CONSTANT NUMBER := 2;

  TYPE bin_even_cur_list_type IS VARRAY(100) OF bin_even_cur_rec_type;
  TYPE num_lol_list_type      IS VARRAY(100) OF SYS.ODCINumberList;

  FUNCTION Items_Binned (p_items_cur bin_even_cur_type, p_bin_max NUMBER) RETURN bin_even_list_type PIPELINED IS

    l_min_bin                 SYS.ODCINumberList := SYS.ODCINumberList (1, 1);
    l_min_bin_val             SYS.ODCINumberList := SYS.ODCINumberList (c_big_value, c_big_value);
    l_bins                    num_lol_list_type := num_lol_list_type (SYS.ODCINumberList(), SYS.ODCINumberList());

    l_bin_even_cur_rec        bin_even_cur_rec_type;
    l_bin_even_rec            bin_even_rec_type;
    l_bin_even_cur_list       bin_even_cur_list_type;

    l_n_bins                  PLS_INTEGER;
    l_n_bins_base             PLS_INTEGER;
    l_is_first_fetch          BOOLEAN := TRUE;

  BEGIN

    LOOP

      FETCH p_items_cur BULK COLLECT INTO l_bin_even_cur_list LIMIT 100;
      EXIT WHEN l_bin_even_cur_list.COUNT = 0;
      IF l_is_first_fetch THEN

        l_n_bins_base := Ceil (l_bin_even_cur_list(1).item_total / p_bin_max) - 1;

        l_is_first_fetch := FALSE;

        l_n_bins := l_n_bins_base;
        FOR i IN 1..c_n_bin_sets LOOP

          l_n_bins := l_n_bins + 1;
          l_bins(i).Extend (l_n_bins);
          FOR k IN 1..l_n_bins LOOP
            l_bins(i)(k) := 0;
          END LOOP;

        END LOOP;

      END IF;

      FOR j IN 1..l_bin_even_cur_list.COUNT LOOP

        l_bin_even_rec.item_name := l_bin_even_cur_list(j).item_name;
        l_bin_even_rec.item_value := l_bin_even_cur_list(j).item_value;
        l_bin_even_rec.bin_1 := l_min_bin(1);
        l_bin_even_rec.bin_2 := l_min_bin(2);

        PIPE ROW (l_bin_even_rec);

        l_n_bins := l_n_bins_base;
        FOR i IN 1..c_n_bin_sets LOOP
          l_n_bins := l_n_bins + 1;
          l_bins(i)(l_min_bin(i)) := l_bins(i)(l_min_bin(i)) + l_bin_even_cur_list(j).item_value;

          l_min_bin_val(i) := c_big_value;
          FOR k IN 1..l_n_bins LOOP

            IF l_bins(i)(k) < l_min_bin_val(i) THEN
              l_min_bin(i) := k;
              l_min_bin_val(i) := l_bins(i)(k);
            END IF;

          END LOOP;

        END LOOP;

      END LOOP;

    END LOOP;

  END Items_Binned;

END Bin_Even;
```

Output POS
Note BIN_1 means bin set 1, which turns out to have 4 bins, while bin set 2 then necessarily has 5.

```
ITEM_NAME         ITEM_AMT   ITEM_QTY      BIN_1      BIN_2  BIN_1_AMT  BIN_1_QTY  BIN_2_AMT  BIN_2_QTY
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1239606-1080          4024        266          1          1      25562        995      17482        827
1239606-1045          1880        192          2          2      19394        886      14568        732
1239606-1044          1567        160          3          3      18115        835      14097        688
1239606-1081          2118        140          4          4      18988        793      17130        657
1239606-2094          5741         96          1          5      25562        995      18782        605
...
1239606-2107            80          3          4          2      18988        793      14568        732
1239606-2084           122          3          4          3      18988        793      14097        688
1239606-2110           210          2          2          3      19394        886      14097        688
1239606-4022           212          2          3          4      18115        835      17130        657
1239606-4021           212          2          4          5      18988        793      18782        605
```

Output Pipelined

```
ITEM_NAME         ITEM_AMT   ITEM_QTY      BIN_1      BIN_2  BIN_1_AMT  BIN_1_QTY  BIN_2_AMT  BIN_2_QTY
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1239606-1080          4024        266          1          1      20627        878      15805        703
1239606-1045          1880        192          2          2      18220        877      16176        703
1239606-1044          1567        160          3          3      20425        878      15651        701
1239606-1081          2118        140          4          4      22787        876      14797        701
1239606-2094          5741         96          4          5      22787        876      19630        701
...
1239606-2089            80          3          4          1      22787        876      15805        703
1239606-2112           141          3          4          2      22787        876      16176        703
1239606-4022           212          2          1          1      20627        878      15805        703
1239606-4021           212          2          2          1      18220        877      15805        703
1239606-2110           210          2          3          2      20425        878      16176        703
```

End OTN Extract

Conclusions

• Various solutions for the balanced number partitioning problem have been presented, using Oracle's Model clause, Recursive Subquery Factoring, Pipelined Functions and simple SQL
• The performance characteristics of these solutions have been tested across a range of data sets
• As is often the case, the best solution depends on the shape and size of the data set
• A simple extension has been shown to allow determining the number of bins required in the bin-fitting interpretation of the problem
• Replacing a WITH clause with a staging table can be a useful technique to allow indexed scans

Get the code here: Brendan's repo for interesting SQL

# PL/SQL Profiling 1 – Overview

This article describes, at an overview level, the use of three PL/SQL profiling methods on two example program structures. The examples are designed to illustrate profiler behaviour over as many different scenarios as possible, while keeping the examples themselves as simple as possible. The first two profilers are Oracle’s hierarchical and flat profiling tools, while the third is the author’s own custom code timing package, Timer_Set. The article is based on one published in March 2013 on the hierarchical profiler, updated shortly thereafter to include Oracle’s older flat profiler and custom code timing. In June 2020 the installation and source code were put onto GitHub, and the article was restructured into this overview, with the detail on the three profiling methods in separate articles, linked in the sections below.

All source code, including installation scripts, and short screen recordings, is available on GitHub:
GitHub: Oracle PL/SQL Profiling

## Scenarios

The phrase ‘Everything Should Be Made as Simple as Possible, But Not Simpler’ is often attributed to Albert Einstein, although the attribution is apparently questionable. In any case it’s not a bad approach to follow in testing, whether in ‘closed’ unit testing, where there are known expected results, or in ‘open’ exploratory testing, where we are investigating application behaviour, as here. I discussed the issue of test scenario coverage in a 2018 presentation: Database API Viewed As A Mathematical Function: Insights into Testing – OUG Ireland Conference 2018.

At the time of the original article in 2013, I looked at Oracle’s hierarchical profiler tool with a view to using it in an upcoming project. In order to understand the tool properly, I felt it would be a good idea to start by using it to profile a test program that would be as simple as possible while covering as wide a range of scenarios as possible. I then added a second simple program to cover some additional scenarios, and also tested the use of Oracle’s flat profiler, and of custom code timing using my own Timer_Set package.

This first article explains the scenarios covered by the two example programs, briefly summarises the results of each method on the second and simpler example, and finishes with a comparison of the features of the three methods. Fuller discussions of each profiling method, with application to both example programs can be found in the specific articles linked above.

The test programs consist of two driving scripts that have one or more PL/SQL blocks with calls to various system and custom packaged procedures and functions.

### Example 1: General

The test program for example 1 covers the following scenarios:

• Multiple root calls
• Recursive procedure calls (procedure calling itself: R_Calls_R)
• Mutually recursive procedure calls (procedures call each other: A_Calls_B and B_Calls_A)
• Program unit called by multiple program units (child with multiple parents: Database Function)
• Procedure ‘inlined’ within PL/SQL (Rest_a_While)
• Type body code (Object Constructor)
• Static SQL from SQL*Plus (SELECT 1)
• Static SQL within PL/SQL (SELECT 2)
• Dynamic SQL within PL/SQL (SELECT 3)
• CPU-consuming sleep (Rest_a_While)

### Example 2: Sleep

The test program for example 2 covers the following scenarios:

• External system program unit (DBMS_Lock.Sleep in SYS schema)
• External custom program unit (Utils.Sleep in LIB schema)
• Sleep that uses no CPU time, only elapsed time (DBMS_Lock.Sleep)
• Sleep that uses CPU time (Utils.Sleep)
• Trigger code (Before Insert trigger SLEEP_BI)
• Two calls from the same parent (DBMS_Lock.Sleep from anonymous block twice)
• Program unit as root call and as child call (DBMS_Lock.Sleep)

## Profiling Overview

In this section I show the data model and a summary of the results of each method on the second and simpler example. For fuller discussion with both examples see the specific articles for each method.

### Hierarchical Profiler Overview

PL/SQL Profiling 2: Hierarchical Profiler

#### Network Diagram for Example 2: Sleep

The extended network diagram with function names and times included shows 9 seconds of time used by the SLEEP function in two root-level calls and 1 second in a third call as the child of the custom Utils Sleep function.

##### Network Query Output for Example 2: Sleep
```
Function tree                   Sy Owner Module           Inst.  Subtree MicroS Function MicroS Calls Row
------------------------------ --- ----- ---------------- ------ -------------- --------------- ----- ---
SLEEP                            8 SYS   DBMS_LOCK        1 of 2        8998195         8998195     2   1
__static_sql_exec_line6         10                                      2005266            5660     1   2
__plsql_vm                     1                                      1999606               4     1   3
SLEEP_BI                     3 APP   SLEEP_BI                       1999602             327     1   4
SLEEP                      4 LIB   UTILS                          1999268          999837     1   5
SLEEP                    8 SYS   DBMS_LOCK        2 of 2         999431          999431     1   6
__pkg_init                 6 LIB   UTILS                                5               5     1   7
__pkg_init                 5 LIB   UTILS                                2               2     1   8
STOP_PROFILING                   2 APP   HPROF_UTILS                         87              87     1   9
STOP_PROFILING                 7 SYS   DBMS_HPROF                           0               0     1  10
__static_sql_exec_line700       11 SYS   DBMS_HPROF                          77              77     1  11
__pkg_init                       9 SYS   DBMS_LOCK                            3               3     1  12

12 rows selected.
```

### Flat Profiler Overview

PL/SQL Profiling 3: Flat Profiler

#### Flat Profiler: Results Extract

The records produced in the functions table, PLSQL_PROFILER_DATA, are listed below in order of unit name, then unit number and line number.

```
Seconds  Microsecs   Min S   Max S    Calls Unit                 Unit# Type            Line# Line Text
------- ---------- ------- ------- -------- -------------------- ----- --------------- ----- ------------------------------------------------------------------
  0.000          0   0.000   0.000        0 <anonymous>              1 ANONYMOUS BLOCK     1
  0.000          2   0.000   0.000        0 <anonymous>              1 ANONYMOUS BLOCK     5
  0.000         23   0.000   0.000        1 <anonymous>              1 ANONYMOUS BLOCK     9
  0.000         10   0.000   0.000        1 <anonymous>              1 ANONYMOUS BLOCK    11
  0.000        199   0.000   0.000        1 <anonymous>              1 ANONYMOUS BLOCK    13
  0.000          3   0.000   0.000        1 <anonymous>              1 ANONYMOUS BLOCK    15
  0.000          5   0.000   0.000        1 <anonymous>              1 ANONYMOUS BLOCK    17
  0.000          0   0.000   0.000        0 <anonymous>              1 ANONYMOUS BLOCK    19
  0.000          2   0.000   0.000        0 SLEEP_BI                 2 TRIGGER             1 TRIGGER sleep_bi
  0.000         20   0.000   0.000        1 SLEEP_BI                 2 TRIGGER             2 BEFORE INSERT
  0.000          1   0.000   0.000        1 SLEEP_BI                 2 TRIGGER             4 FOR EACH ROW

11 rows selected.
```

### Timer Set Overview

PL/SQL Profiling 4: Custom Code Timing

#### Timer Set: Results Extract

```
Timer Set: Profiling DBMS_Lock.Sleep, Constructed at 27 Jun 2020 07:53:00, written at 07:53:11
==============================================================================================
Timer                                       Elapsed         CPU       Calls       Ela/Call       CPU/Call
---------------------------------------  ----------  ----------  ----------  -------------  -------------
3 second sleep                                 3.00        0.00           1        3.00000        0.00000
INSERT INTO trigger_tab VALUES (2, 0.5)        2.00        1.00           1        1.99900        1.00000
6 second sleep                                 6.00        0.00           1        6.00000        0.00000
(Other)                                        0.00        0.00           1        0.00000        0.00000
---------------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                         11.00        1.00           4        2.74975        0.25000
---------------------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.00980, CPU: 0.00980]
```

## Conclusion

Oracle’s hierarchical profiler and flat profiler offer different ways of profiling elapsed time and call usage for PL/SQL programs. After initial setup they perform profiling of code automatically between start and stop API calls, without the need to alter the code being profiled. The level of profiling and organisation of data collection are not configurable. They are intended for one-off use or occasional trouble-shooting, rather than as ongoing instrumentation.

Custom code timing requires insertion of timing code within the profiled code, but offers more flexibility in terms of detail gathered, and also captures CPU time usage. Use of a centralized API such as Timer_Set helps to minimize the code required. It can be used both for trouble-shooting and for ongoing production instrumentation.
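The timer-set idea is easy to sketch outside PL/SQL too. Here is a minimal Python analogue (the class and method names below are illustrative only, not the actual Timer_Set API): each call to `increment` charges the interval since the previous call to a named timer, accumulating both elapsed and CPU time.

```python
import time

class TimerSet:
    """Minimal timer-set sketch: accumulates elapsed and CPU time per named timer."""
    def __init__(self):
        self.totals = {}                    # name -> [elapsed, cpu, calls]
        self._ela = time.time()
        self._cpu = time.process_time()

    def increment(self, name):
        """Charge the interval since the last call to the named timer."""
        ela, cpu = time.time(), time.process_time()
        tot = self.totals.setdefault(name, [0.0, 0.0, 0])
        tot[0] += ela - self._ela           # elapsed time in the interval
        tot[1] += cpu - self._cpu           # CPU time in the interval
        tot[2] += 1
        self._ela, self._cpu = ela, cpu

ts = TimerSet()
time.sleep(0.01)
ts.increment('sleep')                       # sleep: elapsed but almost no CPU
sum(range(10**5))
ts.increment('sum')                         # sum: CPU-bound work
for name, (ela, cpu, calls) in ts.totals.items():
    print(f'{name}: {ela:.3f}s elapsed, {cpu:.3f}s CPU, {calls} calls')
```

The point of centralizing this in one class is the same as with Timer_Set: the profiled code only needs one extra line per section timed.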

# Master-Detail Transaction Matching in SQL (MDTM1)

This article is the first in a sequence of three dealing with a very general class of problems in SQL, and exploring various techniques to find efficient solutions. In this first article, the problem is outlined and is divided into two subproblems, of which the first is solved here in several variant SQL statements with performance analysis. The second article, Holographic Set Matching in SQL, takes the most efficient method and applies two new techniques to further improve performance. The third article, Master-Detail Transaction Reconciliation in SQL (MDTM3), adds a sequence of subquery factors to the best solution for the first subproblem to achieve an efficient solution to the overall problem within a single SQL statement.

## The General Problem

We consider a transaction with a two-level structure consisting of a header (or master) and lines (or details) linked to the header, and the problem is to pair off transactions by matching, or contra-matching, subsets of the fields at both header and line level. This kind of problem can arise in the context of reconciliation of debit and credit transactions where both transactions are in the same system but are entered separately by two different groups and have no explicit linkage. Typically, in order for the transactions to match at header level, several fields such as transaction class have to be equal, while others such as credit and debit amounts have to be inverse, while others again such as unique identifiers will not match. At the line level, the same applies and matched lines also need to pair off against each other for the transaction to be considered a match. The first part of the problem is to identify all matching (or contra-matching) pairs, and this article will focus on that, while the second (and optional) part, that of pairing off, will be the subject of a later article.

To see why performance might be an important issue for this type of problem, consider the number of possible comparisons for an example with 10,000 headers each having 100 lines. In this case there would be 100,000,000 pairs of transactions, counting both ways, and 50,000,000 counting one way. A similar calculation gives 10,000 (not 5,000!) line comparisons per header-pair, and hence 500,000,000,000 line comparisons in total. The key of course is to minimise the number of comparisons made explicitly.
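The arithmetic above is easy to verify; a quick sketch with the figures from the example (pure Python):

```python
n_headers = 10_000
n_lines = 100

# Round figures as in the text: n*n pairs counting both ways
# (n*(n-1) exactly, but the difference is negligible at this size)
pairs_both_ways = n_headers * n_headers        # 100,000,000
pairs_one_way = pairs_both_ways // 2           # 50,000,000

# Each header-pair compares every line of one against every line of the other
line_comps_per_pair = n_lines * n_lines        # 10,000 (not 5,000)

total_line_comps = pairs_one_way * line_comps_per_pair
print(f'{pairs_one_way:,} header-pairs, {total_line_comps:,} line comparisons')
```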

We will solve the problem through a single SQL query which will be developed through several versions, using a test example problem based on Oracle standard tables. The queries will be tested using my own SQL benchmarking framework, mentioned in earlier articles, and performance characteristics analysed. This will illustrate some performance aspects of the use of subquery factors and temporary tables, among other things.

## Matching and Contra-Matching Sets

In the ERD above, each transaction falls into a logical Match Status Set, where the sets are of four distinct types:

• Unmatched – a single set for transactions having no matching or contra-matching transaction
• Matched – a set for each group of mutually matching transactions
• Contra-Matched A – a set for each group of transactions that all contra-match to a corresponding B-set
• Contra-Matched B – a set for each group of transactions that all contra-match to a corresponding A-set

We may define our problem without contra-matching fields, in which case only the first two types of set will be present; we may also have the case where only contra-matching is possible (likely the most common); and a special case may arise where both matching and contra-matching fields are present but where all contra-matching fields may have self-inverse values (for example amounts of zero) and those records having only self-inverse values might be best regarded as falling into one of the first two types of set.

## The Sample Problem – Tables and Foreign Key Constraints

We will use two of the Oracle database system views as the basis for our sample problem. The master entity will be the Oracle table defined in the view all_tables, and the detail entity will be the foreign key constraint contained as a subentity in the view all_constraints. The views themselves are very complicated and it is better for our purposes to copy their records into new tables, and for performance testing we’ll copy them multiple times according to the value of a dimensional parameter, using the parameter as a suffix on the owner and table name fields. The sample problem will involve matching only, and tables are defined to match if they have the same set of foreign key references, where the references are defined by the referenced owners and constraint names. As tables without foreign keys all match trivially, we’ll filter these out in the queries.

The table and constraint entities can be represented by the following ERD:

The tables are, with * marking primary keys:
tab_cp

• owner*
• table_name*
• description

con_cp

• owner*
• constraint_name*
• table_name
• constraint_type
• r_owner
• r_constraint_name
• description

Indexes are defined on the two foreign keys on con_cp:
con_tab_fk_N1

• owner
• table_name

con_con_fk_N2

• r_owner
• r_constraint_name

The embedded Excel file below gives the solution for my 11g XE database, for the first problem, of identifying all matches.

## Solution Methods

This problem might be considered to divide into two subproblems. The first is to identify all the matching pairs, while the second is to take those matching pairs and eliminate duplicate instances, so that each master record matches against at most one other record. This may reduce the number of master records that have matches; for example, if a matching set has three master records, then only two of them will be matched, against each other, in the final solution. We will consider the first subproblem in this article and the second in a later article.

To find the solution to the first subproblem in SQL, the obvious approach is simply to join the master table to itself to form the set of possible matching pairs, then to apply criteria to filter out any pairs that don’t match. We can immediately apply a constraint to avoid selecting the same pair twice, by requiring that the rowid of the first record be higher than that of the second. This halves the number of pairs considered, reducing the initial set from n(n-1) ordered pairs to n(n-1)/2, and also halving the number remaining after any other conditions are applied.

### Matching Detail Sets with MINUS Operator

The master-level criteria may be easy enough to apply, using conditions in the join clause, but the detail criteria are more difficult because we have to match two sets of records for any given pair of master records. This leads us to think of Oracle’s set operators, specifically the MINUS operator that subtracts one set from another. Consider the matching pair on line 4028 of the Excel file above, which gives the solution for our example problem. This shows a match between the two tables OEHR_EMPLOYEES and OEHR_JOB_HISTORY in the TWODAYPLUS_0 schema, each of which has three foreign keys. The three constraints on each of these tables reference the same keys in the same schema, namely DEPT_ID_PK, JOB_ID_PK and EMP_EMP_ID_PK. The following query returns no records:

```
SELECT r_owner,
       r_constraint_name
  FROM con_cp
 WHERE constraint_type = 'R'
   AND table_name = 'OEHR_EMPLOYEES'
   AND owner = 'TWODAYPLUS_0'
MINUS
SELECT r_owner,
       r_constraint_name
  FROM con_cp
 WHERE constraint_type = 'R'
   AND table_name = 'OEHR_JOB_HISTORY'
   AND owner = 'TWODAYPLUS_0'
```

Perhaps then the detail set matching could be effected by a NOT EXISTS clause on the above query, with the hard-coded owner and table_name replaced by correlation columns from the main query? There are two problems with this, arising from the way Oracle’s set operators work. First, if there were any extra foreign keys in the second table the query would still return no records, as it returns only records that are in the first query section and not in the second, thus showing a false match. Second, Oracle views a set in this context as the set of distinct records, so if some records are duplicated in either table, but duplicated differently from the other table, then again a false match is shown. These two tables also exist in Oracle’s HR demo schema, without the OEHR_ prefix. In order to show the problem I added an extra field in each table with a foreign key matching one already present, as follows:

• EMPLOYEES.REL_EMP_ID -> EMP_EMP_ID_PK
• JOB_HISTORY.REL_JOB_ID -> JOB_ID_PK

Now the query above with new schema and table names still returns no records although in our terms the detail record sets are different: EMPLOYEES has set (DEPT_ID_PK, JOB_ID_PK, EMP_EMP_ID_PK, EMP_EMP_ID_PK), while JOB_HISTORY has set (DEPT_ID_PK, JOB_ID_PK, JOB_ID_PK, EMP_EMP_ID_PK). The solution to this problem is of course that we need to group the detail records by the matching fields and add a count, as follows, using our copied schema HR_0:

```
SELECT r_owner,
       r_constraint_name,
       Count(*)
  FROM con_cp
 WHERE constraint_type = 'R'
   AND table_name = 'EMPLOYEES'
   AND owner = 'HR_0'
 GROUP BY r_owner,
       r_constraint_name
MINUS
SELECT r_owner,
       r_constraint_name,
       Count(*)
  FROM con_cp
 WHERE constraint_type = 'R'
   AND table_name = 'JOB_HISTORY'
   AND owner = 'HR_0'
 GROUP BY r_owner,
       r_constraint_name
```

This returns two records:

```
R_OWNER  R_CONSTRAINT_NAME    COUNT(*)
=======  =================    ========
HR_0     EMP_EMP_ID_PK        2
HR_0     JOB_ID_PK            1
```

The first problem can likewise be solved in two ways: either by repeating the NOT EXISTS clause with the two sections reversed, or by ensuring separately that the two record sets have the same numbers of records – if they don’t they can’t match, and if they do then the MINUS operator works. Obviously the first solution doubles the work involved, while the second incurs a cost associated with the counting process, but that is offset by avoiding the second NOT EXISTS execution.
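The false-match behaviour, and the grouping fix, can be reproduced in a few lines with any engine that has set operators. Here is a self-contained sketch in Python using SQLite, whose EXCEPT plays the role of Oracle’s MINUS (the cut-down table and data are illustrative only, mirroring the duplicated-key example above):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE con_cp (table_name TEXT, r_constraint_name TEXT);
INSERT INTO con_cp VALUES
  ('EMPLOYEES', 'DEPT_ID_PK'), ('EMPLOYEES', 'JOB_ID_PK'),
  ('EMPLOYEES', 'EMP_EMP_ID_PK'), ('EMPLOYEES', 'EMP_EMP_ID_PK'),
  ('JOB_HISTORY', 'DEPT_ID_PK'), ('JOB_HISTORY', 'JOB_ID_PK'),
  ('JOB_HISTORY', 'JOB_ID_PK'), ('JOB_HISTORY', 'EMP_EMP_ID_PK');
""")

# Plain set difference: no rows, a false match, because the sets of
# DISTINCT values are equal even though the multisets differ
plain = conn.execute("""
  SELECT r_constraint_name FROM con_cp WHERE table_name = 'EMPLOYEES'
  EXCEPT
  SELECT r_constraint_name FROM con_cp WHERE table_name = 'JOB_HISTORY'
""").fetchall()

# Grouped difference with counts: the duplicates now surface
grouped = conn.execute("""
  SELECT r_constraint_name, Count(*) FROM con_cp
   WHERE table_name = 'EMPLOYEES' GROUP BY r_constraint_name
  EXCEPT
  SELECT r_constraint_name, Count(*) FROM con_cp
   WHERE table_name = 'JOB_HISTORY' GROUP BY r_constraint_name
   ORDER BY 1
""").fetchall()

print(plain)    # []
print(grouped)  # [('EMP_EMP_ID_PK', 2), ('JOB_ID_PK', 1)]
```

The grouped result matches the two-record output shown above: grouping with a count turns the set comparison into the multiset comparison the problem actually requires.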

### Matching Detail Sets with Nested NOT EXISTS Operator

If we consider the MINUS query above, before we added grouping, it seems likely that Oracle would evaluate the outer NOT EXISTS by obtaining both record sets, then applying the MINUS operator, before checking that no records are returned. This would be inefficient, since the outer condition fails as soon as any single record is in the first set but not in the second, so one would want to truncate processing on finding the first such record. This suggests an alternative that might be more efficient: a second NOT EXISTS nested within the outer one, which would apply to the following subquery:

```
SELECT 1
  FROM con_cp c1
 WHERE c1.constraint_type = 'R'
   AND c1.table_name = 'OEHR_EMPLOYEES'
   AND c1.owner = 'TWODAYPLUS_0'
   AND NOT EXISTS (
SELECT 1
  FROM con_cp c2
 WHERE c2.constraint_type = 'R'
   AND c2.table_name = 'OEHR_JOB_HISTORY'
   AND c2.owner = 'TWODAYPLUS_0'
   AND c2.r_owner = c1.r_owner
   AND c2.r_constraint_name = c1.r_constraint_name
)
```

Here we have not included the grouping solution because it is complicated within this structure, but if the detail table were replaced by either a subquery factor or a temporary table where the grouping were already done, then (as we’ll see) this would work just by adding in an equality condition on the count fields. Again, if we know that the record counts are the same the reverse clause is unnecessary.
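The potential gain from the nested NOT EXISTS is the early exit. Assuming the details are already grouped into (key, count) rows, as in the subquery factor or temporary table variants, the idea can be sketched procedurally (illustrative Python, not the actual execution plan):

```python
def details_match(grouped_1, grouped_2):
    """grouped_1, grouped_2: dicts mapping (r_owner, r_constraint_name) -> count.
    Returns False as soon as one grouped row of set 1 has no matching row in
    set 2; with equal record counts guaranteed beforehand, the reverse check
    is unnecessary."""
    for key, n_dup in grouped_1.items():
        if grouped_2.get(key) != n_dup:
            return False   # early exit: no need to build the full set difference
    return True

# Grouped detail sets from the HR_0 example above
emp = {('HR_0', 'DEPT_ID_PK'): 1, ('HR_0', 'JOB_ID_PK'): 1, ('HR_0', 'EMP_EMP_ID_PK'): 2}
jh  = {('HR_0', 'DEPT_ID_PK'): 1, ('HR_0', 'JOB_ID_PK'): 2, ('HR_0', 'EMP_EMP_ID_PK'): 1}

print(details_match(emp, jh))         # False
print(details_match(emp, dict(emp)))  # True
```

A MINUS-style evaluation would materialise both grouped sets and subtract them in full; the loop above stops at the first unmatched row, which is the behaviour we hope the nested NOT EXISTS allows the optimizer to achieve.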

### Pre-Matching Detail Sets by Aggregates

We noted above that the detail sets can only match if they have the same numbers of records, and that this could be used to avoid doing the set matching twice in opposite orders. We also noted that the work done in counting would be offset by the avoidance of expensive set matching for those pairs that don’t have matching counts. In fact, we can extend this idea to all possible aggregates on the detail record set matching fields, and this will likely result in fewer set matchings in the overall query execution. In our simple test problem we will add minimum and maximum aggregates on the r_constraint_name field, giving the following join conditions, prior to the set matching clause, and where tab represents a subquery factor that computes the aggregates:

```
  FROM tab                      t1
  JOIN tab                      t2
    ON t2.n_det                 = t1.n_det
   AND t2.min_det               = t1.min_det
   AND t2.max_det               = t1.max_det
   AND t2.row_id                > t1.row_id
```
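The effect of these join conditions can be illustrated by bucketing masters on a (count, min, max) signature and only running the detail-set comparison within a bucket (a Python sketch with made-up data; the aggregates of tab become the signature):

```python
from collections import defaultdict
from itertools import combinations

# Each master: name -> multiset of detail keys (made-up data)
masters = {
    'T1': ['A', 'B', 'C'],
    'T2': ['A', 'B', 'C'],
    'T3': ['A', 'B'],
    'T4': ['A', 'C', 'C'],
}

# Signature = (count, min, max) of the detail keys, as in the join conditions
buckets = defaultdict(list)
for name, dets in masters.items():
    buckets[(len(dets), min(dets), max(dets))].append(name)

# Only pairs sharing a signature are candidates for the detail-set comparison
candidates = [pair for group in buckets.values() for pair in combinations(group, 2)]
print(candidates)  # [('T1', 'T2'), ('T1', 'T4'), ('T2', 'T4')]
```

Note that T1 and T4 share a signature but have different detail multisets: the signature is only a necessary condition, so the full set comparison still runs within each bucket; the point is that it runs far fewer times.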

### Subquery Factors and Temporary Tables

Owing to the importance of aggregation at table level, as explained in the last section above, all query variations considered will include a subquery factor, tab, that does this aggregation. However, we have also noted the need to group and count at the level of detail records, and as this grouped record set needs to be used twice, for each member of a potential matching master pair, it would also seem an obvious candidate for a subquery factor. When we try this though, we’ll see that the query structure now precludes the use of indexes within the detail matching subquery and so we’ll also implement a query that uses a temporary table where the grouping and counting is done in advance.
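The pre-grouping step can be mimicked in SQLite (illustrative Python; grp_gtt here is a plain table standing in for the Oracle global temporary table, and the data is made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE con_cp (owner TEXT, table_name TEXT,
                     r_owner TEXT, r_constraint_name TEXT, constraint_type TEXT);
-- Staging table playing the role of the grp_gtt temporary table:
-- grouping and counting done once, in advance
CREATE TABLE grp_gtt (owner TEXT, table_name TEXT,
                      r_owner TEXT, r_constraint_name TEXT, n_con INTEGER);
CREATE INDEX grp_gtt_n1 ON grp_gtt (owner, table_name);
""")

conn.executemany("INSERT INTO con_cp VALUES (?, ?, ?, ?, 'R')", [
    ('HR_0', 'EMPLOYEES', 'HR_0', 'DEPT_ID_PK'),
    ('HR_0', 'EMPLOYEES', 'HR_0', 'EMP_EMP_ID_PK'),
    ('HR_0', 'EMPLOYEES', 'HR_0', 'EMP_EMP_ID_PK'),
])

# Populate the staging table once, before the matching query runs
conn.execute("""
  INSERT INTO grp_gtt
  SELECT owner, table_name, r_owner, r_constraint_name, Count(*)
    FROM con_cp WHERE constraint_type = 'R'
   GROUP BY owner, table_name, r_owner, r_constraint_name
""")
rows = conn.execute(
    "SELECT r_constraint_name, n_con FROM grp_gtt ORDER BY 1").fetchall()
print(rows)  # [('DEPT_ID_PK', 1), ('EMP_EMP_ID_PK', 2)]
```

With the grouping materialised, the index on (owner, table_name) can support the correlated lookups that the subquery-factor version has to satisfy with full scans.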

### Query Variations

We will test five query variations, as shown below, where MI and NE denote, respectively, the MINUS and NOT EXISTS methods of detail set matching.

• INL_MI – Detail grouping directly
• SQF_NE – Detail grouping in subquery factor
• GTT_NE – Detail grouping in temporary table
• GTT_NE_X – As GTT_NE but with table-level count aggregation only
• GTT_MI – As GTT_NE but with MINUS
```
************
INL_MI
************
WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Count(c.ROWID)            n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp                    t
  JOIN con_cp                    c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
   AND c.constraint_type         = 'R'
 GROUP BY
       t.owner,
       t.table_name,
       t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.min_det                = t1.min_det
   AND t2.max_det                = t1.max_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT c2.r_owner,
       c2.r_constraint_name,
       Count(*)
  FROM con_cp                    c2
 WHERE c2.owner                  = t2.owner
   AND c2.table_name             = t2.table_name
   AND c2.constraint_type        = 'R'
 GROUP BY c2.r_owner,
       c2.r_constraint_name
MINUS
SELECT c1.r_owner,
       c1.r_constraint_name,
       Count(*)
  FROM con_cp                    c1
 WHERE c1.owner                  = t1.owner
   AND c1.table_name             = t1.table_name
   AND c1.constraint_type        = 'R'
 GROUP BY c1.r_owner,
       c1.r_constraint_name
)
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
SQF_NE
************
WITH det AS (
SELECT owner,
       table_name,
       r_owner,
       r_constraint_name,
       Count(*)                  n_dup
  FROM con_cp
 WHERE constraint_type           = 'R'
 GROUP BY owner,
       table_name,
       r_owner,
       r_constraint_name
), tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_dup)             n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp                    t
  JOIN det                       c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
       t.owner,
       t.table_name,
       t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.min_det                = t1.min_det
   AND t2.max_det                = t1.max_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT 1
  FROM det                       d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
   AND (
   NOT EXISTS (
SELECT 1
  FROM det                       d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
   AND d2.r_owner                = d1.r_owner
   AND d2.r_constraint_name      = d1.r_constraint_name
   AND d2.n_dup                  = d1.n_dup
)))
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
GTT_NE
************
WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_con)             n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp                    t
  JOIN grp_gtt                   c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
       t.owner,
       t.table_name,
       t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.min_det                = t1.min_det
   AND t2.max_det                = t1.max_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
   AND (
   NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
   AND d2.r_owner                = d1.r_owner
   AND d2.r_constraint_name      = d1.r_constraint_name
   AND d2.n_con                  = d1.n_con
)))
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
GTT_NE_X
************
WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_con)             n_det
  FROM tab_cp                    t
  JOIN grp_gtt                   c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
       t.owner,
       t.table_name,
       t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
   AND (
   NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
   AND d2.r_owner                = d1.r_owner
   AND d2.r_constraint_name      = d1.r_constraint_name
   AND d2.n_con                  = d1.n_con
)))
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
GTT_MI
************
WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_con)             n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp                    t
  JOIN grp_gtt                   c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
       t.owner,
       t.table_name,
       t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.min_det                = t1.min_det
   AND t2.max_det                = t1.max_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT d2.r_owner,
       d2.r_constraint_name,
       d2.n_con
  FROM grp_gtt                   d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
MINUS
SELECT d1.r_owner,
       d1.r_constraint_name,
       d1.n_con
  FROM grp_gtt                   d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
)
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name
```

The query structure diagrams (QSDs) are in the embedded Excel file below:

## Performance Analysis

We presented five query variations above, and in this section give the results of benchmarking these queries across a 1-dimensional data domain obtained by copying the system views 1, 2 and 4 times into my test tables described above. The problem sizes are as follows:

### Record Counts

### Timings and Statistics

Click on the query name in the file below to jump to the execution plan for the largest data point.

### Comparison

The timings above are only for the main queries, so we need also to consider the time to populate and delete the temporary table, for the three GTT queries. This is performed as part of the data point setup, and the framework prints out timings for this part separately. For the last data point, the output was:

```
5144 records inserted in grp_gtt
8956 tables, 44780 constraints, 5 c/t

Timer Set: Setup, Constructed at 09 Oct 2012 22:28:49, written at 22:29:04
==========================================================================
[Timer timed: Elapsed (per call): 0.05 (0.000047), CPU (per call): 0.05 (0.000050), calls: 1000, '***' denotes corrected line below]

Timer                  Elapsed          CPU          Calls        Ela/Call        CPU/Call
-----------------   ----------   ----------   ------------   -------------   -------------
Delete test data          3.33         2.78              1         3.33200         2.78000
Delete GTT                0.18         0.17              1         0.18000         0.17000
Insert tab                0.55         0.44              1         0.54500         0.44000
Insert con                7.89         5.94              1         7.89000         5.94000
Insert grp_gtt            0.14         0.14              1         0.14000         0.14000
Count records             0.02         0.01              1         0.01600         0.01000
Gather statistics         2.59         2.06              1         2.58900         2.06000
(Other)                   0.00         0.00              1         0.00000         0.00000
-----------------   ----------   ----------   ------------   -------------   -------------
Total                    14.69        11.54              8         1.83650         1.44250
-----------------   ----------   ----------   ------------   -------------   -------------```

The elapsed times for deleting from, then inserting into the temporary table are given by the ‘Delete GTT’ and ‘Insert grp_gtt’ timers and add up to 0.32s, so do not make much difference (about 5% on the best and less on the others). The following points can be made:

• Doing the detail grouping and counting directly gives the worst performance
• Moving the detail grouping and counting into a subquery factor improves performance by a factor of about 4
• Moving the detail grouping into a temporary table improves performance the most
• Using NOT EXISTS instead of MINUS for detail matching improves performance, as expected, by a factor of about 2
• Using the minimum and maximum aggregates for pre-filtering, in addition to the counts, improves performance by a factor of about 20

### Subquery Factors and Temporary Tables

If you look at the execution plan for INL_MI, most of the work is done in the HASH GROUP BY steps, 16 and 20, on totals of 127K records each. Moving this grouping into a subquery factor in SQF_NE means that the operation is done only once rather than many times (110K), and the execution plan for SQF_NE shows that it takes very little time (line 3).

However, the execution plan for SQF_NE shows (lines 14-22) that the factors are read using full scans, because indexes are not possible. This observation led to the improvement of moving the grouping out of the query altogether, into a separate stage that populates a temporary table, on which indexes can be defined. Lines 15-18 in the plan for GTT_NE show that access is now by index on the detail records.

### Memory Usage in INL_MI Query with Grouping

Originally, I tried to have a larger range of data points, but doubling the size again always resulted in an Oracle error on INL_MI, ORA-04030: out of process memory when trying to allocate 123404 bytes (QERGH hash-agg,kllcqas:kllsltba). This is surprising because the execution plan statistics include memory statistics that appear to indicate that all queries use the same maximum amount of memory, which is just over 3MB, incurred in the SORT ORDER BY step (e.g. line 7 below).

My framework also collects statistics from the system view v$mystat, and prints out those showing large variations in ‘after minus before’ differences across the queries. The statistic ‘session pga memory’ tells a different story (the values are in the embedded Excel files under Statistics above): the INL_MI query shows increases of approximately 14MB, then 170MB, then 768MB, while the other queries all show no increases. It’s hard to be sure what’s going on here, but one guess is that the query is hitting an Oracle bug that causes memory not to be released and re-used, with new executions of the relevant operation requesting new memory, and with the execution plans not reporting this. However, as discussed later, the variation in execution time with problem size is also difficult to understand, and suggests that the HASH GROUP BY operations are really being performed on the entire record sets, which would also greatly increase memory usage. The Oracle version, running under Windows 7, is: Oracle Database 11g Express Edition Release 11.2.0.2.0 – Beta

### Execution Plan Hash Values

In my last article, I was able to easily identify the distinct plans by looking at the matrix of plan hash values, with values the same indicating the same plan. This time though, that doesn’t work: all hash values are different, but in fact, inspection of the plans shows that for each query there was essentially only one plan. I believe this may be due to the subquery factors, which result in a system-generated view name, which differs each time. For example, here are the last two plans for the INL_MI, where the only difference appears to be in the view names (SYS_TEMP_0FD9D6681_1B0CFB4 for W2 and SYS_TEMP_0FD9D6687_1B0CFB4 for W4) (note that different statistics don’t make the plans different):

Point W2:

```
Plan hash value: 89269728

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                            |      1 |        |   8054 |00:01:40.96 |     191K|     21 |     21 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION        |                            |      1 |        |   8054 |00:01:40.96 |     191K|     21 |     21 |       |       |          |
|   2 |   LOAD AS SELECT                  |                            |      1 |        |      0 |00:00:00.05 |    3182 |      0 |     21 |   264K|   264K|  264K (0)|
|   3 |    HASH GROUP BY                  |                            |      1 |   2606 |   1628 |00:00:00.05 |    3158 |      0 |      0 |   766K|   766K| 1273K (0)|
|   4 |     NESTED LOOPS                  |                            |      1 |   2606 |   2606 |00:00:00.05 |    3158 |      0 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL            | CON_CP                     |      1 |   2606 |   2606 |00:00:00.01 |     625 |      0 |      0 |       |       |          |
|*  6 |      INDEX UNIQUE SCAN            | TCP_PK                     |   2606 |      1 |   2606 |00:00:00.02 |    2533 |      0 |      0 |       |       |          |
|   7 |   SORT ORDER BY                   |                            |      1 |      1 |   8054 |00:01:40.89 |     188K|     21 |      0 |  1824K|   650K| 1621K (0)|
|*  8 |    FILTER                         |                            |      1 |        |   8054 |00:01:24.17 |     188K|     21 |      0 |       |       |          |
|*  9 |     HASH JOIN                     |                            |      1 |      1 |  27242 |00:00:00.15 |      47 |     21 |      0 |   720K|   720K| 1282K (0)|
|  10 |      VIEW                         |                            |      1 |   2606 |   1628 |00:00:00.01 |      25 |     21 |      0 |       |       |          |
|  11 |       TABLE ACCESS FULL           | SYS_TEMP_0FD9D6681_1B0CFB4 |      1 |   2606 |   1628 |00:00:00.01 |      25 |     21 |      0 |       |       |          |
|  12 |      VIEW                         |                            |      1 |   2606 |   1628 |00:00:00.01 |      22 |      0 |      0 |       |       |          |
|  13 |       TABLE ACCESS FULL           | SYS_TEMP_0FD9D6681_1B0CFB4 |      1 |   2606 |   1628 |00:00:00.01 |      22 |      0 |      0 |       |       |          |
|  14 |     MINUS                         |                            |  27242 |        |  19188 |00:01:40.05 |     188K|      0 |      0 |       |       |          |
|  15 |      SORT UNIQUE                  |                            |  27242 |      1 |  28722 |00:00:53.68 |   73872 |      0 |      0 |  2048 |  2048 | 2048  (0)|
|  16 |       HASH GROUP BY               |                            |  27242 |      1 |  31314 |00:00:45.36 |   73872 |      0 |      0 |   750K|   750K|  610K (0)|
|* 17 |        TABLE ACCESS BY INDEX ROWID| CON_CP                     |  27242 |      1 |  31335 |00:00:01.57 |   73872 |      0 |      0 |       |       |          |
|* 18 |         INDEX RANGE SCAN          | CON_TAB_FK_N1              |  27242 |      1 |    175K|00:00:01.26 |   42504 |      0 |      0 |       |       |          |
|  19 |      SORT UNIQUE                  |                            |  27242 |      1 |  30502 |00:00:45.94 |     114K|      0 |      0 |  2048 |  2048 | 2048  (0)|
|  20 |       HASH GROUP BY               |                            |  27242 |      1 |  31314 |00:00:37.86 |     114K|      0 |      0 |   750K|   750K|  910K (0)|
|* 21 |        TABLE ACCESS BY INDEX ROWID| CON_CP                     |  27242 |      1 |  31335 |00:00:01.64 |     114K|      0 |      0 |       |       |          |
|* 22 |         INDEX RANGE SCAN          | CON_TAB_FK_N1              |  27242 |      1 |    183K|00:00:01.29 |   83068 |      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("C"."CONSTRAINT_TYPE"='R')
6 - access("C"."OWNER"="T"."OWNER" AND "C"."TABLE_NAME"="T"."TABLE_NAME")
8 - filter( IS NULL)
9 - access("T2"."N_DET"="T1"."N_DET" AND "T2"."MIN_DET"="T1"."MIN_DET" AND "T2"."MAX_DET"="T1"."MAX_DET")
filter("T2"."ROW_ID">"T1"."ROW_ID")
17 - filter("C2"."CONSTRAINT_TYPE"='R')
18 - access("C2"."OWNER"=:B1 AND "C2"."TABLE_NAME"=:B2)
21 - filter("C1"."CONSTRAINT_TYPE"='R')
22 - access("C1"."OWNER"=:B1 AND "C1"."TABLE_NAME"=:B2)```

Point W4:

```Plan hash value: 892071883

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                            |      1 |        |  16108 |00:25:33.98 |     788K|     42 |     42 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION        |                            |      1 |        |  16108 |00:25:33.98 |     788K|     42 |     42 |       |       |          |
|   2 |   LOAD AS SELECT                  |                            |      1 |        |      0 |00:00:00.10 |    5802 |      0 |     42 |   521K|   521K|  521K (0)|
|   3 |    HASH GROUP BY                  |                            |      1 |   5007 |   3256 |00:00:00.09 |    5757 |      0 |      0 |  1001K|   943K| 1273K (0)|
|   4 |     NESTED LOOPS                  |                            |      1 |   5007 |   5212 |00:00:00.09 |    5757 |      0 |      0 |       |       |          |
|*  5 |      TABLE ACCESS FULL            | CON_CP                     |      1 |   4980 |   5212 |00:00:00.01 |     625 |      0 |      0 |       |       |          |
|*  6 |      INDEX UNIQUE SCAN            | TCP_PK                     |   5212 |      1 |   5212 |00:00:00.04 |    5132 |      0 |      0 |       |       |          |
|   7 |   SORT ORDER BY                   |                            |      1 |      1 |  16108 |00:25:33.84 |     782K|     42 |      0 |  3596K|   822K| 3196K (0)|
|*  8 |    FILTER                         |                            |      1 |        |  16108 |00:22:30.61 |     782K|     42 |      0 |       |       |          |
|*  9 |     HASH JOIN                     |                            |      1 |      1 |    110K|00:00:00.62 |      89 |     42 |      0 |   900K|   900K| 1328K (0)|
|  10 |      VIEW                         |                            |      1 |   5007 |   3256 |00:00:00.02 |      46 |     42 |      0 |       |       |          |
|  11 |       TABLE ACCESS FULL           | SYS_TEMP_0FD9D6687_1B0CFB4 |      1 |   5007 |   3256 |00:00:00.01 |      46 |     42 |      0 |       |       |          |
|  12 |      VIEW                         |                            |      1 |   5007 |   3256 |00:00:00.03 |      43 |      0 |      0 |       |       |          |
|  13 |       TABLE ACCESS FULL           | SYS_TEMP_0FD9D6687_1B0CFB4 |      1 |   5007 |   3256 |00:00:00.02 |      43 |      0 |      0 |       |       |          |
|  14 |     MINUS                         |                            |    110K|        |  94488 |00:25:29.91 |     782K|      0 |      0 |       |       |          |
|  15 |      SORT UNIQUE                  |                            |    110K|      1 |    113K|00:14:20.41 |     300K|      0 |      0 |  2048 |  2048 | 2048  (0)|
|  16 |       HASH GROUP BY               |                            |    110K|      1 |    127K|00:11:47.70 |     300K|      0 |      0 |   789K|   789K|  527K (0)|
|* 17 |        TABLE ACCESS BY INDEX ROWID| CON_CP                     |    110K|      1 |    127K|00:00:08.15 |     300K|      0 |      0 |       |       |          |
|* 18 |         INDEX RANGE SCAN          | CON_TAB_FK_N1              |    110K|      1 |    722K|00:00:06.55 |     156K|      0 |      0 |       |       |          |
|  19 |      SORT UNIQUE                  |                            |    110K|      1 |    123K|00:11:07.57 |     481K|      0 |      0 |  2048 |  2048 | 2048  (0)|
|  20 |       HASH GROUP BY               |                            |    110K|      1 |    127K|00:09:52.37 |     481K|      0 |      0 |   789K|   789K|  907K (0)|
|* 21 |        TABLE ACCESS BY INDEX ROWID| CON_CP                     |    110K|      1 |    127K|00:00:08.37 |     481K|      0 |      0 |       |       |          |
|* 22 |         INDEX RANGE SCAN          | CON_TAB_FK_N1              |    110K|      1 |    735K|00:00:06.31 |     337K|      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("C"."CONSTRAINT_TYPE"='R')
6 - access("C"."OWNER"="T"."OWNER" AND "C"."TABLE_NAME"="T"."TABLE_NAME")
8 - filter( IS NULL)
9 - access("T2"."N_DET"="T1"."N_DET" AND "T2"."MIN_DET"="T1"."MIN_DET" AND "T2"."MAX_DET"="T1"."MAX_DET")
filter("T2"."ROW_ID">"T1"."ROW_ID")
17 - filter("C2"."CONSTRAINT_TYPE"='R')
18 - access("C2"."OWNER"=:B1 AND "C2"."TABLE_NAME"=:B2)
21 - filter("C1"."CONSTRAINT_TYPE"='R')
22 - access("C1"."OWNER"=:B1 AND "C1"."TABLE_NAME"=:B2)```

Performance Variation Polynomials
The timings above show that CPU and elapsed times increased by different powers of the problem size, depending on the query.

The inline grouping query INL_MI shows a variation close to the fourth power, which, like its memory usage, is very hard to understand. Most of the time is spent in the HASH GROUP BY operations at lines 16 and 20, and it rises about 16 times between W2 and W4. The number of starts rises by 4 times, as expected, but the number of rows per start remains constant at about 1.15, so the work done should rise by only about 4 times. It’s almost as though the SQL engine is really processing the entire record set in each HASH GROUP BY, rather than just the subset for the correlated tables, contrary to what the plan says. Again, this looks buggy.
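The observed power can be checked directly from the two plans above. A minimal Python sketch, using the total elapsed times at plan line 0 (100.96s for W2, 1533.98s for W4) and the doubling of input rows (2606 to 5212):

```python
import math

# Elapsed times (seconds) and input row counts taken from the two plans above:
# W2: 00:01:40.96 on 2606 rows; W4: 00:25:33.98 on 5212 rows
t_w2, n_w2 = 100.96, 2606
t_w4, n_w4 = 1533.98, 5212

# If t ~ n^p, then p = log(t2/t1) / log(n2/n1)
p = math.log(t_w4 / t_w2) / math.log(n_w4 / n_w2)
print(round(p, 2))  # about 3.93, i.e. close to the fourth power
```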

The double subquery factor query SUBQ_NE shows roughly cubic variation, which is plausible: the table pairing introduces a quadratic term, and the third power comes from the full scans of the detail subquery factor.

All three of the temporary table queries show quadratic variation, which is likely the best obtainable while matching sets directly (but see my next article, Holographic Set Matching in SQL, for linear solutions that bypass set matching). The quadratic term arises from the table pairing, with the details for each pair being constant in size and accessed via indexes. It’s worth noting that the query GTT_NE_X is actually slower than INL_MI and SUBQ_NE at the smallest data point, but much quicker at the largest, showing the importance of polynomial order for scalability.

Conclusions

• We have shown how to solve master-detail transaction matching problems efficiently, using an example problem while emphasising the generality of the techniques
• Appropriate use of subquery factors and temporary tables has been demonstrated, with performance analysis
• It’s worth highlighting the technique of pre-filtering on aggregates before comparing sets in detail
• The importance for scalability of the powers in the performance variation polynomials has been illustrated, as revealed by dimensional benchmarking
• On finishing this article it occurred to me to wonder whether aggregate matching might replace detail set matching altogether; at least in some cases it can, with linear performance resulting, as described in my next article, Holographic Set Matching in SQL (MDTM2)
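The aggregate pre-filtering technique can be sketched outside SQL as well. A minimal Python illustration, with hypothetical keys and detail sets, mirroring the N_DET/MIN_DET/MAX_DET join predicates seen in the plans above: the cheap (count, min, max) signature prunes most pairs, and the expensive full-set comparison runs only within signature groups.

```python
from collections import defaultdict

# Hypothetical detail sets keyed by master record; T4 shares T1's count/min/max
# but differs in content, so it survives the pre-filter and fails the full check.
details = {
    'T1': {10, 20, 30},
    'T2': {10, 20, 30},
    'T3': {10, 30},
    'T4': {10, 25, 30},
}

# Pre-filter: group keys by cheap aggregate signature (count, min, max)
by_signature = defaultdict(list)
for key, dets in details.items():
    by_signature[(len(dets), min(dets), max(dets))].append(key)

# Full set comparison only within each signature group
matches = []
for keys in by_signature.values():
    for i, k1 in enumerate(keys):
        for k2 in keys[i + 1:]:
            if details[k1] == details[k2]:
                matches.append((k1, k2))

print(matches)  # [('T1', 'T2')]
```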