Knapsacks and Networks in SQL

I opened a GitHub account, Brendan’s GitHub Page, last year and have added a number of projects since then, in PL/SQL and other 3GL languages. Partly in response to a request for the code behind one of my blog articles on an interesting SQL problem, I recently created a new repo for the SQL behind a group of articles on solving difficult combinatorial optimisation problems via ‘advanced’ SQL techniques such as recursive subquery factoring and the Model clause: sql_demos – Brendan’s repo for interesting SQL. It includes installation scripts with object creation and data setup, and scripts to run the SQL on the included datasets. The idea is that anyone with the prerequisites should be able to reproduce my results within a few minutes of downloading the repo.

[Left image from Knapsack problem; right image copied from Chapter 11 Dynamic Programming]

In this article I embed each of the earlier articles relevant to the GitHub repo with a brief preamble.

The first two articles are from January 2013 and use recursive subquery factoring to find exact solutions for the single and multiple knapsack problem, and also include PL/SQL solutions for comparison. They avoid the ‘brute force’ approach by truncating search paths as soon as limit constraints are exceeded. The cumulative paths are stored in string variables passed through the iterations (which would not be possible with the older Connect By hierarchical syntax).
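To give a flavour of the technique, here is a minimal sketch (not the articles’ actual code: the items table, its columns and the hard-coded weight limit of 100 are all assumptions). The recursive branch accumulates item ids into a path string, and a branch is truncated as soon as its cumulative weight breaches the limit:

WITH rsf (item_id, tot_weight, tot_value, path) AS (
SELECT id, item_weight, item_value, CAST (id AS VARCHAR2(4000))
  FROM items
 UNION ALL
SELECT i.id, r.tot_weight + i.item_weight, r.tot_value + i.item_value,
       r.path || ',' || i.id                    -- cumulative path carried through the iterations
  FROM rsf r
  JOIN items i
    ON i.id > r.item_id                         -- combinations, not permutations
   AND r.tot_weight + i.item_weight <= 100      -- truncate the branch as soon as the limit is breached
)
SELECT path, tot_weight, tot_value
  FROM rsf
 ORDER BY tot_value DESC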

In these articles I illustrate the nature of the problems using Visio diagrams, and include dimensional performance benchmarking results, using a technique that I presented on at last year’s Ireland OUG conference: Dimensional Performance Benchmarking of SQL – IOUG Presentation. I also illustrate the queries using my own method for diagramming SQL queries.

A Simple SQL Solution for the Knapsack Problem (SKP-1), January 2013

An SQL Solution for the Multiple Knapsack Problem (SKP-m), January 2013

The next article uses the Model clause to find a more general solution to a problem posed on AskTom as a ‘bin fitting’ problem. I also solved the problem by other methods, including recursive subquery factoring. I illustrate the problem itself, as well as the Model iteration scheme, using Visio diagrams, and again include dimensional performance benchmarking. The results show how quadratic performance variation can be turned into much faster linear variation by means of a temporary table in this kind of problem.

SQL for the Balanced Number Partitioning Problem, May 2013

This article arose from a question on OTN, and concerns a type of knapsack or bin-fitting problem that is quite tricky to solve in SQL, where the items fall into categories on which there are separate constraints. I introduced a new idea here: filtering out unpromising paths within recursive subquery factoring by means of analytic functions, allowing the technique to generate solutions for larger problems in shorter time, though without guaranteed optimality. Two realistic datasets were used, one from the original poster, and another that I scraped from a website.
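The filtering idea can be sketched as follows (again illustrative, with assumed table and bind names; Oracle permits analytic functions in the recursive member): each iteration ranks the surviving paths by total value, and the next iteration extends only the top :n_keep of them, trading guaranteed optimality for running time:

WITH rsf (item_id, tot_weight, tot_value, path, rnk) AS (
SELECT id, item_weight, item_value, CAST (id AS VARCHAR2(4000)),
       Row_Number() OVER (ORDER BY item_value DESC)
  FROM items
 UNION ALL
SELECT i.id, r.tot_weight + i.item_weight, r.tot_value + i.item_value,
       r.path || ',' || i.id,
       Row_Number() OVER (ORDER BY r.tot_value + i.item_value DESC)
  FROM rsf r
  JOIN items i
    ON i.id > r.item_id
   AND r.tot_weight + i.item_weight <= :weight_limit
 WHERE r.rnk <= :n_keep    -- extend only the most promising paths from the prior iteration
)
SELECT path, tot_weight, tot_value
  FROM rsf
 ORDER BY tot_value DESC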

SQL for the Fantasy Football Knapsack Problem, June 2013

This article is on a classic ‘hard’ optimisation problem, and uses recursive subquery factoring with the same filtering technique as the previous article, showing that it’s possible to solve a problem involving 312 American cities quite quickly in pure SQL using the approximation technique. It also uses a simple made-up example dataset to illustrate its working.

SQL for the Travelling Salesman Problem, July 2013

The following two articles concern finding shortest paths between given nodes in a network, and arose from a question on OTN. The first one again uses recursive subquery factoring with a filtering mechanism to exclude paths as early as possible, in a similar way to the approximate solution methods of the earlier articles. In this case, however, reasoning about the nature of the problem shows that we are not in fact sacrificing optimality. The article has quite a lot of explanatory material on how the SQL works, and uses small example datasets.

The second article considers how to improve performance further by obtaining a preliminary approximate solution that can be used as a bounding mechanism in a second step to find the exact solutions. This article uses two realistic networks as examples, including one having 428,156 links.
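In outline, the recursive search might look like the sketch below (the links table, column names and bind variables are assumptions, not the articles’ SQL): paths propagate along links, and any path to a node that is dominated by a cheaper same-length path to the same node is discarded via an analytic ranking, which cannot lose the optimum since costs are additive. In the branch and bound variant, a bound obtained from the preliminary approximate solution also cuts off costly paths early:

WITH paths (node_id, tot_cost, path, rnk) AS (
SELECT :start_node, 0, CAST (:start_node AS VARCHAR2(4000)), 1
  FROM DUAL
 UNION ALL
SELECT l.node_to, p.tot_cost + l.cost, p.path || ',' || l.node_to,
       Row_Number() OVER (PARTITION BY l.node_to ORDER BY p.tot_cost + l.cost)
  FROM paths p
  JOIN links l
    ON l.node_from = p.node_id
   AND ',' || p.path || ',' NOT LIKE '%,' || l.node_to || ',%'  -- avoid cycles
   AND p.tot_cost + l.cost <= :bound   -- branch and bound: bound from the approximate first pass
 WHERE p.rnk = 1                       -- keep only the cheapest path found so far to each node
)
SELECT path, tot_cost
  FROM paths
 WHERE node_id = :end_node
 ORDER BY tot_cost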

SQL for Shortest Path Problems, April 2015

SQL for Shortest Path Problems 2: A Branch and Bound Approach, May 2015

In the article above I cited results from a general network analysis package I had developed that obtains all the distinct connected subnetworks with their structures in an efficient manner using PL/SQL recursion. It is worth noting that for that kind of problem recursive SQL alone is very inefficient, and I wrote the following article to try to explain why that is so, and why the Connect By syntax is generally much worse than recursive subquery factoring.

Recursive SQL for Network Analysis, and Duality, September 2015

The PL/SQL package mentioned, which I think implements a ‘named’ algorithm although I didn’t know that when I wrote it (I don’t recall the name right now, sorry 🙁 ), is available on GitHub: Brendan’s network structural analysis Oracle package, with article:

PL/SQL Pipelined Function for Network Analysis, May 2015






Dimensional Benchmarking of Bracket Parsing SQL

I noticed an interesting thread on OTN recently, Matching ( in a string. It’s about using SQL to find matching bracket pairs (technically ‘parentheses’ but ‘brackets’ is shorter, and it makes no difference to the SQL). Incidentally, I found recently that nested bracket expressions are a nice way of compactly representing the structure of complex hash join execution plans, A Note on Oracle Join Orders and Hints.

I thought it would be interesting to run some of the solution queries through my benchmarking package to test performance (A Framework for Dimensional Benchmarking of SQL Performance). I decided to consider only the queries that addressed multiple records, and the form of the problem that requires returning record uid, opening and closing bracket positions, plus the substring enclosed. These were by ‘mathguy’ and ‘Peter vd Zwan’, and I made very minor tweaks for consistency. I also wrote a query myself using PL/SQL in an inline SQL function using the new (v12.1) ‘WITH Function’ functionality, and copied this to a version using a pipelined database function to check for any performance differences. The four queries tested were then:

  • CBL_QRY, mathguy: Connect By, Analytics, Regex
  • MRB_QRY, Peter vd Zwan: Connect By, Match_Recognize
  • WFB_QRY, me: With PL/SQL Function, Arrays
  • PFB_QRY, me: Pipelined PL/SQL Function, Arrays

Bracket Pair Definition

Consider a function (BrDiff) defined at each character position as the difference between the number of opening and closing brackets to the left of, or at, that position.

A closing bracket closes a given opening bracket if it is the first closing bracket after it at which BrDiff is 1 less than BrDiff at the opening bracket. If every bracket belongs to some pair, the expression can be considered well-formed.

This can be illustrated with the fourth functional example below, ‘b0(b1(b2(b3(x))(xy)))’.
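The brackets of that string occur at the positions shown here, with BrDiff as defined above (consistent with the expected results listed later):

Position:   3   6   9  12  14  15  16  19  20  21
Bracket:    (   (   (   (   )   )   (   )   )   )
BrDiff:     1   2   3   4   3   2   3   2   1   0

The opening bracket at position 12 has BrDiff = 4, and the first subsequent closing bracket at which BrDiff = 3 is at position 14, giving the pair (12, 14); similarly (9, 15), (16, 19), (6, 20) and (3, 21), matching the expected results for record 4 below.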

Test Problem

BRACKET_STRINGS Table

CREATE TABLE bracket_strings (id NUMBER, str VARCHAR2(4000))
/

Functional Test Data

I took four of mathguy’s test records, excluding the (deliberately) badly-formed strings; the first record includes some embedded newlines:

Test Data

     ID STR
------- ----------------------------------------
      1  ((Hello ( Hi Hi hi ( A B C ( D)) (EF)
        why Whwy whyhhh )
        )
        )

      2 (1+3*(3-1) + 3*(2+1))
      3 ()()*(())a()(())
      4 b0(b1(b2(b3(x))(xy)))

Result

     ID      O_POS      C_POS STR
------- ---------- ---------- ----------------------------------------
      1          2         60 ((Hello ( Hi Hi hi ( A B C ( D)) (EF)
                              why Whwy whyhhh )
                              )
                              )

      1          3         58 (Hello ( Hi Hi hi ( A B C ( D)) (EF)
                              why Whwy whyhhh )
                              )

      1         10         56 ( Hi Hi hi ( A B C ( D)) (EF)
                              why Whwy whyhhh )

      1         21         33 ( A B C ( D))
      1         29         32 ( D)
      1         35         38 (EF)
      2          1         21 (1+3*(3-1) + 3*(2+1))
      2          6         10 (3-1)
      2         16         20 (2+1)
      3          1          2 ()
      3          3          4 ()
      3          6          9 (())
      3          7          8 ()
      3         11         12 ()
      3         13         16 (())
      3         14         15 ()
      4          3         21 (b1(b2(b3(x))(xy)))
      4          6         20 (b2(b3(x))(xy))
      4          9         15 (b3(x))
      4         12         14 (x)
      4         16         19 (xy)

21 rows selected.

All queries returned the expected results above.

Performance Test Data

Each test set consisted of 100 records, with the str column containing a bracket expression dependent on width (w) and depth (d) parameters, as follows:

  • Each str column contains w bracket pairs
  • The str column begins with a 3-character record number
  • After the record number come d opening brackets, each followed by 3 characters of text, like ‘(001’, etc., then the d closing brackets, then the remaining w-d pairs in an unnested sequence, like ‘(001)’

When w=d the pairs are fully nested, and when d=0 there is no nesting, just a sequence of ‘(123)’-style strings.

This choice of test data sets allows us to see whether the number of bracket pairs and the depth of nesting each have an effect on performance.
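A generator along the following lines could produce such records (a sketch only, not the actual setup script; it assumes the 3 characters of text are a zero-padded index, with :w and :d as the width and depth parameters):

INSERT INTO bracket_strings (id, str)
SELECT r.n,
       LPad (r.n, 3, '0')
    || (SELECT Listagg ('(' || LPad (i.n, 3, '0')) WITHIN GROUP (ORDER BY i.n)
          FROM (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= :d) i)          -- d nested opening brackets
    || RPad (')', :d, ')')                                                   -- d closing brackets
    || (SELECT Listagg ('(' || LPad (i.n, 3, '0') || ')') WITHIN GROUP (ORDER BY i.n)
          FROM (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= :w - :d) i)     -- w-d unnested pairs
  FROM (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 100) r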

  • Depth fixed, at small width point; width varies: d=100, w=(100, 200, 300, 400)
  • Width fixed at high depth point; depth varies: w=400, d=(0, 100, 200, 300, 400)

The output from the test queries therefore consists of 100*w records with a record identifier and a bracketed string. For performance testing purposes the benchmarking framework writes the results to a file in csv format, while counting only the query steps in the query timing results.

All the queries showed strong time correlation with width, with smaller correlation with depth.

Connect By, Analytics, Regex Query (CBL_QRY)

WITH    d ( id, str, pos ) as (
      select id, str, regexp_instr(str, '\(|\)', 1, level)
      from   bracket_strings
      connect by level <= length(str) - length(translate(str, 'x()', 'x'))
             and prior id = id
             and prior sys_guid() is not null
    ),
    p ( id, str, pos, flag, l_ct, ct ) as (
      select id, str, pos, case substr(str, pos, 1) when '(' then 1 else -1 end,
             sum(case substr(str, pos, 1) when '(' then 1         end) over (partition by id order by pos),
             sum(case substr(str, pos, 1) when '(' then 1 else -1 end) over (partition by id order by pos)
      from   d
    ),
    f ( id, str, pos, flag, l_ct, ct, o_ct ) as (
      select id, str, pos, flag, l_ct, ct + case flag when 1 then 0 else 1 end as ct,
             row_number() over (partition by id, flag, ct order by pos)
      from   p
    )
select   /*+ CBL_QRY gather_plan_statistics */ id,
        min(case when flag =  1 then pos end) as o_pos,
        min(case when flag = -1 then pos end) as c_pos,
                                Substr (str, min(case when flag =  1 then pos end), min(case when flag = -1 then pos end) - min(case when flag =  1 then pos end) + 1) str
from    f
group by id, str, ct, o_ct
order by 1, 2

Plan hash value: 2736674058

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |  40000 |00:00:16.30 |      43 |  40000 |  40000 |       |       |          |         |
|   1 |  SORT ORDER BY                      |                 |      1 |    100 |  40000 |00:00:16.30 |      43 |  40000 |  40000 |    21M|  1702K|   19M (0)|         |
|   2 |   HASH GROUP BY                     |                 |      1 |    100 |  40000 |00:00:16.24 |      43 |  40000 |  40000 |    85M|  7293K|   85M (0)|         |
|   3 |    VIEW                             |                 |      1 |    100 |  80000 |00:01:19.35 |      43 |  40000 |  40000 |       |       |          |         |
|   4 |     WINDOW SORT                     |                 |      1 |    100 |  80000 |00:01:19.27 |      43 |  40000 |  40000 |   175M|  4458K|   97M (1)|     157K|
|   5 |      VIEW                           |                 |      1 |    100 |  80000 |00:01:05.90 |      40 |  20000 |  20000 |       |       |          |         |
|   6 |       WINDOW SORT                   |                 |      1 |    100 |  80000 |00:01:05.86 |      40 |  20000 |  20000 |   175M|  4457K|   97M (0)|     157K|
|   7 |        VIEW                         |                 |      1 |    100 |  80000 |00:00:09.77 |      38 |      0 |      0 |       |       |          |         |
|*  8 |         CONNECT BY WITHOUT FILTERING|                 |      1 |        |  80000 |00:00:02.26 |      38 |      0 |      0 |   267K|   267K|  237K (0)|         |
|   9 |          TABLE ACCESS FULL          | BRACKET_STRINGS |      1 |    100 |    100 |00:00:00.01 |      38 |      0 |      0 |       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("ID"=PRIOR NULL)

Notes on CBL_QRY

Subquery d uses RegExp_Instr with Connect By to generate a row for each bracket, holding its position in the string. Subquery p then flags each bracket as opening (1) or closing (-1) and computes running bracket counts analytically, and subquery f adjusts the running count so that the two brackets of a pair share the same grouping values (ct, o_ct), allowing the final Group By to return one row per bracket pair.

Connect By, Match_Recognize Query (MRB_QRY)

WITH b as
(
select
  substr(str,level,1) s
  ,level n
  ,id
  ,str
from
  bracket_strings
connect by
id =  prior id
and substr(str,level,1) is not null
and prior sys_guid() is not null
)
select  /*+ MRB_QRY gather_plan_statistics */ 
  id
  ,o_pos
  ,c_pos
  ,substr(str,o_pos,c_pos - o_pos + 1) str
from
b
MATCH_RECOGNIZE (
partition by id
ORDER BY n
MEASURES 
  str as str
  ,FIRST( N) AS o_pos
  ,LAST( N) AS c_pos
one ROW PER MATCH
AFTER MATCH SKIP to next row
PATTERN (ob (ob | nb | cb)*? lcb)
DEFINE
  ob as ob.s = '('
  ,cb as cb.s = ')'
  ,nb as nb.s not in ('(',')')
  ,lcb as lcb.s = ')' and (count(ob.s) = count(cb.s) + 1)
) MR

Plan hash value: 2214599770

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 |      1 |        |  40000 |00:03:38.26 |      40 |   4473K|  50075 |       |       |          |
|   1 |  SORT ORDER BY                   |                 |      1 |    100 |  40000 |00:03:38.26 |      40 |   4473K|  50075 |    21M|  1702K|   19M (0)|
|   2 |   VIEW                           |                 |      1 |    100 |  40000 |00:04:48.17 |      40 |   4473K|  50075 |       |       |          |
|   3 |    MATCH RECOGNIZE SORT          |                 |      1 |    100 |  40000 |00:04:48.16 |      40 |   4473K|  50075 |   440M|  6922K|  163M (0)|
|   4 |     VIEW                         |                 |      1 |    100 |    200K|00:00:04.65 |      38 |      0 |      0 |       |       |          |
|*  5 |      CONNECT BY WITHOUT FILTERING|                 |      1 |        |    200K|00:00:04.54 |      38 |      0 |      0 |   267K|   267K|  237K (0)|
|   6 |       TABLE ACCESS FULL          | BRACKET_STRINGS |      1 |    100 |    100 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("ID"=PRIOR NULL)

Notes on MRB_QRY

This uses the v12.1 feature Match_Recognize, operating on the characters of the string after conversion to rows. The pattern matches an opening bracket, followed lazily by any characters, up to the first closing bracket that balances it, as defined by the lcb condition that the count of opening brackets exceeds the count of closing brackets by exactly one.

With PL/SQL Function, Arrays Query (WFB_QRY)

WITH FUNCTION Parse_Brackets (p_str VARCHAR2) RETURN bra_lis_type IS /* WFB_QRY */ 
  c_n_ob       CONSTANT PLS_INTEGER := Length (p_str) - Length (Replace (p_str, '(', ''));
  l_ob_lis              SYS.ODCINumberList := SYS.ODCINumberList();
  l_cb_lis              SYS.ODCINumberList := SYS.ODCINumberList();
  TYPE b_rec_type   IS  RECORD (pos INTEGER, diff INTEGER);
  TYPE b_lis_type   IS  VARRAY(32767) OF b_rec_type;
  l_b_lis               b_lis_type := b_lis_type(NULL);
  l_bra_lis             bra_lis_type := bra_lis_type();
  n_b                   PLS_INTEGER := 0;
  n_ob                  PLS_INTEGER := 0;
  n_cb                  PLS_INTEGER := 0;
  l_chr                 VARCHAR2(1);
  l_o_diff              PLS_INTEGER;
BEGIN

  IF c_n_ob = 0 THEN
    RETURN NULL;
  END IF;
  l_ob_lis.EXTEND (c_n_ob);
  l_bra_lis.EXTEND (c_n_ob);
  l_cb_lis.EXTEND (c_n_ob);
  l_b_lis.EXTEND (c_n_ob + c_n_ob);
 
  FOR i IN 1..Length (p_str) LOOP
 
    l_chr := Substr (p_str, i, 1);
    IF l_chr NOT IN ('(', ')') THEN CONTINUE; END IF;

    n_b := n_b + 1;
    l_b_lis(n_b).pos := i;
 
    IF l_chr = '(' THEN
      n_ob := n_ob + 1;
      l_ob_lis(n_ob) := n_b;
    ELSE
      n_cb := n_cb + 1;
      l_cb_lis(n_cb) := n_b;
    END IF;

    l_b_lis(n_b).diff := n_ob - n_cb;
 
  END LOOP;

  FOR i IN 1..n_ob LOOP

    l_o_diff := l_b_lis (l_ob_lis(i)).diff;
    FOR j IN 1..n_cb LOOP

      IF l_b_lis (l_cb_lis(j)).pos < l_b_lis (l_ob_lis(i)).pos THEN CONTINUE; END IF;
      IF l_o_diff = l_b_lis (l_cb_lis(j)).diff + 1 THEN

        l_bra_lis(i) := bra_rec_type (l_b_lis(l_ob_lis(i)).pos, l_b_lis(l_cb_lis(j)).pos, Substr (p_str, l_b_lis(l_ob_lis(i)).pos, l_b_lis(l_cb_lis(j)).pos - l_b_lis(l_ob_lis(i)).pos + 1));
        EXIT;

      END IF;

    END LOOP;
 
  END LOOP;
  RETURN l_bra_lis;

END;
SELECT
    b.id, t.o_pos, t.c_pos, t.str
  FROM bracket_strings b
  OUTER APPLY TABLE (Parse_Brackets (b.str)) t
 ORDER BY 1, 2

Notes on WFB_QRY

This uses the v12.1 feature whereby a PL/SQL function can be included directly in a query.

Pipelined PL/SQL Function, Arrays Query (PFB_QRY)

PFB_QRY – Pipelined Function

CREATE OR REPLACE TYPE bra_rec_type IS OBJECT (o_pos INTEGER, c_pos INTEGER, str VARCHAR2(4000));
/
CREATE TYPE bra_lis_type IS VARRAY(4000) OF bra_rec_type;
/
FUNCTION Parse_Brackets (p_str VARCHAR2) RETURN bra_lis_type PIPELINED IS
  c_n_ob       CONSTANT PLS_INTEGER := Length (p_str) - Length (Replace (p_str, '(', ''));
  l_ob_lis              SYS.ODCINumberList := SYS.ODCINumberList();
  l_cb_lis              SYS.ODCINumberList := SYS.ODCINumberList();
  TYPE b_rec_type   IS  RECORD (pos INTEGER, diff INTEGER);
  TYPE b_lis_type   IS  VARRAY(32767) OF b_rec_type;
  l_b_lis               b_lis_type := b_lis_type(NULL);
  l_bra_lis             bra_lis_type := bra_lis_type();
  n_b                   PLS_INTEGER := 0;
  n_ob                  PLS_INTEGER := 0;
  n_cb                  PLS_INTEGER := 0;
  l_chr                 VARCHAR2(1);
  l_o_diff              PLS_INTEGER;
BEGIN

  IF c_n_ob = 0 THEN
    RETURN;
  END IF;
  l_ob_lis.EXTEND (c_n_ob);
  l_bra_lis.EXTEND (c_n_ob);
  l_cb_lis.EXTEND (c_n_ob);
  l_b_lis.EXTEND (c_n_ob + c_n_ob);
 
  FOR i IN 1..Length (p_str) LOOP
 
    l_chr := Substr (p_str, i, 1);
    IF l_chr NOT IN ('(', ')') THEN CONTINUE; END IF;

    n_b := n_b + 1;
    l_b_lis(n_b).pos := i;
 
    IF l_chr = '(' THEN
      n_ob := n_ob + 1;
      l_ob_lis(n_ob) := n_b;
    ELSE
      n_cb := n_cb + 1;
      l_cb_lis(n_cb) := n_b;
    END IF;

    l_b_lis(n_b).diff := n_ob - n_cb;
 
  END LOOP;

  FOR i IN 1..n_ob LOOP

    l_o_diff := l_b_lis (l_ob_lis(i)).diff;
    FOR j IN 1..n_cb LOOP

      IF l_b_lis (l_cb_lis(j)).pos < l_b_lis (l_ob_lis(i)).pos THEN CONTINUE; END IF;
      IF l_o_diff = l_b_lis (l_cb_lis(j)).diff + 1 THEN

        PIPE ROW (bra_rec_type (l_b_lis(l_ob_lis(i)).pos, l_b_lis(l_cb_lis(j)).pos, Substr (p_str, l_b_lis(l_ob_lis(i)).pos, l_b_lis(l_cb_lis(j)).pos - l_b_lis(l_ob_lis(i)).pos + 1)));
        EXIT;

      END IF;

    END LOOP;
 
  END LOOP;

END Parse_Brackets;

Plan hash value: 3367347570

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |      1 |        |  40000 |00:00:01.17 |      38 |       |       |          |
|   1 |  SORT ORDER BY                       |                 |      1 |    816K|  40000 |00:00:01.17 |      38 |    21M|  1702K|   19M (0)|
|   2 |   NESTED LOOPS OUTER                 |                 |      1 |    816K|  40000 |00:00:16.80 |      38 |       |       |          |
|   3 |    TABLE ACCESS FULL                 | BRACKET_STRINGS |      1 |    100 |    100 |00:00:00.01 |      38 |       |       |          |
|   4 |    VIEW                              | VW_LAT_D4FD8C38 |    100 |   8168 |  40000 |00:00:01.13 |       0 |       |       |          |
|   5 |     COLLECTION ITERATOR PICKLER FETCH| PARSE_BRACKETS  |    100 |   8168 |  40000 |00:00:01.10 |       0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

PFB_QRY – Query

SELECT  /*+ PFB_QRY gather_plan_statistics */
        b.id        id, 
        t.o_pos     o_pos, 
        t.c_pos     c_pos,
        t.str       str
  FROM bracket_strings b
  OUTER APPLY TABLE (Strings.Parse_Brackets (b.str)) t
 ORDER BY b.id, t.o_pos

Plan hash value: 3367347570

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                 |      1 |        |  40000 |00:00:01.15 |      38 |       |       |          |
|   1 |  SORT ORDER BY                       |                 |      1 |    816K|  40000 |00:00:01.15 |      38 |    21M|  1702K|   19M (0)|
|   2 |   NESTED LOOPS OUTER                 |                 |      1 |    816K|  40000 |00:00:01.95 |      38 |       |       |          |
|   3 |    TABLE ACCESS FULL                 | BRACKET_STRINGS |      1 |    100 |    100 |00:00:00.01 |      38 |       |       |          |
|   4 |    VIEW                              | VW_LAT_D4FD8C38 |    100 |   8168 |  40000 |00:00:01.51 |       0 |       |       |          |
|   5 |     COLLECTION ITERATOR PICKLER FETCH| PARSE_BRACKETS  |    100 |   8168 |  40000 |00:00:01.50 |       0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Notes on PFB_QRY

This uses a PL/SQL pipelined database function, called from the query via the TABLE operator.

Performance Testing Results

Key

If CPU time (y) is proportional to the varying dimension (x), we would have:

y = kx

and so, for any two data points 1 and 2:

y2.x1/(y1.x2) = 1

We can take the actual value of this linear ratio as a marker for linear proportionality or not: for example, if doubling the width exactly doubles the CPU time, the linear ratio for that pair of points is 1.

Similarly, if CPU time (y) is proportional to the square of the varying dimension (x), we would have:

y = kx.x

and so, for any two data points 1 and 2:

y2.x1.x1/(y1.x2.x2) = 1

We can take the actual value of this quadratic ratio as a marker for quadratic proportionality or not.

  • LRTB = Ratio to best time at high data point
  • RTP_L = Linear ratio as defined above, averaged over successive data points
  • RTP_Q = Quadratic ratio as defined above, averaged over successive data points

Depth fixed, at small width point; width varies: d=100, w=(100, 200, 300, 400)

Notes on Results for Fixed Depth

  • CPU time increases with width for all queries at above a linear rate
  • CBL_QRY is significantly faster than MRB_QRY, which appears to be rising quadratically
  • Both WFB_QRY and PFB_QRY are much faster than the non-PL/SQL queries
  • PFB_QRY is slightly faster than WFB_QRY. This could be regarded as too small a difference to be significant, but is consistent across the data points, despite the context switching with database function calls

Width fixed at high depth point; depth varies: w=400, d=(0, 100, 200, 300, 400)

Notes on Results for Fixed Width

  • CPU time increases with depth for all queries although at a sublinear rate
  • CBL_QRY has a big jump in CPU time between d=0 and d=100, where nesting starts to come in; without nesting it was actually faster than MRB_QRY






Design Patterns for Database API Testing 2: Views 1 – Design

Last October I gave a presentation on database unit testing with utPLSQL, Oracle Unit Testing with utPLSQL. I mentioned design patterns as a way of reducing the effort of building unit tests and outlined some strategies for coding them effectively.

In the current set of articles, I develop the ideas further, starting from the idea that all database APIs can be considered in terms of the axes:

  • direction (i.e. getter or setter, noting that setters can also ‘get’)
  • mode (i.e. real time or batch)

For each cell in the implied matrix, I construct an example API (or view) with specified requirements against Oracle’s HR demo schema, and use this example to construct a testing program with appropriate scenarios as a design pattern. Concepts and common patterns and anti-patterns in automated API testing are discussed throughout, and these are largely independent of the testing framework used. However, the examples use my own lightweight independent framework that is designed to help avoid many API testing anti-patterns. The code is available on GitHub here, BrenPatF/trapit_oracle_tester, and includes both framework and design pattern examples.

Behind the four examples, there is an underlying design pattern that involves wrapping the API call in a ‘pure’ procedure, called once per scenario, with the output ‘actuals’ array including everything affected by the API, whether as output parameters, or on database tables, etc. The inputs are also extended from the API parameters to include any other effective inputs. Assertion takes place after all scenarios and is against the extended outputs, with extended inputs also listed. This concept of the ‘pure’ function, central to Functional Programming, has important advantages in automated testing. I explained the concepts involved in a presentation at the Oracle User Group Ireland Conference in March 2018:

The Database API Viewed As A Mathematical Function: Insights into Testing


In this 2-part article, I present a design pattern for testing views. I start by discussing when and how to test views. Unlike in the design pattern of the first article in the series, Design Patterns for Database API Testing 1: Web Service Saving – Design, test data has to be created during testing of views, and a very general approach to creating and selecting the test data is proposed. The use case for the design pattern is described, and scenarios and sub-scenarios are defined conceptually. Finally, the output from the testing is presented, with notes.

The second post provides some code extracts, with notes: Design Patterns for Database API Testing 2: Views 2 – Code.

When to Test Views

Views can be simple or complex or, as I categorise them in Brendan’s 2-Page Oracle Programming Standards, thin or thick, where thick views include table joins and thin views don’t. Thin views do not normally require testing, while it may or may not be appropriate to test thick views.
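For illustration (hypothetical view names against the HR schema):

-- 'Thin' view: single table, no joins - not normally worth testing
CREATE OR REPLACE VIEW emp_thin_v AS
SELECT employee_id, last_name, salary
  FROM employees;

-- 'Thick' view: joins tables - a candidate for testing
CREATE OR REPLACE VIEW emp_dep_thick_v AS
SELECT e.last_name, d.department_name, e.salary
  FROM employees e
  JOIN departments d
    ON d.department_id = e.department_id;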

As explained in the second part of the first article mentioned above, method-based testing is a bad idea, and occurs when the test suite is based on testing all the methods in a package, rather than units of (external) behaviour (often corresponding to procedures prefixed AIP in a common naming convention). Similarly, we can consider views in the same way as methods and ask whether they represent testable units of behaviour or are merely internal code structures, which should not normally have individual automated tests for the reasons given there.

Good examples of views that should be tested would be those that form the basis of complex data extraction to file, by ETL tools such as Informatica, or those that form the basis of reporting tools such as Business Objects. In fact, it is very good practice to place SQL for these tools into views precisely so that they can be tested.

How to Test Views Using a PL/SQL Testing Framework

In order to leverage a PL/SQL API testing framework to also test views, the API test package procedures call a library procedure, passing the name of the relevant view: the library procedure returns the result of querying the view as an array of delimited strings, and the API test procedures then compare the results against their own expected results.
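A minimal sketch of such a library function follows (the name and signature are hypothetical, not the actual Utils_TT API): it queries the named view generically, returning one delimited string per row, with the caller supplying the concatenated select list (for example last_name || '|' || department_name):

CREATE OR REPLACE FUNCTION Get_View_Rows (p_view_name VARCHAR2,
                                          p_sel_list  VARCHAR2,
                                          p_where     VARCHAR2 DEFAULT NULL)
                                          RETURN SYS.ODCIVarchar2List IS
  l_csr   SYS_REFCURSOR;
  l_lis   SYS.ODCIVarchar2List;
BEGIN
  OPEN l_csr FOR 'SELECT ' || p_sel_list || ' FROM ' || p_view_name ||
                 CASE WHEN p_where IS NOT NULL THEN ' WHERE ' || p_where END;
  FETCH l_csr BULK COLLECT INTO l_lis;  -- one delimited string per view row
  CLOSE l_csr;
  RETURN l_lis;
END Get_View_Rows;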

Each API test procedure will have its own setup local procedure to create test data, and we need to discuss the issue of distinguishing test data from pre-existing data.

Test Data

In the earlier article on database save procedures, we did not create any test data within the testing code itself, but the base procedure did create data, and those were queried back for assertion. In order to select only the data created by the procedure call, a prefix was used in one of the string fields that was assumed not to exist already. This is a workable approach in some cases, but may not be possible in general. Let us consider the different types of database data that may affect our testing:

  • Data created by the base code being tested
  • Data created by test code to be read by the base code
  • Data not created by test code to be read by base code

In order to verify that the program calls are giving results as expected, the test code needs to know all the data that influence the results, not just directly created data. Our view testing use case described below has an example where the results depend on an aggregate of all the records on the database. This is a problem when we have a shared database, where we cannot freeze the data at the time of test development. In order to handle this problem, we propose to borrow a technique used in Oracle’s E-Business Suite applications.

Partitioning Views with System Contexts
In Oracle E-Business Suite’s multi-org implementations, transactions are partitioned by a numeric identifier for the organization owning the transaction. This org_id value is stored in a column in the base table on transaction creation. Within the application code the base table is not queried directly, but through a view that restricts records returned to those for the organization corresponding to the current role of the application user, which is stored in the userenv system context (this is true up to release 11.5, but the mechanism changed in release 12.1).

See SYS_CONTEXT for information on the system context database feature, and Oracle E-Business Suite Multiple Organizations Implementation Guide (12.1) for the release 12.1 multi-org implementation.

Partitioning Views for Testing
We propose to use views in a similar way to the multi-org views, to restrict records to those created in the testing session, by means of a ttid column on the base table that will hold the session id. The new optional column is added to those tables where this approach is required, and views are created on the tables. Our testing utility package Utils_TT sets a context variable to the value ‘TT’ to signify testing mode, and the session id is stored in a package variable in the general utilities package Utils.

Any base code that inserts data into the tables has to check for test mode, and if set, put the session id into the ttid field, and if not, leave it blank. The views use the following clause:

 WHERE (ttid = SYS_Context ('userenv', 'sessionid') OR 
        Substr (Nvl (SYS_Context ('userenv', 'client_info'), 'XX'), 1, 2) != 'TT')
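On the write side, the base code might populate the column along these lines (a sketch assuming the client_info convention above, with abbreviated employees columns and hypothetical parameter names):

INSERT INTO employees (employee_id, last_name, salary, ttid)
VALUES (p_employee_id, p_last_name, p_salary,
        CASE WHEN Substr (Nvl (SYS_Context ('userenv', 'client_info'), 'XX'), 1, 2) = 'TT'
             THEN SYS_Context ('userenv', 'sessionid') END);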

Both test code and base code now query the views instead of the base tables. As the base code to write to the tables has to account for the new column, it is necessary for the column to be added in all instances including production. If this seems a little drastic, consider the importance that you attach to testing, and bear in mind that the earlier, less general, approaches may suffice in many cases. In these design pattern demos I use the general solution.

Schema Structure

In the earlier articles, the base code and test packages were created in the HR schema, with utility packages kept in the custom brendan schema. However, it is more common to use separate schemas for code and data, so we will now place all packages and supporting objects in the brendan schema, and create the testing views there.

Design Pattern Use Case for Testing Views

Modern Oracle SQL is very powerful and can apply complex logic within a single statement, reducing the need for more complex procedural code. In order to show how to test SQL, we will devise a test view, HR_Test_View_V, having a range of features that we might want to test in general:

  • Inner joins suppress driving records where there is no joining record
  • Outer joins return driving records where there is no joining record
  • Analytic functions that partition by some key, and return aggregates on the returned record set
  • Functions based on aggregates over records that include those not in the returned record set
  • Constraints based on aggregates over records that include those not in the returned record set
  • Constraints on column values

The view functionality can be described in words as:

  • Selected values
    • Employee name, department name, and salary
    • Manager’s name
    • Ratio of employee’s salary to the department average (returned employees only)
    • Ratio of employee’s salary to the average salary of all employees
  • Constraints
    • Exclude employees in job ‘AD_ASST’
    • Exclude employees without a department
    • Do not return any records if the total salary of all employees is below 1600
  • Outer join
    • Include employees both with and without a manager

The view SQL is:

CREATE OR REPLACE VIEW hr_test_view_v AS
WITH all_emps AS (
        SELECT Avg (salary) avg_sal, SUM (salary) sal_tot_g
          FROM employees e
)
SELECT e.last_name, d.department_name, m.last_name manager, e.salary,
       Round (e.salary / Avg (e.salary) OVER (PARTITION BY e.department_id), 2) sal_rat,
       Round (e.salary / a.avg_sal, 2) sal_rat_g
  FROM all_emps a
 CROSS JOIN employees e
  JOIN departments d
    ON d.department_id = e.department_id
  LEFT JOIN employees m
    ON m.employee_id = e.manager_id
 WHERE e.job_id != 'AD_ASST'
   AND a.sal_tot_g >= 1600

Scenarios and Sub-scenarios

Scenario definition
Following our earlier article, we may define a scenario as being the set of all relevant records, both on the database and passed as parameters, to a single program call. API or view testing involves creating one or more scenarios, calling the program (or executing the process) for each scenario, and verifying that the output records are as expected.

Good testing is achieved when the scenarios are chosen to validate as wide a range of behaviours as possible. It is not always, or usually, necessary to create a new scenario for each aspect of behaviour to be tested.

Sub-scenario definition
Often, several features can be tested in the same program call by setting up different records in the scenario that will independently test the different features. For example, in our use case above we can create employees with and without a department, and with and without a manager in the same scenario to test the different types of join.

It may be helpful to think of these separate records, or fields within a record, as corresponding to sub-scenarios, and try to construct scenarios as efficiently as possible without making more calls than necessary.

View Test Output

Data setup section

SCENARIO 1: DS-1, testing inner, outer joins, analytic over dep, and global ratios with 1 dep, Employees created in setup: DS-1 - 4 emps, 1 dep (10), emp-3 has no dep, emp-4 has bad job
=========================================================================================================================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1493             10              IT_PROG           1000
2         1494             10        1493  IT_PROG           2000
3         1495                       1493  IT_PROG           3000
4         1496             10        1493  AD_ASST           4000

SCENARIO 2: DS-2, testing same as 1 but with extra emp in another dep, Employees created in setup: DS-2 - As dataset 1 but with extra emp-5, in second dep (20)
===============================================================================================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1497             10              IT_PROG           1000
2         1498             10        1497  IT_PROG           2000
3         1499                       1497  IT_PROG           3000
4         1500             10        1497  AD_ASST           4000
5         1501             20        1497  IT_PROG           5000

SCENARIO 3: DS-2, passing 'WHERE dep=10', Employees created in setup: DS-2 - As dataset 1 but with extra emp-5, in second dep (20)
==================================================================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1502             10              IT_PROG           1000
2         1503             10        1502  IT_PROG           2000
3         1504                       1502  IT_PROG           3000
4         1505             10        1502  AD_ASST           4000
5         1506             20        1502  IT_PROG           5000

SCENARIO 4: DS-3, Salaries total 1500 (< threshold of 1600), Employees created in setup: DS-3 - As dataset 2 but with salaries * 0.1, total below reporting threshold of 1600
=============================================================================================================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1507             10              IT_PROG            100
2         1508             10        1507  IT_PROG            200
3         1509                       1507  IT_PROG            300
4         1510             10        1507  AD_ASST            400
5         1511             20        1507  IT_PROG            500

Notes on data setup section

  • There are three data sets, and four scenarios, each of which references a data set
  • The call to set up the data for a scenario writes out all the data created
  • A header provides a description of the features (or sub-scenarios) in the data set
  • In the output above scenarios 2 and 3 use the same data set, DS-2

Results section

SQL>BEGIN

  Utils.Clear_Log;
  Utils_TT.Run_Suite (Utils_TT.c_tt_suite_bren);

EXCEPTION
  WHEN OTHERS THEN
    Utils.Write_Other_Error;
END;
/
SQL> @L_Log_Default

TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


TRAPIT TEST: TT_View_Drivers.tt_HR_Test_View_V
==============================================

Employees created in setup: DS-1 - 4 emps, 1 dep (10), emp-3 has no dep, emp-4 has bad job
==========================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1518             10              IT_PROG           1000
2         1519             10        1518  IT_PROG           2000
3         1520                       1518  IT_PROG           3000
4         1521             10        1518  AD_ASST           4000

Employees created in setup: DS-2 - As dataset 1 but with extra emp-5, in second dep (20)
========================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1522             10              IT_PROG           1000
2         1523             10        1522  IT_PROG           2000
3         1524                       1522  IT_PROG           3000
4         1525             10        1522  AD_ASST           4000
5         1526             20        1522  IT_PROG           5000

Employees created in setup: DS-2 - As dataset 1 but with extra emp-5, in second dep (20)
========================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1527             10              IT_PROG           1000
2         1528             10        1527  IT_PROG           2000
3         1529                       1527  IT_PROG           3000
4         1530             10        1527  AD_ASST           4000
5         1531             20        1527  IT_PROG           5000

Employees created in setup: DS-3 - As dataset 2 but with salaries * 0.1, total below reporting threshold of 1600
================================================================================================================

#  Employee id  Department id     Manager  Job id          Salary
-  -----------  -------------  ----------  ----------  ----------
1         1532             10              IT_PROG            100
2         1533             10        1532  IT_PROG            200
3         1534                       1532  IT_PROG            300
4         1535             10        1532  AD_ASST            400
5         1536             20        1532  IT_PROG            500

SCENARIO 1: DS-1, testing inner, outer joins, analytic over dep, and global ratios with 1 dep {
===============================================================================================

    INPUTS
    ======

        GROUP Employee {
        ================

            Employee Id  Last Name  Email  Hire Date  Job      Salary  Manager Id  department Id
            -----------  ---------  -----  ---------  -------  ------  ----------  -------------
                   1518  LN_1       EM_1   09-JUL-16  IT_PROG    1000                         10
                   1519  LN_2       EM_2   09-JUL-16  IT_PROG    2000        1518             10
                   1520  LN_3       EM_3   09-JUL-16  IT_PROG    3000        1518
                   1521  LN_4       EM_4   09-JUL-16  AD_ASST    4000        1518             10

        }
        =

        GROUP Where {
        =============

            Where
            -----


        }
        =

    OUTPUTS
    =======

        GROUP Select results: Actual = 2, Expected = 2 {
        ================================================

            F?  Name  Department      Manager  Salary  Salary Ratio (dep)  Salary Ratio (overall)
            --  ----  --------------  -------  ------  ------------------  ----------------------
                LN_1  Administration             1000                 .67                      .4
                LN_2  Administration  LN_1       2000                1.33                      .8

        } 0 failed, of 2: SUCCESS
        =========================

} 0 failed, of 2: SUCCESS
=========================

SCENARIO 2: DS-2, testing same as 1 but with extra emp in another dep {
=======================================================================

    INPUTS
    ======

        GROUP Employee {
        ================

            Employee Id  Last Name  Email  Hire Date  Job      Salary  Manager Id  department Id
            -----------  ---------  -----  ---------  -------  ------  ----------  -------------
                   1522  LN_1       EM_1   09-JUL-16  IT_PROG    1000                         10
                   1523  LN_2       EM_2   09-JUL-16  IT_PROG    2000        1522             10
                   1524  LN_3       EM_3   09-JUL-16  IT_PROG    3000        1522
                   1525  LN_4       EM_4   09-JUL-16  AD_ASST    4000        1522             10
                   1526  LN_5       EM_5   09-JUL-16  IT_PROG    5000        1522             20

        }
        =

        GROUP Where {
        =============

            Where
            -----


        }
        =

    OUTPUTS
    =======

        GROUP Select results: Actual = 3, Expected = 3 {
        ================================================

            F?  Name  Department      Manager  Salary  Salary Ratio (dep)  Salary Ratio (overall)
            --  ----  --------------  -------  ------  ------------------  ----------------------
                LN_1  Administration             1000                 .67                     .33
                LN_2  Administration  LN_1       2000                1.33                     .67
                LN_5  Marketing       LN_1       5000                   1                    1.67

        } 0 failed, of 3: SUCCESS
        =========================

} 0 failed, of 3: SUCCESS
=========================

SCENARIO 3: DS-2, passing 'WHERE dep=10' {
==========================================

    INPUTS
    ======

        GROUP Employee {
        ================

            Employee Id  Last Name  Email  Hire Date  Job      Salary  Manager Id  department Id
            -----------  ---------  -----  ---------  -------  ------  ----------  -------------
                   1527  LN_1       EM_1   09-JUL-16  IT_PROG    1000                         10
                   1528  LN_2       EM_2   09-JUL-16  IT_PROG    2000        1527             10
                   1529  LN_3       EM_3   09-JUL-16  IT_PROG    3000        1527
                   1530  LN_4       EM_4   09-JUL-16  AD_ASST    4000        1527             10
                   1531  LN_5       EM_5   09-JUL-16  IT_PROG    5000        1527             20

        }
        =

        GROUP Where {
        =============

            Where
            --------------------------------
            department_name='Administration'

        }
        =

    OUTPUTS
    =======

        GROUP Select results: Actual = 2, Expected = 2 {
        ================================================

            F?  Name  Department      Manager  Salary  Salary Ratio (dep)  Salary Ratio (overall)
            --  ----  --------------  -------  ------  ------------------  ----------------------
                LN_1  Administration             1000                 .67                     .33
                LN_2  Administration  LN_1       2000                1.33                     .67

        } 0 failed, of 2: SUCCESS
        =========================

} 0 failed, of 2: SUCCESS
=========================

SCENARIO 4: DS-3, Salaries total 1500 (< threshold of 1600) {
=============================================================

    INPUTS
    ======

        GROUP Employee {
        ================

            Employee Id  Last Name  Email  Hire Date  Job      Salary  Manager Id  department Id
            -----------  ---------  -----  ---------  -------  ------  ----------  -------------
                   1532  LN_1       EM_1   09-JUL-16  IT_PROG     100                         10
                   1533  LN_2       EM_2   09-JUL-16  IT_PROG     200        1532             10
                   1534  LN_3       EM_3   09-JUL-16  IT_PROG     300        1532
                   1535  LN_4       EM_4   09-JUL-16  AD_ASST     400        1532             10
                   1536  LN_5       EM_5   09-JUL-16  IT_PROG     500        1532             20

        }
        =

        GROUP Where {
        =============

            Where
            -----


        }
        =

    OUTPUTS
    =======

        GROUP Select results: Actual = 0, Expected = 0: SUCCESS
        =======================================================

} 0 failed, of 1: SUCCESS
=========================

TIMING: Actual = 48, Expected <= 1: FAILURE
===========================================

SUMMARY for TT_View_Drivers.tt_HR_Test_View_V
=============================================

Scenario                                                                           # Failed  # Tests  Status
---------------------------------------------------------------------------------  --------  -------  -------
DS-1, testing inner, outer joins, analytic over dep, and global ratios with 1 dep         0        2  SUCCESS
DS-2, testing same as 1 but with extra emp in another dep                                 0        3  SUCCESS
DS-2, passing 'WHERE dep=10'                                                              0        2  SUCCESS
DS-3, Salaries total 1500 (< threshold of 1600)                                           0        1  SUCCESS
Timing                                                                                    1        1  FAILURE
---------------------------------------------------------------------------------  --------  -------  -------
Total                                                                                     1        9  FAILURE
---------------------------------------------------------------------------------  --------  -------  -------

Timer Set: TT_View_Drivers.tt_HR_Test_View_V, Constructed at 09 Jul 2016 13:32:42, written at 13:32:42
======================================================================================================
[Timer timed: Elapsed (per call): 0.01 (0.000013), CPU (per call): 0.02 (0.000020), calls: 1000, '***' denotes corrected line below]

Timer       Elapsed         CPU         Calls       Ela/Call       CPU/Call
-------  ----------  ----------  ------------  -------------  -------------
Setup          0.11        0.00             4        0.02675        0.00000
Caller         0.19        0.06             4        0.04750        0.01500
(Other)        0.03        0.03             1        0.02700        0.03000
-------  ----------  ----------  ------------  -------------  -------------
Total          0.32        0.09             9        0.03600        0.01000
-------  ----------  ----------  ------------  -------------  -------------

Notes on results section

  • In a view test there is only one group, namely the selected data set

The second part of the article is here: Design Patterns for Database API Testing 2: Views 2 – Code