Knapsacks and Networks in SQL

I opened a GitHub account, Brendan's GitHub Page last year and have added a number of projects since then, in PL/SQL and other 3GL languages. Partly in response to a request for the code for one of my blog articles on an interesting SQL problem, I decided recently to create a new repo for the SQL behind a group of articles on solving difficult combinatorial optimisation problems via 'advanced' SQL techniques such as recursive subquery factoring and model clause, sql_demos - Brendan's repo for interesting SQL. It includes installation scripts with object creation and data setup, and scripts to run the SQL on the included datasets. The idea is that anyone with the pre-requisites should be able to reproduce my results within a few minutes of downloading the repo.

[Left image from Knapsack problem; right image copied from Chapter 11 Dynamic Programming]

In this article I embed each of the earlier articles relevant to the GitHub repo with a brief preamble.

The first two articles are from January 2013 and use recursive subquery factoring to find exact solutions for the single and multiple knapsack problem, and also include PL/SQL solutions for comparison. They avoid the 'brute force' approach by truncating search paths as soon as limit constraints are exceeded. The cumulative paths are stored in string variables passed through the iterations (which would not be possible with the older Connect By hierarchical syntax).

In these articles I illustrate the nature of the problems using Visio diagrams, and include dimensional performance benchmarking results, using a technique that I presented on at last year's Ireland OUG conference: Dimensional Performance Benchmarking of SQL - IOUG Presentation. I also illustrate the queries using my own method for diagramming SQL queries.

A Simple SQL Solution for the Knapsack Problem (SKP-1), January 2013

An SQL Solution for the Multiple Knapsack Problem (SKP-m), January 2013

The next article uses Model clause to find a more general solution to a problem posed on AskTom, as a 'bin fitting' problem. I also solved the problem by other methods including recursive subquery factoring. I illustrate the problem itself, as well as the Model iteration scheme using Visio diagrams, and again include dimensional performance benchmarking. The results show how quadratic performance variation can be turned into much faster linear variation by means of a temporary table in this kind of problem.

SQL for the Balanced Number Partitioning Problem, May 2013

This article arose from a question on OTN, and concerns a type of knapsack or bin-fitting problem that is quite tricky to solve in SQL, where the items fall into categories on which there are separate constraints. I introduced a new idea here, to filter out unpromising paths within recursive subquery factoring by means of analytic functions, in order to allow the technique to be used to generate solutions for larger problems without guaranteed optimality, but in shorter time. Two realistic datasets were used, one from the original poster, and another I got from a scraping website.

SQL for the Fantasy Football Knapsack Problem, June 2013

This article is on a classic 'hard' optimisation problem, and uses recursive subquery factoring with the same filtering technique as the previous article, and shows that it's possible to solve a problem involving 312 American cities quite quickly in pure SQL using the approximation technique. It also uses a simple made-up example dataset to illustrate its working.

SQL for the Travelling Salesman Problem, July 2013

The following two articles concern finding shortest paths between given nodes in a network, and arose from a question on OTN. The first one again uses recursive subquery factoring with a filtering mechanism to exclude paths as early as possible, in a similar way to the approximative solutios methods in the earlier articles. In this case, however, reasoning about the nature of the problem shows that we are not in fact sacrificing optimality. The article has quite a lot of explanatory material on how the SQL works, and uses small dataset examples.

The second article considers how to improve performance further by obtaining a preliminary approximate solution that can be used as a bounding mechanism in a second step to find the exact solutions. This article uses two realistic networks as examples, including one having 428,156 links.

SQL for Shortest Path Problems, April 2015

SQL for Shortest Path Problems 2: A Branch and Bound Approach, May 2015

In the article above I cited results from a general network analysis package I had developed that obtains all the distinct connected subnetworks with their structures in an efficient manner using PL/SQL recursion. It is worth noting that for that kind of problem recursive SQL alone is very inefficient, and I wrote the following article to try to explain why that is so, and why the Connect By syntax is generally much worse than recursive subquery factoring.

Recursive SQL for Network Analysis, and Duality, September 2015

The PL/SQL package mentioned, which I think implements a 'named' algorithm although I didn't know that when I wrote it (I don't recall the name right now, sorry 🙁 ), is available on GitHub: Brendan's network structural analysis Oracle package, with article:

PL/SQL Pipelined Function for Network Analysis, May 2015






A Generic Unix Script for Uploading eBusiness Concurrent Programs

I have posted a couple of articles recently on XML Publisher report development within Oracle eBusiness applications (A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher and Design Patterns for Database Reports with XML Publisher and Email Bursting). These reports are of one of several types of batch program, or concurrent program that can be defined within Oracle eBusiness.

Oracle eBusiness uses a number of metadata tables to store information on the programs, and in release 11.5 Oracle introduced a Unix utility called FNDLOAD to download the metadata to formatted text files to allow them to be uploaded via the same utility into downstream environments. At that time batch reports were generally developed in the Oracle Reports tool and typically there might only be two text files (known as LDT files after their extension), for the program and for associated value sets, and maybe one for request groups (which control access to the reports). The executable report file, the RDF, would just be copied to the target environment server directory. I wrote a set of wrapper scripts for the utility to streamline its use and to deal with a number of issues, including handling of audit fields, with a structure consisting of a separate pair of scripts for download and upload of each type of LDT file. I published these on Scribd in July 2009.

In working more recently with Oracle's successor reporting tool, XML Publisher, I found that the number of objects involved in installation has increased substantially. As well as the LDT files there are also now XML configuration files and RTF (usually) templates, uploaded via a Java utility, XDOLoader. Installation also involves at least one PL/SQL package. For example the email version of my model reports (see the second link above) had 11 configuration files. For this reason, I decided to create a single script that would copy, upload and install all objects required for a concurrent program of any type, and I describe the script in this article.

Here is my original Scribd article, describing the issues mentioned, and with the original individual upload and download scripts.

Loading...

The new script is here XX_Install_XMLCP_ksh, and here is the MD120 from the model article that uses it.

Loading...

Directory Structure

The script operates on an input TAR file containing all the necessary files. The TAR file is called prog.tar after the program short name prog, and contains a directory also called prog with all the installation files. The installer uses relative paths and assumes that the TAR file, with the installer, is placed in a directory below the custom top directory, say $XX_TOP/rel. All loader files are copied to $XX_TOP/import and SQL files to $XX_TOP/install/sql.

After installation, a new directory will remain with all the installation files for reference, $XX_TOP/rel/prog.

Program Structure

The internal call structure of the Unix script is shown below.
Install CP - CSD

The script operates on an input TAR file containing all the necessary files.

After extracting the TAR file, the script has local subroutines that can be divided into four categories, as above:

  1. Preliminaries - parameter processing, file moving and validation
  2. FNDLOAD uploads - uploading the LDT files
  3. XDOLoader uploads - uploading the data and any bursting XML files, and all layout templates present
  4. SQL installations - installing any SQL script present, and the package spec and body files

The following table gives a few notes on the main program and preliminary subroutines.
Preliminaries Summary
Prelims Table
The following table gives a few notes on the upload and SQL subroutines.
Upload and SQL Summary
Install Uploads Table
The upload subroutines all have SQL queries for verification, and here is a sample log file from the script: XX_ERPXMLCP_EM_log

The remainder of the article lists the queries with diagrams and examples of output.

Upload Subroutines

upload_ag

Validation Query

SELECT app_g.application_short_name "App G", fag.group_name "Group",
      fag.description "Description",
      app_t.application_short_name "App T", ftb.table_name "Table",
      fcl.column_name "Column"
  FROM fnd_audit_groups fag
  JOIN fnd_application app_g
    ON app_g.application_id = fag.application_id
  JOIN fnd_audit_tables fat
    ON fat.audit_group_app_id = fag.application_id
   AND fat.audit_group_id = fag.audit_group_id
  JOIN fnd_application app_t
    ON app_t.application_id = fat.table_app_id
  JOIN fnd_tables ftb
    ON ftb.application_id = fat.table_app_id
   AND ftb.table_id = fat.table_id
  JOIN fnd_audit_columns fac
    ON fac.table_app_id = fat.table_app_id
   AND fac.table_id = fat.table_id
  JOIN fnd_columns fcl
    ON fcl.application_id = fac.table_app_id
   AND fcl.table_id = fac.table_id
   AND fcl.column_id = fac.column_id
 WHERE fag.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
   AND fac.schema_id            = 900
ORDER BY app_g.application_short_name, fag.group_name, 
         app_t.application_short_name, ftb.table_name,
         fcl.column_name;

QSD

Install CP - AG - 1

Example Output
No example available here, but the headings are:
"App G", "Group", "Description", "App T", "Table", "Column"

upload_ms

Validation Query

SELECT mes.message_name "Name", mes.message_text "Text"
  FROM fnd_new_messages mes
 WHERE mes.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
 ORDER BY 1;

QSD

Install CP - Message

Example Output

Name                     Text
--------------------- ---------------------------------------------
XX_ERPXMLCP_EM_NONXML Non-XML Concurrent Program &PROGAPP

upload_vs

Validation Query

SELECT fvs.flex_value_set_name "Value Set", Count(fvl.flex_value_set_id) "Values"
  FROM fnd_flex_value_sets fvs, fnd_flex_values fvl
 WHERE fvs.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
   AND fvl.flex_value_set_id(+) = fvs.flex_value_set_id
 GROUP BY fvs.flex_value_set_name;

QSD

Install CP - VS

Example Output

Value Set             Values
--------------------- ------
XX_PROGS                   0
XX_APPNAME_ID              0

upload_cp

Validation Query

SELECT prg.user_concurrent_program_name || ': ' || prg.concurrent_program_name "Program", fcu.column_seq_num || ': ' || fcu.end_user_column_name "Parameter"
  FROM fnd_concurrent_programs_vl               prg
  LEFT JOIN fnd_descr_flex_column_usages      fcu
    ON fcu.descriptive_flexfield_name         = '\$SRS\$.' || prg.concurrent_program_name
   AND fcu.descriptive_flex_context_code      = 'Global Data Elements'
 WHERE prg.concurrent_program_name              = '$cp_name'
 ORDER BY 1, 2;

QSD

Install CP - CP

Example Output

Program                                    Parameter
------------------------------------------ ------------------------
XX Example XML CP (Email): XX_ERPXMLCP_EM  100: Cc Email
                                           10: Application
                                           20: From Program
                                           30: To Program
                                           40: From Date
                                           50: To Date
                                           60: From Parameter Count
                                           70: To Parameter Count
                                           80: Override Email
                                           90: From Email

upload_rga

Validation Query

SELECT rgp.request_group_name "Request Group",
       app.application_short_name "App"
  FROM fnd_concurrent_programs          cpr
  JOIN fnd_request_group_units          rgu
    ON rgu.unit_application_id          = cpr.application_id
   AND rgu.request_unit_id              = cpr.concurrent_program_id
  JOIN fnd_request_groups               rgp
    ON rgp.application_id               = rgu.application_id
   AND rgp.request_group_id             = rgu.request_group_id
  JOIN fnd_application                  app
    ON app.application_id               = rgp.application_id
 WHERE cpr.concurrent_program_name      = '$cp_name'
 ORDER BY 1;

QSD

Install CP - RGA

Example Output

Request Group                  App
------------------------------ ----------
System Administrator Reports   FND

upload_dd

Validation Query

SELECT xdd.data_source_code "Code", xtm.default_language "Lang", xtm.default_territory "Terr"
  FROM xdo_ds_definitions_b xdd
  LEFT JOIN xdo_templates_b xtm
    ON xtm.application_short_name       = xdd.application_short_name
   AND xtm.data_source_code             = xdd.data_source_code
 WHERE xdd.data_source_code             = '$cp_name'
 ORDER BY 1, 2, 3;

QSD

Install CP - DD

Example Output

Code                 Lang Terr
-------------------- ---- ----
XX_ERPXMLCP_EM       en   US

upload_all_temps

Validation Query

SELECT xdd.data_source_code "Code", 
        xlb_d.file_name "Data Template",
        xlb_b.file_name "Bursting File",
        xtm.template_code "Template",
        xlb.language "Lang", 
        xlb.territory "Terr",
        xlb.file_name || 
               CASE
               WHEN xlb.language = xtm.default_language AND
                    xlb.territory = xtm.default_territory
               THEN '*' END "File"
  FROM xdo_ds_definitions_b             xdd
  LEFT JOIN xdo_lobs                    xlb_d
    ON xlb_d.application_short_name     = xdd.application_short_name
   AND xlb_d.lob_code                   = xdd.data_source_code
   AND xlb_d.lob_type                   = 'DATA_TEMPLATE'
  LEFT JOIN xdo_lobs                    xlb_b
    ON xlb_b.application_short_name     = xdd.application_short_name
   AND xlb_b.lob_code                   = xdd.data_source_code
   AND xlb_b.lob_type                   = 'BURSTING_FILE'
  LEFT JOIN xdo_templates_b             xtm
    ON xtm.application_short_name       = xdd.application_short_name
   AND xtm.data_source_code             = xdd.data_source_code
  LEFT JOIN xdo_lobs                    xlb
    ON xlb.application_short_name       = xtm.application_short_name
   AND xlb.lob_code                     = xtm.template_code
   AND xlb.lob_type                     LIKE 'TEMPLATE%'
 WHERE xdd.data_source_code             = '$cp_name'
   AND xdd.application_short_name       = '$app'
 ORDER BY 1, 2, 3, 4;

QSD

Install CP - Template

Example Output

Code            Data Template        Bursting File          Template             Lang Terr File
--------------- -------------------- ---------------------- -------------------- ---- ---- -------------------------
XX_ERPXMLCP_EM  XX_ERPXMLCP_EM.xml   XX_ERPXMLCP_EM_BUR.xml XX_ERPXMLCP_EM       en   US   XX_ERPXMLCP_EM.xsl*
                                                                                           XX_ERPXMLCP_EM.rtf*
                                                            XX_ERPXMLCP_EM_XML   en   US   XX_ERPXMLCP_EM_XML.xsl*
                                                                                           XX_ERPXMLCP_EM_XML.rtf*






Query Query Query

In my last post, A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher, I described a database report module developed in Oracle's XML Publisher tool. Of the report structure I wrote:

It has a master entity with two independent detail entities, and therefore requires a minimum of two queries.

But why does such a structure require two queries? And can we determine the minimum number of queries for reports in general? To start with, let's define a report in this context as being a hierarchy of record groups, where:

  • a record group is a set of records having the same columns with (possibly) differing values
  • each group is linked to a single record in its (single) parent group by values in the parent record, except the top level (or root) group

For example, in the earlier post the root group is a set of bank accounts, with the two detail (or child) groups being the set of owners of the bank account and the set of audit records for the bank account parent record. Corresponding to this group structure, each bank account record is the root of the data hierarchies, comprising two sets of records below the bank account record, one for the owners and one for the audit records linked to the root record by the bank account id.

A (relational) query always returns a flat record set, and it's this fact that determines the minimum number of queries required for a given group structure. A master-detail group structure can be flattened in the query by simply copying master fields on to the child record sets. The set cardinality is then the cardinality of the child set. The report designer uses their chosen reporting tool to specify display of the queried data in either flat, or in master-detail format.

In fact this approach works for any number of child levels, with the query cardinality being the number of bottom level descendants (using null records for potential parents that are in fact childless). It's clear though that the approach will not work for any second child at the same level because there would be two cardinalities and no meaningful single record for both child groups could be constructed within a flat query.

This reasoning leads to the conclusion that the minimum number of queries required in general is equal to the number of groups minus the number of parent groups.

Query Query Query - example

In the earlier post I also stated:

This minimum number of queries is usually the best choice...

There are two main reasons for this:

  • each child query fires for every record returned by its parent, with associated performance impact
  • maintenance tends to be more difficult with extra queries; this is much worse when the individual groups, which should almost always be implemented by a maximum of one query each, are split, and then need to be joined back together procedurally

On thinking about this, it occurred to me that if the group structure were defined in a metadata table we might be able to return minimum query structures using an SQL query. Just one, obviously 🙂 . To save effort we could use Oracle's handy HR demo schema with the employee hierarchy representing groups.

The remainder of this article describes the query I came up with. As it's about hierarchies, recursion is the technique to use, and this is one of those cases where Oracle's old tree-walk syntax is too limited, so I am using the Oracle 11.2 recursive subquery factoring feature.

The query isn't going to be of practical value for report group structures since these are always quite small in size, but I expect there are different applications where this kind of Primogeniture Recursion would be useful.

Query Groups Query - Primogeniture Recursion

Query Structure Diagram
Query Query Query

SQL

WITH rsf (last_name, employee_id, lev, part_id, manager_id) AS (
SELECT last_name, employee_id, 0, employee_id, To_Number(NULL)
  FROM employees
 WHERE manager_id IS NULL
UNION ALL
SELECT e.last_name, e.employee_id, r.lev + 1, 
       CASE WHEN Row_Number() OVER (PARTITION BY r.employee_id ORDER BY e.last_name) = 1 THEN r.part_id ELSE e.employee_id END,
       e.manager_id
  FROM rsf r
  JOIN employees e
    ON e.manager_id = r.employee_id
)
SELECT part_id, LPad ('.', lev) || last_name last_name, employee_id, 
       Count(DISTINCT part_id) OVER () "#Partitions",
       Count(DISTINCT manager_id) OVER () "+ #Parents",
       Count(*) OVER () "= #Records"
  FROM rsf
 ORDER BY part_id, lev, last_name

Query Output

   PART_ID LAST_NAME            EMPLOYEE_ID #Partitions + #Parents = #Records
---------- -------------------- ----------- ----------- ---------- ----------
       100 King                         100          89         18        107
           .Cambrault                   148          89         18        107
            .Bates                      172          89         18        107
       101 .Kochhar                     101          89         18        107
            .Baer                       204          89         18        107
       102 .De Haan                     102          89         18        107
            .Hunold                     103          89         18        107
             .Austin                    105          89         18        107
       104   .Ernst                     104          89         18        107
       106   .Pataballa                 106          89         18        107
       107   .Lorentz                   107          89         18        107
       108  .Greenberg                  108          89         18        107
             .Chen                      110          89         18        107
       109   .Faviet                    109          89         18        107
       111   .Sciarra                   111          89         18        107
       112   .Urman                     112          89         18        107
       113   .Popp                      113          89         18        107
       114 .Raphaely                    114          89         18        107
            .Baida                      116          89         18        107
       115  .Khoo                       115          89         18        107
       117  .Tobias                     117          89         18        107
       118  .Himuro                     118          89         18        107
       119  .Colmenares                 119          89         18        107
       120 .Weiss                       120          89         18        107
            .Fleaur                     181          89         18        107
       121 .Fripp                       121          89         18        107
            .Atkinson                   130          89         18        107
       122 .Kaufling                    122          89         18        107
            .Chung                      188          89         18        107
       123 .Vollman                     123          89         18        107
            .Bell                       192          89         18        107
       124 .Mourgos                     124          89         18        107
            .Davies                     142          89         18        107
       125  .Nayer                      125          89         18        107
       126  .Mikkilineni                126          89         18        107
       127  .Landry                     127          89         18        107
       128  .Markle                     128          89         18        107
       129  .Bissot                     129          89         18        107
       131  .Marlow                     131          89         18        107
       132  .Olson                      132          89         18        107
       133  .Mallin                     133          89         18        107
       134  .Rogers                     134          89         18        107
       135  .Gee                        135          89         18        107
       136  .Philtanker                 136          89         18        107
       137  .Ladwig                     137          89         18        107
       138  .Stiles                     138          89         18        107
       139  .Seo                        139          89         18        107
       140  .Patel                      140          89         18        107
       141  .Rajs                       141          89         18        107
       143  .Matos                      143          89         18        107
       144  .Vargas                     144          89         18        107
       145 .Russell                     145          89         18        107
            .Bernstein                  151          89         18        107
       146 .Partners                    146          89         18        107
            .Doran                      160          89         18        107
       147 .Errazuriz                   147          89         18        107
            .Ande                       166          89         18        107
       149 .Zlotkey                     149          89         18        107
            .Abel                       174          89         18        107
       150  .Tucker                     150          89         18        107
       152  .Hall                       152          89         18        107
       153  .Olsen                      153          89         18        107
       154  .Cambrault                  154          89         18        107
       155  .Tuvault                    155          89         18        107
       156  .King                       156          89         18        107
       157  .Sully                      157          89         18        107
       158  .McEwen                     158          89         18        107
       159  .Smith                      159          89         18        107
       161  .Sewall                     161          89         18        107
       162  .Vishney                    162          89         18        107
       163  .Greene                     163          89         18        107
       164  .Marvins                    164          89         18        107
       165  .Lee                        165          89         18        107
       167  .Banda                      167          89         18        107
       168  .Ozer                       168          89         18        107
       169  .Bloom                      169          89         18        107
       170  .Fox                        170          89         18        107
       171  .Smith                      171          89         18        107
       173  .Kumar                      173          89         18        107
       175  .Hutton                     175          89         18        107
       176  .Taylor                     176          89         18        107
       177  .Livingston                 177          89         18        107
       178  .Grant                      178          89         18        107
       179  .Johnson                    179          89         18        107
       180  .Taylor                     180          89         18        107
       182  .Sullivan                   182          89         18        107
       183  .Geoni                      183          89         18        107
       184  .Sarchand                   184          89         18        107
       185  .Bull                       185          89         18        107
       186  .Dellinger                  186          89         18        107
       187  .Cabrio                     187          89         18        107
       189  .Dilly                      189          89         18        107
       190  .Gates                      190          89         18        107
       191  .Perkins                    191          89         18        107
       193  .Everett                    193          89         18        107
       194  .McCain                     194          89         18        107
       195  .Jones                      195          89         18        107
       196  .Walsh                      196          89         18        107
       197  .Feeney                     197          89         18        107
       198  .OConnell                   198          89         18        107
       199  .Grant                      199          89         18        107
       200  .Whalen                     200          89         18        107
       201 .Hartstein                   201          89         18        107
            .Fay                        202          89         18        107
       203  .Mavris                     203          89         18        107
       205  .Higgins                    205          89         18        107
             .Gietz                     206          89         18        107

107 rows selected.

 






A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher

Oracle eBusiness applications allow audit history records to be automatically maintained on database tables, as explained in the release 12 System Administrator's guide, Reporting On AuditTrail Data.

Oracle E-Business Suite provides an auditing mechanism based on Oracle database triggers. AuditTrail stores change information in a "shadow table" of the audited table. This mechanism saves audit data in an uncompressed but "sparse" format, and you enable auditing for particular tables and groups of tables ("audit groups").

Oracle provides an Audit Query Navigator page where it is possible to search for changes by primary key values. For reporting purposes, the manual says:

You should write audit reports as needed. Audit Trail provides the views of your shadow tables to make audit reporting easier; you can write your reports to use these views.

In fact the views are of little practical use, and it is quite hard to develop reports that are user-friendly, efficient and not over-complex, owing, amongst other things, to the "sparse" data format. However, once you have developed one you can use that as a design pattern and starting point for audit reporting on any of the eBusiness tables.

In this article I provide such a report for auditing external bank account changes on Oracle eBusiness 12.1. The report displays the current record, with lookup information, followed by a list of the changes within an input date range. Only records that have changes within that range are included, and for each change only the fields that were changed are listed. The lookup information includes a list of detail records, making the report overall pretty general in structure: It has a master entity with two independent detail entities, and therefore requires a minimum of two queries. This minimum number of queries is usually the best choice and is what I have implemented (it's fine to have an extra query for global data, but I don't have any here). The main query makes extensive use of analytic functions, case expressions and subquery factors to achieve the necessary data transformations as simply and efficiently as possible.

The report is implemented in XML (or BI) Publisher, which is the main batch reporting tool for Oracle eBusiness.

I start by showing sample output from the report, followed by the RTF template. The queries are then documented, starting with query structure diagrams with annotations explaining the logic. A link is included to a zip file with all the code and templates needed to install the report. Oracle provides extensive documentation on the setup of Auditing and developing in XML Publisher, so I will not cover this.

Report Layout
Example Output in Excel Format

  • There are three regions
    • Bank Account Current Record - the master record
    • Owners - first detail block, listing the owners of the bank account
    • Bank Account Changes - the second detail block, listing the audit history. Note that unchanged fields are omitted
  • Note that some audit fields are displayed directly, such as account number, while for others, such as branch number, the display value is on a referenced table

XML Publisher RTF Tempate

  • Note that each audit field has its own row in the table, but the if-block excludes it if both old and new values are null

Audit Query
Query Structure Diagram
Audit Query QSD
Subquery Tabulation

SQL

WITH audit_range AS (
SELECT DISTINCT ext_bank_account_id
  FROM iby_ext_bank_accounts_a aup
 WHERE 1=1
&lp_beg_dat
&lp_end_dat
), audit_union AS (
SELECT ext_bank_account_id acc_id,
       CASE WHEN Substr (audit_true_nulls, 2, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_name END acc_name,
       CASE WHEN Substr (audit_true_nulls, 3, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_num END acc_num,
       CASE WHEN Substr (audit_true_nulls, 8, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE iban END iban,
       CASE WHEN Substr (audit_true_nulls, 7, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE currency_code END curr,
       CASE WHEN Substr (audit_true_nulls, 6, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE country_code END coun,
       CASE WHEN Substr (audit_true_nulls, 4, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE bank_id END bank_id,
       CASE WHEN Substr (audit_true_nulls, 5, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE branch_id END branch_id,
       audit_sequence_id seq_id,
       audit_timestamp,
       audit_user_name a_user,
       CASE WHEN audit_transaction_type = 'I' THEN 'INSERT' ELSE 'UPDATE' END a_type
  FROM iby_ext_bank_accounts_a aup
 WHERE aup.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
&lp_beg_dat
 UNION
SELECT bac.ext_bank_account_id,
       bac.bank_account_name,
       bac.bank_account_num,
       bac.iban,
       bac.currency_code,
       bac.country_code,
       bac.bank_id,
       bac.branch_id,
       NULL,
       bac.last_update_date,
       usr.user_name,
       NULL
  FROM iby_ext_bank_accounts            bac
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.last_updated_by
 WHERE bac.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
), audit_pairs AS (
SELECT acc_id,
       acc_name,
       bank_id,
       First_Value (bank_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) bank_id_n,
       branch_id,
       First_Value (branch_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) branch_id_n,
       First_Value (acc_name IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_name_n,
       acc_num,
       First_Value (acc_num IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_num_n,
       iban,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) iban_n,
       curr,
       First_Value (curr IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) curr_n,
       coun,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) coun_n,
       seq_id,
       audit_timestamp,
       a_user,
       a_type
  FROM audit_union
)
SELECT aup.acc_id,
       par_bnk.party_name bank_name,
       par_brn.party_name bra_name,
       orp.bank_or_branch_number bra_num,
       bac.bank_account_name acc_name,
       bac.bank_account_num acc_num,
       bac.iban,
       bac.country_code coun,
       bac.currency_code curr,
       To_Char (bac.creation_date, 'DD-MON-YYYY HH24:MI:SS') creation_date,
       usr.user_name created_by,
       To_Char (aup.audit_timestamp, 'DD-MON-YYYY HH24:MI:SS') a_time,
       aup.a_user,
       aup.a_type,
/*
attr: 1. NULL -> no change; 2. 0/'*NULL*' -> change from null; 3. 'other'-> change from not null
        old: 1 and 2 both return NULL; 3 returns old not null value
        new: only return value for 2 and 3, meaning some change
*/
       CASE WHEN aup.bank_id != 0 THEN par_bnk_o.party_name END bank_name_o,
       CASE WHEN aup.bank_id IS NOT NULL THEN CASE WHEN aup.bank_id_n != 0 THEN par_bnk_o.party_name END END bank_name_n,
       CASE WHEN aup.branch_id != 0 THEN par_brn_o.party_name END bra_name_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN par_brn_n.party_name END END bra_name_n,
       CASE WHEN aup.branch_id != 0 THEN orp_o.bank_or_branch_number END bra_num_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN orp_n.bank_or_branch_number END END bra_num_n,
       CASE WHEN aup.acc_name != '*NULL*' THEN aup.acc_name END acc_name_o,
       CASE WHEN aup.acc_name IS NOT NULL THEN CASE WHEN aup.acc_name_n != '*NULL*' THEN aup.acc_name_n END END acc_name_n,
       CASE WHEN aup.acc_num != '*NULL*' THEN aup.acc_num END acc_num_o,
       CASE WHEN aup.acc_num IS NOT NULL THEN CASE WHEN aup.acc_num_n != '*NULL*' THEN aup.acc_num_n END END acc_num_n,
       CASE WHEN aup.iban != '*NULL*' THEN aup.iban END iban_o,
       CASE WHEN aup.iban IS NOT NULL THEN CASE WHEN aup.iban_n != '*NULL*' THEN aup.iban_n END END iban_n,
       CASE WHEN aup.curr != '*NULL*' THEN aup.curr END curr_o,
       CASE WHEN aup.curr IS NOT NULL THEN CASE WHEN aup.curr_n != '*NULL*' THEN aup.curr_n END END curr_n,
       CASE WHEN aup.coun != '*NULL*' THEN aup.coun END coun_o,
       CASE WHEN aup.coun IS NOT NULL THEN CASE WHEN aup.coun_n != '*NULL*' THEN aup.coun_n END END coun_n
  FROM audit_pairs                      aup
  JOIN iby_ext_bank_accounts            bac
    ON bac.ext_bank_account_id          = aup.acc_id
  LEFT JOIN hz_parties                  par_bnk
    ON par_bnk.party_id                 = bac.bank_id
  LEFT JOIN hz_parties                  par_bnk_o
    ON par_bnk_o.party_id               = aup.bank_id
  LEFT JOIN hz_parties                  par_bnk_n
    ON par_bnk_n.party_id               = aup.bank_id_n
  LEFT JOIN hz_parties                  par_brn
    ON par_brn.party_id                 = bac.branch_id
  LEFT JOIN hz_organization_profiles    orp
    ON orp.party_id                     = par_brn.party_id
   AND SYSDATE BETWEEN Trunc (orp.effective_start_date) AND Nvl (Trunc (orp.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_o
    ON par_brn_o.party_id               = aup.branch_id
  LEFT JOIN hz_organization_profiles    orp_o
    ON orp_o.party_id                   = par_brn_o.party_id
   AND SYSDATE BETWEEN Trunc (orp_o.effective_start_date) AND Nvl (Trunc (orp_o.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_n
    ON par_brn_n.party_id               = aup.branch_id_n
  LEFT JOIN hz_organization_profiles    orp_n
    ON orp_n.party_id                   = par_brn_n.party_id
   AND SYSDATE BETWEEN Trunc (orp_n.effective_start_date) AND Nvl (Trunc (orp_n.effective_end_date), SYSDATE+1)
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.created_by
 WHERE aup.seq_id                       IS NOT NULL
&lp_beg_dat
&lp_end_dat
 ORDER BY aup.acc_id, aup.audit_timestamp DESC, aup.seq_id DESC

Owners Query
Query Structure Diagram
Owners Query
Subquery Tabulation

SQL

SELECT par.party_name owner,
       sup.vendor_name,
       sup.segment1
  FROM iby_account_owners               own
  JOIN hz_parties                       par
    ON par.party_id                     = own.account_owner_party_id
  LEFT JOIN ap_suppliers                sup
    ON sup.party_id                     = par.party_id
 WHERE own.ext_bank_account_id          = :ACC_ID

Code for Bank Account Auditing XML Publisher Report

XX_IBYBNKAUDIT

See also A Generic Unix Script for Uploading Oracle eBusiness Concurrent Programs






NoCOUG SQL Challenge 2014 Illustrated

An SQL challenge was posted recently on the blog of the Northern California Oracle user group, SQL Mini-challenge. A query was given against Oracle's demo HR schema, with description:

It lists the locations containing a department that either contains an employee named Steven King or an employee who holds the title of President or an employee who has previously held the title of President.

The challenge was to rewrite the query avoiding the relatively expensive existence subqueries, and minimising the number of consistent gets reported on the small demo data set. Oracle's Cost-Based Optimiser will itself transform queries within its parsing phase, but at a relatively low level; for example, an 'OR' condition might be changed to a union if the CBO thinks that will aid performance. The solutions to the challenge present a nice illustration of how more extensive query transfomations can improve performance and modularity characteristics.

In this article I will list four equivalent queries for the problem, three based on solutions provided on the blog, using Ansi syntax here for consistency. For each query I give the output from DBMS_XPlan, and include a query structure diagram following my own diagramming notation. The example query provided in the challenge is not in fact the most literal translation of the requirement into SQL, and I think it will be interesting to start with my idea of what that would be.

Update 28 October 2014: I noticed that in the 'literal' query I had omitted the location condition on the third subquery. I have fixed this and the execution plan is much worse. The results reported here were from version 11.2; running on v12.1 gives some extremely interesting differences, and I have added the v12.1 results at the end for the 'literal' query. They show a much improved plan, with departments 'factorised' out.

Query 1: Literal
This my attempt at the most literal translation of the stated requirement into SQL. The three conditions are all separately expressed as existence subqueries.

QSD Literal

NCOUG-Literal

Query Literal
Note that in an earlier version of this article, I had omitted the final 'd.location_id = l.location_id' condition.

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
    WHERE d.location_id = l.location_id
      AND e.first_name = 'Steven' AND e.last_name = 'King'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND j.job_title = 'President'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN job_history h
       ON h.employee_id = e.employee_id
     JOIN jobs j2
       ON j2.job_id = h.job_id
    WHERE d.location_id = l.location_id
      AND j2.job_title   = 'President'
  )

XPlan Literal

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |      1 |        |      1 |00:00:00.01 |     426 |       |       |          |
|*  1 |  FILTER                          |                   |      1 |        |      1 |00:00:00.01 |     426 |       |       |          |
|   2 |   VIEW                           | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       7 |       |       |          |
|*  3 |    HASH JOIN                     |                   |      1 |        |     23 |00:00:00.01 |       7 |  1023K|  1023K| 1150K (0)|
|   4 |     INDEX FAST FULL SCAN         | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       4 |       |       |          |
|   6 |   NESTED LOOPS                   |                   |     23 |        |      1 |00:00:00.01 |      92 |       |       |          |
|   7 |    NESTED LOOPS                  |                   |     23 |      1 |     23 |00:00:00.01 |      69 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID  | EMPLOYEES         |     23 |      1 |     23 |00:00:00.01 |      46 |       |       |          |
|*  9 |      INDEX RANGE SCAN            | EMP_NAME_IX       |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 10 |     INDEX UNIQUE SCAN            | DEPT_ID_PK        |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 11 |    TABLE ACCESS BY INDEX ROWID   | DEPARTMENTS       |     23 |      1 |      1 |00:00:00.01 |      23 |       |       |          |
|  12 |   NESTED LOOPS                   |                   |     22 |        |      0 |00:00:00.01 |     173 |       |       |          |
|  13 |    NESTED LOOPS                  |                   |     22 |      1 |     88 |00:00:00.01 |     166 |       |       |          |
|  14 |     NESTED LOOPS                 |                   |     22 |      2 |      6 |00:00:00.01 |     160 |       |       |          |
|* 15 |      TABLE ACCESS FULL           | JOBS              |     22 |      1 |     22 |00:00:00.01 |     132 |       |       |          |
|  16 |      TABLE ACCESS BY INDEX ROWID | DEPARTMENTS       |     22 |      2 |      6 |00:00:00.01 |      28 |       |       |          |
|* 17 |       INDEX RANGE SCAN           | DEPT_LOCATION_IX  |     22 |      2 |      6 |00:00:00.01 |      22 |       |       |          |
|* 18 |     INDEX RANGE SCAN             | EMP_DEPARTMENT_IX |      6 |     10 |     88 |00:00:00.01 |       6 |       |       |          |
|* 19 |    TABLE ACCESS BY INDEX ROWID   | EMPLOYEES         |     88 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  20 |   NESTED LOOPS                   |                   |     22 |        |      0 |00:00:00.01 |     154 |       |       |          |
|  21 |    NESTED LOOPS                  |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|  22 |     NESTED LOOPS                 |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|  23 |      NESTED LOOPS                |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|* 24 |       TABLE ACCESS FULL          | JOBS              |     22 |      1 |     22 |00:00:00.01 |     132 |       |       |          |
|  25 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |     22 |      1 |      0 |00:00:00.01 |      22 |       |       |          |
|* 26 |        INDEX RANGE SCAN          | JHIST_JOB_IX      |     22 |      1 |      0 |00:00:00.01 |      22 |       |       |          |
|  27 |      TABLE ACCESS BY INDEX ROWID | EMPLOYEES         |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 28 |       INDEX UNIQUE SCAN          | EMP_EMP_ID_PK     |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 29 |     INDEX UNIQUE SCAN            | DEPT_ID_PK        |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 30 |    TABLE ACCESS BY INDEX ROWID   | DEPARTMENTS       |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(( IS NOT NULL OR  IS NOT NULL OR  IS NOT NULL))
   3 - access(ROWID=ROWID)
   9 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  10 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  11 - filter("D"."LOCATION_ID"=:B1)
  15 - filter("J"."JOB_TITLE"='President')
  17 - access("D"."LOCATION_ID"=:B1)
  18 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  19 - filter("J"."JOB_ID"="E"."JOB_ID")
  24 - filter("J2"."JOB_TITLE"='President')
  26 - access("J2"."JOB_ID"="H"."JOB_ID")
  28 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
  29 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  30 - filter("D"."LOCATION_ID"=:B1)

Query 2: NoCOUG Example
This is the example in the original challenge article, translated into Ansi syntax. It nests the job history existence subquery within an outer existence subquery, and references the departments and employees tables only once.

QSD NoCOUG Example

NCOUG-Example

Query NoCOUG Example

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND ( 
            (e.first_name = 'Steven' AND e.last_name = 'King')
         OR j.job_title = 'President'
         OR EXISTS
            (SELECT *
               FROM job_history h
               JOIN jobs j2
                 ON j2.job_id = h.job_id
              WHERE h.employee_id = e.employee_id
                AND j2.job_title   = 'President'
            ) 
          )
  )

XPlan NoCOUG Example

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |      1 |        |      1 |00:00:00.01 |     152 |       |       |          |
|*  1 |  HASH JOIN SEMI                  |                   |      1 |      7 |      1 |00:00:00.01 |     152 |  1156K|  1156K| 1120K (0)|
|   2 |   VIEW                           | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|*  3 |    HASH JOIN                     |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1443K (0)|
|   4 |     INDEX FAST FULL SCAN         | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   6 |   VIEW                           | VW_SQ_1           |      1 |     11 |      1 |00:00:00.01 |     146 |       |       |          |
|*  7 |    FILTER                        |                   |      1 |        |      1 |00:00:00.01 |     146 |       |       |          |
|*  8 |     HASH JOIN                    |                   |      1 |    106 |    106 |00:00:00.01 |      15 |   876K|   876K|  895K (0)|
|   9 |      MERGE JOIN                  |                   |      1 |    107 |    107 |00:00:00.01 |       8 |       |       |          |
|  10 |       TABLE ACCESS BY INDEX ROWID| JOBS              |      1 |     19 |     19 |00:00:00.01 |       2 |       |       |          |
|  11 |        INDEX FULL SCAN           | JOB_ID_PK         |      1 |     19 |     19 |00:00:00.01 |       1 |       |       |          |
|* 12 |       SORT JOIN                  |                   |     19 |    107 |    107 |00:00:00.01 |       6 | 15360 | 15360 |14336  (0)|
|  13 |        TABLE ACCESS FULL         | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|  14 |      TABLE ACCESS FULL           | DEPARTMENTS       |      1 |     27 |     27 |00:00:00.01 |       7 |       |       |          |
|  15 |     NESTED LOOPS                 |                   |    105 |        |      0 |00:00:00.01 |     131 |       |       |          |
|  16 |      NESTED LOOPS                |                   |    105 |      1 |     10 |00:00:00.01 |     121 |       |       |          |
|  17 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |    105 |      1 |     10 |00:00:00.01 |     112 |       |       |          |
|* 18 |        INDEX RANGE SCAN          | JHIST_EMPLOYEE_IX |    105 |      1 |     10 |00:00:00.01 |     105 |       |       |          |
|* 19 |       INDEX UNIQUE SCAN          | JOB_ID_PK         |     10 |      1 |     10 |00:00:00.01 |       9 |       |       |          |
|* 20 |      TABLE ACCESS BY INDEX ROWID | JOBS              |     10 |      1 |      0 |00:00:00.01 |      10 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_1"="L"."LOCATION_ID")
   3 - access(ROWID=ROWID)
   7 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_TITLE"='President' OR  IS NOT NULL))
   8 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  12 - access("J"."JOB_ID"="E"."JOB_ID")
       filter("J"."JOB_ID"="E"."JOB_ID")
  18 - access("H"."EMPLOYEE_ID"=:B1)
  19 - access("J2"."JOB_ID"="H"."JOB_ID")
  20 - filter("J2"."JOB_TITLE"='President')

Query 3: Subquery Factor Union
This converts the 'OR' conditions into a union of three driving subqueries that return the matching department ids from a subquery factor (which could equally be an inline view), and then joins departments and locations.

QSD Subquery Factor Union

NCOUG-SQF

Query Subquery Factor Union

WITH driving_union AS (
SELECT e.department_id
  FROM employees e
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN employees e
    ON e.job_id        = j.job_id
 WHERE j.job_title     = 'President'
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN job_history h
    ON j.job_id        = h.job_id
  JOIN employees e
    ON e.employee_id    = h.employee_id
 WHERE j.job_title     = 'President'
)
SELECT DISTINCT l.location_id, l.city
  FROM driving_union u
  JOIN departments d
    ON d.department_id 	= u.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id

XPlan Subquery Factor Union

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |      1 |        |      1 |00:00:00.01 |      29 |       |       |          |
|   1 |  HASH UNIQUE                         |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1156K|  1156K|  464K (0)|
|*  2 |   HASH JOIN                          |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1517K|  1517K|  366K (0)|
|*  3 |    HASH JOIN                         |                  |      1 |      8 |      1 |00:00:00.01 |      23 |  1517K|  1517K|  382K (0)|
|   4 |     VIEW                             |                  |      1 |      8 |      1 |00:00:00.01 |      17 |       |       |          |
|   5 |      SORT UNIQUE                     |                  |      1 |      8 |      1 |00:00:00.01 |      17 |  2048 |  2048 | 2048  (0)|
|   6 |       UNION-ALL                      |                  |      1 |        |      2 |00:00:00.01 |      17 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID   | EMPLOYEES        |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX RANGE SCAN             | EMP_NAME_IX      |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|   9 |        NESTED LOOPS                  |                  |      1 |        |      1 |00:00:00.01 |       8 |       |       |          |
|  10 |         NESTED LOOPS                 |                  |      1 |      6 |      1 |00:00:00.01 |       7 |       |       |          |
|* 11 |          TABLE ACCESS FULL           | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|* 12 |          INDEX RANGE SCAN            | EMP_JOB_IX       |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  13 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  14 |        NESTED LOOPS                  |                  |      1 |        |      0 |00:00:00.01 |       7 |       |       |          |
|  15 |         NESTED LOOPS                 |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  16 |          NESTED LOOPS                |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|* 17 |           TABLE ACCESS FULL          | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|  18 |           TABLE ACCESS BY INDEX ROWID| JOB_HISTORY      |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 19 |            INDEX RANGE SCAN          | JHIST_JOB_IX     |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 20 |          INDEX UNIQUE SCAN           | EMP_EMP_ID_PK    |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  21 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  22 |     VIEW                             | index$_join$_011 |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 23 |      HASH JOIN                       |                  |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  24 |       INDEX FAST FULL SCAN           | DEPT_ID_PK       |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  25 |       INDEX FAST FULL SCAN           | DEPT_LOCATION_IX |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  26 |    VIEW                              | index$_join$_013 |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 27 |     HASH JOIN                        |                  |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1426K (0)|
|  28 |      INDEX FAST FULL SCAN            | LOC_CITY_IX      |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  29 |      INDEX FAST FULL SCAN            | LOC_ID_PK        |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
   3 - access("D"."DEPARTMENT_ID"="U"."DEPARTMENT_ID")
   8 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  11 - filter("J"."JOB_TITLE"='President')
  12 - access("E"."JOB_ID"="J"."JOB_ID")
  17 - filter("J"."JOB_TITLE"='President')
  19 - access("J"."JOB_ID"="H"."JOB_ID")
  20 - access("E"."EMPLOYEE_ID"="H"."EMPLOYEE_ID")
  23 - access(ROWID=ROWID)
  27 - access(ROWID=ROWID)

Query 4: Outer Joins
This avoids existence subqueries using the idea that an outer join with a constraint that the joined record is not null, with a distinct qualifier to eliminate duplicates, can serve as a logical equivalent.

QSD Outer Joins

NCOUG-OJ

Query Outer Joins

SELECT DISTINCT l.location_id, l.city
  FROM employees e
  LEFT JOIN jobs j
    ON j.job_id        	= e.job_id
   AND j.job_title      = 'President'  
  LEFT JOIN job_history h
    ON h.employee_id    = e.employee_id
  LEFT JOIN jobs j2
    ON j2.job_id        = h.job_id
   AND j2.job_title     = 'President' 
  JOIN departments d
    ON d.department_id 	= e.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_id IS NOT NULL
    OR j2.job_id IS NOT NULL

XPlan Outer Joins

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|   1 |  HASH UNIQUE                  |                   |      1 |    106 |      1 |00:00:00.01 |      36 |  1156K|  1156K|  464K (0)|
|*  2 |   FILTER                      |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|*  3 |    HASH JOIN RIGHT OUTER      |                   |      1 |    106 |    109 |00:00:00.01 |      36 |  1269K|  1269K|  369K (0)|
|*  4 |     TABLE ACCESS FULL         | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|*  5 |     HASH JOIN RIGHT OUTER     |                   |      1 |    106 |    109 |00:00:00.01 |      30 |  1134K|  1134K|  751K (0)|
|   6 |      VIEW                     | index$_join$_004  |      1 |     10 |     10 |00:00:00.01 |       6 |       |       |          |
|*  7 |       HASH JOIN               |                   |      1 |        |     10 |00:00:00.01 |       6 |  1096K|  1096K| 1331K (0)|
|   8 |        INDEX FAST FULL SCAN   | JHIST_EMPLOYEE_IX |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|   9 |        INDEX FAST FULL SCAN   | JHIST_JOB_IX      |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|* 10 |      HASH JOIN OUTER          |                   |      1 |    106 |    106 |00:00:00.01 |      24 |   858K|   858K| 1270K (0)|
|* 11 |       HASH JOIN               |                   |      1 |    106 |    106 |00:00:00.01 |      18 |  1063K|  1063K| 1252K (0)|
|* 12 |        HASH JOIN              |                   |      1 |     27 |     27 |00:00:00.01 |      12 |  1156K|  1156K| 1133K (0)|
|  13 |         VIEW                  | index$_join$_010  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 14 |          HASH JOIN            |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1450K (0)|
|  15 |           INDEX FAST FULL SCAN| LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  16 |           INDEX FAST FULL SCAN| LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  17 |         VIEW                  | index$_join$_008  |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 18 |          HASH JOIN            |                   |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  19 |           INDEX FAST FULL SCAN| DEPT_ID_PK        |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  20 |           INDEX FAST FULL SCAN| DEPT_LOCATION_IX  |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  21 |        TABLE ACCESS FULL      | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|* 22 |       TABLE ACCESS FULL       | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_ID" IS NOT NULL OR "J2"."JOB_ID" IS NOT NULL))
   3 - access("J2"."JOB_ID"="H"."JOB_ID")
   4 - filter("J2"."JOB_TITLE"='President')
   5 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
   7 - access(ROWID=ROWID)
  10 - access("J"."JOB_ID"="E"."JOB_ID")
  11 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")
  12 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
  14 - access(ROWID=ROWID)
  18 - access(ROWID=ROWID)
  22 - filter("J"."JOB_TITLE"='President')

Query Summary Table v11.2

Here is a summary of some statistics on the queries, run on an Oracle 11.2 XE instance. Query lines depends on formatting of course.

Query Buffers Table Instances XPlan Steps Query Lines
Literal 426 10 30 30
NoCOUG Example 152 6 20 23
Subquery Factor Union 29 6 29 25
Outer Joins 36 6 22 17

Oracle 12c

While the original version of this article, posted 25 August 2014, was based on Oracle 11.2, I later ran my script on Oracle 12.1, and noted that the 'literal' query now had a much-changed execution plan. In particular, the departments table had been 'factorised' out, appearing only once, and giving a much reduced buffer count of 31. It seems that this comes from an improvement in the transformation phase of query execution. The other queries did not show so much difference in plans, see the summary table below.

Execution Plan for Literal v12.1

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                    |      1 |        |      1 |00:00:00.05 |      31 |      1 |       |       |          |
|*  1 |  HASH JOIN SEMI                              |                    |      1 |      7 |      1 |00:00:00.05 |      31 |      1 |  1645K|  1645K| 1432K (0)|
|   2 |   VIEW                                       | index$_join$_001   |      1 |     23 |     23 |00:00:00.01 |       8 |      0 |       |       |          |
|*  3 |    HASH JOIN                                 |                    |      1 |        |     23 |00:00:00.01 |       8 |      0 |  1368K|  1368K| 1566K (0)|
|   4 |     INDEX FAST FULL SCAN                     | LOC_CITY_IX        |      1 |     23 |     23 |00:00:00.01 |       4 |      0 |       |       |          |
|   5 |     INDEX FAST FULL SCAN                     | LOC_ID_PK          |      1 |     23 |     23 |00:00:00.01 |       4 |      0 |       |       |          |
|   6 |   VIEW                                       | VW_SQ_1            |      1 |      8 |      1 |00:00:00.05 |      23 |      1 |       |       |          |
|   7 |    MERGE JOIN SEMI                           |                    |      1 |      8 |      1 |00:00:00.05 |      23 |      1 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID              | DEPARTMENTS        |      1 |     27 |     10 |00:00:00.01 |       4 |      0 |       |       |          |
|   9 |      INDEX FULL SCAN                         | DEPT_ID_PK         |      1 |     27 |     10 |00:00:00.01 |       2 |      0 |       |       |          |
|* 10 |     SORT UNIQUE                              |                    |     10 |      8 |      1 |00:00:00.05 |      19 |      1 |  2048 |  2048 | 2048  (0)|
|  11 |      VIEW                                    | VW_JF_SET$1236063A |      1 |      8 |      2 |00:00:00.05 |      19 |      1 |       |       |          |
|  12 |       UNION-ALL                              |                    |      1 |        |      2 |00:00:00.05 |      19 |      1 |       |       |          |
|  13 |        NESTED LOOPS                          |                    |      1 |        |      1 |00:00:00.05 |       9 |      1 |       |       |          |
|  14 |         NESTED LOOPS                         |                    |      1 |      6 |      1 |00:00:00.05 |       8 |      1 |       |       |          |
|* 15 |          TABLE ACCESS FULL                   | JOBS               |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |       |       |          |
|* 16 |          INDEX RANGE SCAN                    | EMP_JOB_IX         |      1 |      6 |      1 |00:00:00.05 |       1 |      1 |       |       |          |
|  17 |         TABLE ACCESS BY INDEX ROWID          | EMPLOYEES          |      1 |      6 |      1 |00:00:00.01 |       1 |      0 |       |       |          |
|  18 |        TABLE ACCESS BY INDEX ROWID BATCHED   | EMPLOYEES          |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|* 19 |         INDEX RANGE SCAN                     | EMP_NAME_IX        |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |       |       |          |
|  20 |        NESTED LOOPS                          |                    |      1 |        |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|  21 |         NESTED LOOPS                         |                    |      1 |      1 |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|  22 |          NESTED LOOPS                        |                    |      1 |      1 |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|* 23 |           TABLE ACCESS FULL                  | JOBS               |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |       |       |          |
|  24 |           TABLE ACCESS BY INDEX ROWID BATCHED| JOB_HISTORY        |      1 |      1 |      0 |00:00:00.01 |       1 |      0 |       |       |          |
|* 25 |            INDEX RANGE SCAN                  | JHIST_JOB_IX       |      1 |      1 |      0 |00:00:00.01 |       1 |      0 |       |       |          |
|* 26 |          INDEX UNIQUE SCAN                   | EMP_EMP_ID_PK      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  27 |         TABLE ACCESS BY INDEX ROWID          | EMPLOYEES          |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("VW_COL_1"="L"."LOCATION_ID")
   3 - access(ROWID=ROWID)
  10 - access("ITEM_1"="D"."DEPARTMENT_ID")
       filter("ITEM_1"="D"."DEPARTMENT_ID")
  15 - filter("J"."JOB_TITLE"='President')
  16 - access("J"."JOB_ID"="E"."JOB_ID")
  19 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  23 - filter("J2"."JOB_TITLE"='President')
  25 - access("J2"."JOB_ID"="H"."JOB_ID")
  26 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")

Note
-----
   - this is an adaptive plan

Query Summary Table v12.1

Query Buffers Table Instances XPlan Steps Query Lines
Literal 31 10 27 31
NoCOUG Example 156 6 19 23
Subquery Factor Union 29 6 28 25
Outer Joins 45 6 22 17

Here is the v12.1 output:

NCOUG-3-121






SQL for Continuum and Contiguity Grouping

A question was asked some time ago on Oracle's SQL&PL/SQL forum (Solution design and performance - SQL or PL/SQL?), the gist of which the poster specified thus:

I have a set of records which keep track of a status at a given date for a range of points on a given road. Basically, I need the current bits to "shine through".

The thread, despite the open-minded title, gives a nice illustration of the odd preference that people often have for complicated PL/SQL solutions to problems amenable to simpler SQL. I provided an SQL solution in that thread for this problem, and in this article I have taken a simpler, more general form of the problem defined there, and provide SQL solutions with diagrams illustrating how they work.

The class of problem can be characterised as follows:

  • Records have range fields, and we want to aggregate separately at each point within the 1-dimensional range domains (or 'continuum'), independently by grouping key
  • Within the groups we want to aggregate only the first or last records, ordering by some non-key fields

The first point above can be seen as equivalent to adding in to the main grouping key an implicit field specifying the domain leg, i.e. the region between the break points defined by the record ranges, and splitting the records by leg. We might term this form of aggregation continuum aggregation, or perhaps vertical aggregation if we view the range as distance, the internal ordering as by time, and visualise it graphically as in my diagrams below. Once this vertical aggregation is established, it is natural to think of adding a second aggregation, that might be termed contiguity aggregation, or with the same visualisation, horizontal aggregation:

  • Contiguous legs having the same values for the aggregated attributes will be grouped together

Here is an extract on grouping from Oracle® Database SQL Language Reference:

Oracle applies the aggregate functions to each group of rows and returns a single result row for each group.
...
The aggregate functions MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV, when followed by the KEEP keyword, can be used in conjunction with the FIRST or LAST function to operate on a set of values from a set of rows that rank as the FIRST or LAST with respect to a given sorting specification. Refer to FIRST for more information.

A simple use-case for the KEEP form of grouping is to list contacts with their most recent telephone number when multiple numbers are stored in a separate table. A few years ago, I realised that in most examples that I see developers write their own code instead of using the built-in constructs, and I wrote an article comparing the performance and complexity of the various alternatives for this requirement (as well as alternatives for SQL pivoting), SQL Pivot and Prune Queries - Keeping an Eye on Performance.

Update, 20 November 2013: Added a Postgres version of the SQL solution.

Test Problem

Data Model

We take for our example problem a very simple model consisting of roads and road events where the events occur over a stretch of road at a given time, and event type forms the horizontal grouping attribute.

Road - ERD2

Test Data

Input data - Roads

ROAD_ID ROAD_DESC       START_POINT  END_POINT
------- --------------- ----------- ----------
      1 The Strand                0        200
      2 Piccadilly                0        100

Input data - road_events

ROAD_ID     REV_ID E_TYPE START_POINT  END_POINT E_DATE
------- ---------- ------ ----------- ---------- ---------
      1          1 OPEN             0         10 01-JAN-07
                 2 OPEN            20         50 01-JAN-08
                 3 CLOSED         130        160 01-FEB-08
                 4 CLOSED          55         85 05-JUN-08
                 5 OTHER           45        115 01-JAN-09
                 6 OPEN            60        100 12-FEB-11
                 7 CLOSED         115        145 12-FEB-12
      2          8 CLOSED          10         30 01-JAN-10
                 9 OPEN            40         50 01-JAN-11
                10 OPEN            50         70 01-JAN-12

10 rows selected.

The following diagrams display the data and solutions for our test problem.

Road - Road 1

Road - Road 2

SQL Solution for 'Point' Problem

To start with, here is a standard SQL solution for the 'zero-dimensional' problem, where the most recent record is required for each road:

Zero-Dimensional Solution

ROAD_DESC          R_START      R_END    E_START      E_END E_DATE    E_TYPE
--------------- ---------- ---------- ---------- ---------- --------- ------
Piccadilly               0        100         50         70 01-JAN-12 OPEN
The Strand               0        200        115        145 12-FEB-12 CLOSED

  1  SELECT r.road_desc, r.start_point r_start, r.end_point r_end,
  2         Max (e.start_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_start,
  3         Max (e.end_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_end,
  4         Max (e.event_date) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_date,
  5         Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_type
  6    FROM road_events e
  7    JOIN roads r
  8      ON r.id = e.road_id
  9   GROUP BY r.road_desc, r.start_point, r.end_point
 10*  ORDER BY 1, 2, 3

The SQL is more complicated for the continuum problem, and we provide two versions, that differ in the final stage, of horizontal grouping by contiguous event type. The first uses a differencing method popular in Oracle 11g and earlier versions for this common type of grouping problem; the second uses a new feature from Oracle 12c, row pattern matching. [I have also used another technique for this type of grouping, in an article on contiguity and other range-based grouping problems in June 2011, Forming Range-Based Break Groups With Advanced SQL.]

SQL Solution for 'Continuum' Problem, Contiguity Grouping by Differences

SQL (Contiguity by Differences)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, l.leg_start, l.leg_end
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM latest_events
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE   GROUP_NO
------- --------------- ---------- ---------- ------ ----------
      1 The Strand               0         10 OPEN            0
                                10         20 (none)          1
                                20         45 OPEN            1
                                45         60 OTHER           3
                                60        100 OPEN            4
                               100        115 OTHER           5
                               115        160 CLOSED          9
                               160        200 (none)         11
      2 Piccadilly               0         10 (none)          0
                                10         30 CLOSED          1
                                30         40 (none)          1
                                40         70 OPEN            3
                                70        100 (none)          3

13 rows selected.

Query Structure Diagram (Contiguity by Differences)Road, V1.5 - QSD-DiffExecution Plan (Contiguity by Differences)

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                  |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                 |               |      1 |      2 |     13 |00:00:00.01 |      25 |   900K|   900K| 1343K (0)|
|   3 |    MERGE JOIN                   |               |      1 |     24 |     19 |00:00:00.01 |      25 |       |       |          |
|   4 |     TABLE ACCESS BY INDEX ROWID | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   5 |      INDEX FULL SCAN            | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|*  6 |     SORT JOIN                   |               |      2 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   7 |      VIEW                       |               |      1 |     24 |     19 |00:00:00.01 |      23 |       |       |          |
|   8 |       WINDOW SORT               |               |      1 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   9 |        WINDOW NOSORT            |               |      1 |     24 |     19 |00:00:00.01 |      23 | 73728 | 73728 |          |
|  10 |         SORT GROUP BY           |               |      1 |     24 |     19 |00:00:00.01 |      23 |  4096 |  4096 | 4096  (0)|
|* 11 |          HASH JOIN OUTER        |               |      1 |     24 |     25 |00:00:00.01 |      23 |  1696K|  1696K|  540K (0)|
|* 12 |           VIEW                  |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  13 |            WINDOW SORT          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  14 |             VIEW                |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  15 |              SORT UNIQUE        |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  16 |               UNION-ALL         |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  17 |                INDEX FULL SCAN  | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |                INDEX FULL SCAN  | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  19 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  21 |           TABLE ACCESS FULL     | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  12 - filter("L"."LEG_END" IS NOT NULL)

SQL Solution for 'Continuum' Problem, Contiguity Grouping by 12c Row Pattern Matching

SQL (Contiguity by Pattern Matching)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, r.road_desc, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          JOIN roads r
            ON r.id = l.road_id
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, r.road_desc, l.leg_start, l.leg_end
)
SELECT m.road_id, m.road_desc, m.sec_start, m.sec_end, m.event_type e_type
  FROM latest_events
 MATCH_RECOGNIZE (
   PARTITION BY road_id, road_desc
   ORDER BY leg_start, leg_end
   MEASURES FIRST (leg_start) sec_start,
            LAST (leg_end) sec_end,
            LAST (event_type) event_type
   PATTERN (strt sm*)
   DEFINE sm AS PREV(sm.event_type) = sm.event_type
 ) m
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE
------- --------------- ---------- ---------- ------
      1 The Strand               0         10 OPEN
                                10         20 (none)
                                20         45 OPEN
                                45         60 OTHER
                                60        100 OPEN
                               100        115 OTHER
                               115        160 CLOSED
                               160        200 (none)
      2 Piccadilly               0         10 (none)
                                10         30 CLOSED
                                30         40 (none)
                                40         70 OPEN
                                70        100 (none)

13 rows selected.

Query Structure Diagram (Contiguity by Pattern Matching)Road, V1.5 - QSD-MRExecution Plan (Contiguity by Pattern Matching)

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                                   |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                           |               |      1 |      2 |     13 |00:00:00.01 |      25 |       |       |          |
|   3 |    MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                                         |               |      1 |      2 |     19 |00:00:00.01 |      25 |       |       |          |
|   5 |      SORT GROUP BY                               |               |      1 |      2 |     19 |00:00:00.01 |      25 |  4096 |  4096 | 4096  (0)|
|*  6 |       HASH JOIN OUTER                            |               |      1 |     24 |     25 |00:00:00.01 |      25 |   987K|   987K|  525K (0)|
|   7 |        MERGE JOIN                                |               |      1 |     24 |     19 |00:00:00.01 |      18 |       |       |          |
|   8 |         TABLE ACCESS BY INDEX ROWID              | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |          INDEX FULL SCAN                         | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|* 10 |         SORT JOIN                                |               |      2 |     24 |     19 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|* 11 |          VIEW                                    |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  12 |           WINDOW SORT                            |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  13 |            VIEW                                  |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  14 |             SORT UNIQUE                          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  15 |              UNION-ALL                           |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  16 |               INDEX FULL SCAN                    | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  17 |               INDEX FULL SCAN                    | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  19 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |        TABLE ACCESS FULL                         | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  10 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - filter("L"."LEG_END" IS NOT NULL)

Note
-----
   - this is an adaptive plan

SQL Solution for 'Continuum' Problem, Contiguity Grouping - Postgres

The Oracle v11 (and earlier versions) solution, with contiguity by differences, can be converted to work in Postgres, as shown below.

SQL (Continuum by Row_Number, Contiguity by Differences - Postgres)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), ranked_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               e.id event_id, Coalesce (e.event_type, '(none)') event_type,
               Row_Number() OVER (PARTITION BY l.road_id, l.leg_start ORDER BY e.event_date DESC) rnk
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start            AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM ranked_events
         WHERE rnk = 1
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3;

Continuum/contiguity Solution with Row_Number...
 road_id | road_desc  | sec_start | sec_end | e_type | group_no 
---------+------------+-----------+---------+--------+----------
       1 | The Strand |         0 |      10 | OPEN   |        0
       1 | The Strand |        10 |      20 | (none) |        1
       1 | The Strand |        20 |      45 | OPEN   |        1
       1 | The Strand |        45 |      60 | OTHER  |        3
       1 | The Strand |        60 |     100 | OPEN   |        4
       1 | The Strand |       100 |     115 | OTHER  |        5
       1 | The Strand |       115 |     160 | CLOSED |        9
       1 | The Strand |       160 |     200 | (none) |       11
       2 | Piccadilly |         0 |      10 | (none) |        0
       2 | Piccadilly |        10 |      30 | CLOSED |        1
       2 | Piccadilly |        30 |      40 | (none) |        1
       2 | Piccadilly |        40 |      70 | OPEN   |        3
       2 | Piccadilly |        70 |     100 | (none) |        3
(13 rows)

SELECT Version();

Postgres version...
                           version                           
-------------------------------------------------------------
 PostgreSQL 9.3.1, compiled by Visual C++ build 1600, 64-bit
(1 row)

Notes on Conversion of SQL to Postgres

  • Postgres does not have an exact equivalent of Oracle's KEEP/FIRST grouping functionality, but it can be emulated via the analytic function Row_Number within a subquery
  • Coalesce, which is also available in Oracle, replaces Nvl, which does not exist in Postgres
  • Oracle's execution plans were obtained using DBMS_XPlan after running the queries, while the Postgres version was obtained by running the query prefaced by 'EXPLAIN ANALYZE '
  • There is no Postgres equivalent of Oracle v12's row pattern matching

Query Structure Diagram (Continuum by Row_Number, Contiguity by Differences - Postgres) Road, V1.5 - QSD-PGExecution Plan (Continuum by Row_Number, Contiguity by Differences - Postgres)

Prefacing the query with 'EXPLAIN ANALYZE ' gives:

Explaining Continuum/contiguity Solution with Row_Number...
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=619.45..619.47 rows=8 width=270) (actual time=0.235..0.235 rows=13 loops=1)
   Sort Key: l.road_id, r.road_desc, (min(l.leg_start))
   Sort Method: quicksort  Memory: 26kB
   CTE breaks
     ->  HashAggregate  (cost=79.50..95.30 rows=1580 width=8) (actual time=0.013..0.020 rows=21 loops=1)
           ->  Append  (cost=0.00..71.60 rows=1580 width=8) (actual time=0.002..0.006 rows=24 loops=1)
                 ->  Seq Scan on road_events  (cost=0.00..14.80 rows=480 width=8) (actual time=0.001..0.003 rows=10 loops=1)
                 ->  Seq Scan on road_events road_events_1  (cost=0.00..14.80 rows=480 width=8) (actual time=0.000..0.002 rows=10 loops=1)
                 ->  Seq Scan on roads  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.001 rows=2 loops=1)
                 ->  Seq Scan on roads roads_1  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.000 rows=2 loops=1)
   CTE legs
     ->  WindowAgg  (cost=115.54..147.14 rows=1580 width=8) (actual time=0.030..0.041 rows=21 loops=1)
           ->  Sort  (cost=115.54..119.49 rows=1580 width=8) (actual time=0.028..0.029 rows=21 loops=1)
                 Sort Key: breaks.road_id, breaks.bp
                 Sort Method: quicksort  Memory: 25kB
                 ->  CTE Scan on breaks  (cost=0.00..31.60 rows=1580 width=8) (actual time=0.014..0.023 rows=21 loops=1)
   CTE ranked_events
     ->  WindowAgg  (cost=290.71..326.08 rows=1572 width=138) (actual time=0.089..0.104 rows=25 loops=1)
           ->  Sort  (cost=290.71..294.64 rows=1572 width=138) (actual time=0.088..0.089 rows=25 loops=1)
                 Sort Key: l_1.road_id, l_1.leg_start, e.event_date
                 Sort Method: quicksort  Memory: 26kB
                 ->  Hash Left Join  (cost=20.80..207.25 rows=1572 width=138) (actual time=0.044..0.079 rows=25 loops=1)
                       Hash Cond: (l_1.road_id = e.road_id)
                       Join Filter: ((e.start_point <= l_1.leg_start) AND (e.end_point >= l_1.leg_end))
                       Rows Removed by Join Filter: 89
                       ->  CTE Scan on legs l_1  (cost=0.00..31.60 rows=1572 width=12) (actual time=0.031..0.048 rows=19 loops=1)
                             Filter: (leg_end IS NOT NULL)
                             Rows Removed by Filter: 2
                       ->  Hash  (cost=14.80..14.80 rows=480 width=138) (actual time=0.005..0.005 rows=10 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 1kB
                             ->  Seq Scan on road_events e  (cost=0.00..14.80 rows=480 width=138) (actual time=0.001..0.002 rows=10 loops=1)
   CTE latest_events_group
     ->  WindowAgg  (cost=35.79..36.01 rows=8 width=48) (actual time=0.165..0.181 rows=19 loops=1)
           ->  Sort  (cost=35.79..35.81 rows=8 width=48) (actual time=0.164..0.165 rows=19 loops=1)
                 Sort Key: ranked_events.road_id, ranked_events.event_type, ranked_events.leg_start, ranked_events.leg_end
                 Sort Method: quicksort  Memory: 26kB
                 ->  WindowAgg  (cost=35.49..35.67 rows=8 width=48) (actual time=0.124..0.138 rows=19 loops=1)
                       ->  Sort  (cost=35.49..35.51 rows=8 width=48) (actual time=0.123..0.124 rows=19 loops=1)
                             Sort Key: ranked_events.road_id, ranked_events.leg_start, ranked_events.leg_end
                             Sort Method: quicksort  Memory: 26kB
                             ->  CTE Scan on ranked_events  (cost=0.00..35.37 rows=8 width=48) (actual time=0.090..0.117 rows=19 loops=1)
                                   Filter: (rnk = 1)
                                   Rows Removed by Filter: 6
   ->  HashAggregate  (cost=14.72..14.80 rows=8 width=270) (actual time=0.218..0.221 rows=13 loops=1)
         ->  Hash Join  (cost=0.26..14.60 rows=8 width=270) (actual time=0.203..0.207 rows=19 loops=1)
               Hash Cond: (r.id = l.road_id)
               ->  Seq Scan on roads r  (cost=0.00..13.10 rows=310 width=222) (actual time=0.002..0.002 rows=2 loops=1)
               ->  Hash  (cost=0.16..0.16 rows=8 width=52) (actual time=0.192..0.192 rows=19 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 2kB
                     ->  CTE Scan on latest_events_group l  (cost=0.00..0.16 rows=8 width=52) (actual time=0.167..0.190 rows=19 loops=1)
 Total runtime: 0.432 ms
(51 rows)

Code

Here is the code: Road-SQL






Notes on Profiling Oracle PL/SQL

'Everything should be made as simple as possible, but not simpler'

This phrase is often attributed to Albert Einstein, although the attribution is apparently questionable:
Everything Should Be Made as Simple as Possible, But Not Simpler. In any case it's not a bad approach to follow, even if the quote did come from a non-Oracle guy :).

I recently started looking at the hierarchical profiler tool with a view to using it in an upcoming project. In order to understand the tool properly, I felt it would be a good idea to start by using it to profile a test program that would be as simple as possible while covering as wide a range of scenarios as possible. This article documents the results of that profiling, highlighting the different scenarios covered, discusses the output from the profiler, and includes a query I wrote to display the function call tree.

The article goes on to illustrate profiling through manual code instrumentation, and by the old flat profiler (DBMS_Profiler) on the same test program, concluding that each method has its own strengths and weaknesses.

Setup
The hierarchical profiler setup and use is described in Oracle® Database Advanced Application Developer's Guide 11g Release 2 (11.2), and some code snippets are available here:PL/SQL Hierarchical Profiler in Oracle Database 11g Release 1

Scenarios
The test program consists of a driving script, Test_Rep_p.sql (attached), that calls a package (HProf_Test) and an object type (Table_Count_Type), both defined in the attached script, HProf_Test_Code.sql. The test program covers the following scenarios:

  • Multiple root calls (__plsql_vm, A_CALLS_B)
  • Recursive procedure calls (procedure calling itself: R_CALLS_R)
  • Mutually recursive procedure calls (procedures call each other: A_CALLS_B and B_CALLS_A)
  • Procedure called by multiple procedures (child with multiple parents: PUT_LINE)
  • Procedure 'inlined' within PL/SQL (Rest_a_While)
  • Static SQL within PL/SQL (__static_sql_exec_line8)
  • Dynamic SQL within PL/SQL (__dyn_sql_exec_line12)
  • 'Everything should be made as simple as possible, but not simpler'

    This phrase is often attributed to Albert Einstein, although the attribution is apparently questionable:
    Everything Should Be Made as Simple as Possible, But Not Simpler. In any case it's not a bad approach to follow, even if the quote did come from a non-Oracle guy :).

    I recently started looking at the hierarchical profiler tool with a view to using it in an upcoming project. In order to understand the tool properly, I felt it would be a good idea to start by using it to profile a test program that would be as simple as possible while covering as wide a range of scenarios as possible. This article documents the results of that profiling, highlighting the different scenarios covered, discusses the output from the profiler, and includes a query I wrote to display the function call tree.

    The article goes on to illustrate profiling through manual code instrumentation, and by the old flat profiler (DBMS_Profiler) on the same test program, concluding that each method has its own strengths and weaknesses.

    Setup
    The hierarchical profiler setup and use is described in Oracle® Database Advanced Application Developer's Guide 11g Release 2 (11.2), and some code snippets are available here:PL/SQL Hierarchical Profiler in Oracle Database 11g Release 1

    Scenarios
    The test program consists of a driving script, Test_Rep_p.sql (attached), that calls a package (HProf_Test) and an object type (Table_Count_Type), both defined in the attached script, HProf_Test_Code.sql. The test program covers the following scenarios:

    • Multiple root calls (__plsql_vm, A_CALLS_B)
    • Recursive procedure calls (procedure calling itself: R_CALLS_R)
    • Mutually recursive procedure calls (procedures call each other: A_CALLS_B and B_CALLS_A)
    • Procedure called by multiple procedures (child with multiple parents: PUT_LINE)
    • Procedure 'inlined' within PL/SQL (Rest_a_While)
    • Static SQL within PL/SQL (__static_sql_exec_line8)
    • Dynamic SQL within PL/SQL (__dyn_sql_exec_line12)
    • Database function called from SQL in SQL*Plus (DBFUNC)
    • Database function called from SQL in PL/SQL (DBFUNC)
    • Object constructor call (TABLE_COUNT_TYPE)

    Call Structure Diagram
    HProf - CSD

    Raw Results
    The attached script Test_Rep_h.sql was used to report on the results. The record produced in the run table, DBMSHP_RUNS, was:

         RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
    ---------- ---------------------------- ---------- ---------- ------------------------------------------------------------
            11 04-MAR-13 07.07.36.803000        890719        .89 Profile for small test program with recursion

    The records produced in the functions table, DBMSHP_FUNCTION_INFO, were:

    OWNER MODULE               FUNCTION                         ID  LINE#      SUB_T      FUN_T  CALLS
    ----- -------------------- ------------------------------ ---- ------ ---------- ---------- ------
    NET   HPROF_TEST           A_CALLS_B                         4     40      62340       4450      1
    NET   HPROF_TEST           A_CALLS_B@1                       5     40      43729      13663      1
    NET   HPROF_TEST           B_CALLS_A                         6     38      57890      14161      1
    NET   HPROF_TEST           B_CALLS_A@1                       7     38      30066      30066      1
    NET   HPROF_TEST           DBFUNC                            8     84      32629      32629      2
    NET   HPROF_TEST           R_CALLS_R                         9     70      12823       4159      1
    NET   HPROF_TEST           R_CALLS_R@1                      10     70       8633       8618      1
    NET   HPROF_TEST           STOP_PROFILING                   11     16         21         21      1
    NET   TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                 12      3      55049         82      1
    NET   TABLE_COUNT_TYPE     __static_sql_exec_line6          22      6      54967      54967      1
    SYS   DBMS_HPROF           STOP_PROFILING                   13     59          0          0      1
    SYS   DBMS_OUTPUT          GET_LINE                         14    129          8          8      3
    SYS   DBMS_OUTPUT          GET_LINES                        15    160         68         60      3
    SYS   DBMS_OUTPUT          NEW_LINE                         16    117          7          7      2
    SYS   DBMS_OUTPUT          PUT                              17     77         28         28      2
    SYS   DBMS_OUTPUT          PUT_LINE                         18    109         46         11      2
                               __anonymous_block                 1      0     809839        521      5
                               __dyn_sql_exec_line12            19     12        226        226      1
                               __plsql_vm                        2      0     828379         58      6
                               __plsql_vm@1                      3      0      14158         11      1
                               __sql_fetch_line13               20     13     726713     726713      1
                               __static_sql_exec_line8          21      8      14418        260      1
    
    22 rows selected.

    The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing.

    The records produced in the functions parent-child table, DBMSHP_PARENT_CHILD_INFO, were:

    OWNER_P MODULE_P             FUNCTION_P                     OWNER_C MODULE_C             FUNCTION_C                          SUB_T      FUN_T  CALLS
    ------- -------------------- ------------------------------ ------- -------------------- ------------------------------ ---------- ---------- ------
    NET     HPROF_TEST           STOP_PROFILING                 SYS     DBMS_HPROF           STOP_PROFILING                          0          0      1
    NET     HPROF_TEST           R_CALLS_R@1                    SYS     DBMS_OUTPUT          PUT_LINE                               15          6      1
    NET     HPROF_TEST           R_CALLS_R                      SYS     DBMS_OUTPUT          PUT_LINE                               31          5      1
    NET     HPROF_TEST           R_CALLS_R                      NET     HPROF_TEST           R_CALLS_R@1                          8633       8618      1
    NET     HPROF_TEST           B_CALLS_A                      NET     HPROF_TEST           A_CALLS_B@1                         43729      13663      1
    NET     HPROF_TEST           A_CALLS_B@1                    NET     HPROF_TEST           B_CALLS_A@1                         30066      30066      1
    NET     HPROF_TEST           A_CALLS_B                      NET     HPROF_TEST           B_CALLS_A                           57890      14161      1
    NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE               NET     TABLE_COUNT_TYPE     __static_sql_exec_line6             54967      54967      1
    SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          NEW_LINE                                7          7      2
    SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          PUT                                    28         28      2
    SYS     DBMS_OUTPUT          GET_LINES                      SYS     DBMS_OUTPUT          GET_LINE                                8          8      3
                                 __anonymous_block              NET     HPROF_TEST           STOP_PROFILING                         21         21      1
                                 __anonymous_block              SYS     DBMS_OUTPUT          GET_LINES                              68         60      3
                                 __anonymous_block                                           __dyn_sql_exec_line12                 226        226      1
                                 __anonymous_block              NET     HPROF_TEST           R_CALLS_R                           12823       4159      1
                                 __anonymous_block                                           __static_sql_exec_line8             14418        260      1
                                 __plsql_vm                     NET     HPROF_TEST           DBFUNC                              18482      18482      1
                                 __anonymous_block                                           __sql_fetch_line13                 726713     726713      1
                                 __static_sql_exec_line8                                     __plsql_vm@1                        14158         11      1
                                 __plsql_vm                                                  __anonymous_block                  809839        521      5
                                 __plsql_vm@1                   NET     HPROF_TEST           DBFUNC                              14147      14147      1
                                 __anonymous_block              NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                    55049         82      1
    
    22 rows selected.

    The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing, respectively, for the child function while called from all instances of the parent.

    Function Call Tree
    The raw data above can be used to identify processing bottlenecks at a function level, but it's also useful to process the data in order to display the function hierarchies, both for performance tuning and also for understanding the program structure. This is not quite as trivial as it may seem. The oracle-base article provides an SQL statement that attempts to do this:

    SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
           fi.function,
           pci.subtree_elapsed_time,
           pci.function_elapsed_time,
           pci.calls
    FROM   dbmshp_parent_child_info pci
           JOIN dbmshp_function_info fi ON pci.runid = fi.runid AND pci.childsymid = fi.symbolid
    WHERE  pci.runid = :RUN_ID
    CONNECT BY PRIOR childsymid = parentsymid
    START WITH pci.parentsymid = :START_ID

    Here, bind variables replace the original hard-coded values. On running this query I often got the following result:

    ERROR at line 1:
    ORA-01436: CONNECT BY loop in user data

    On the run used in this article, the query returned 157 records, which is obviously incorrect. There is of course a NOCYCLE keyword that can be used to return results in the case of loops. However, it is not worth adding in this case, because there are in fact no loops in the data (at least no cyclic loops - apparent loops are discussed later). Oracle avoids loops by treating a function call that is a descendant of itself as a call to a new function, identified by suffices @1, @2 etc. as we can see from the recursive procedures above (eg R_CALLS_R@1 is the second call of R_CALLS_R, this one from itself). The problem here is that the query is incorrect in its handling of runid, with the result that the tree-walk traverses records from other runs as well as the intended one. A further problem is that there may be several roots, and it would be best to calculate these within a subquery. We can correct these problems by the following query:

    SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
           fi.symbolid || ': ' || fi.function function,
           pci.subtree_elapsed_time sub_t,
           pci.function_elapsed_time fun_t,
           pci.calls
      FROM dbmshp_parent_child_info	pci
      JOIN dbmshp_function_info		fi 
        ON pci.runid	              = fi.runid 
       AND pci.childsymid	       = fi.symbolid
     WHERE pci.runid                   = :RUN_ID
    CONNECT BY PRIOR pci.childsymid    = pci.parentsymid 
           AND pci.runid	              = :RUN_ID
    START WITH pci.parentsymid         IN (SELECT f.symbolid FROM dbmshp_function_info f WHERE NOT EXISTS 
           (SELECT 1 FROM dbmshp_parent_child_info i WHERE i.childsymid = f.symbolid AND i.runid = :RUN_ID) AND f.runid = :RUN_ID)
           AND pci.runid	              = :RUN_ID

    This query returns the results:

    NAME                           FUNCTION                            SUB_T      FUN_T  CALLS
    ------------------------------ ------------------------------ ---------- ---------- ------
      .                            1: __anonymous_block              809,839        521      5
        NET.HPROF_TEST             9: R_CALLS_R                       12,823      4,159      1
          NET.HPROF_TEST           10: R_CALLS_R@1                     8,633      8,618      1
            SYS.DBMS_OUTPUT        18: PUT_LINE                           15          6      1
              SYS.DBMS_OUTPUT      16: NEW_LINE                            7          7      2
              SYS.DBMS_OUTPUT      17: PUT                                28         28      2
          SYS.DBMS_OUTPUT          18: PUT_LINE                           31          5      1
            SYS.DBMS_OUTPUT        16: NEW_LINE                            7          7      2
            SYS.DBMS_OUTPUT        17: PUT                                28         28      2
        NET.HPROF_TEST             11: STOP_PROFILING                     21         21      1
          SYS.DBMS_HPROF           13: STOP_PROFILING                      0          0      1
        NET.TABLE_COUNT_TYPE       12: TABLE_COUNT_TYPE               55,049         82      1
          NET.TABLE_COUNT_TYPE     22: __static_sql_exec_line6        54,967     54,967      1
        SYS.DBMS_OUTPUT            15: GET_LINES                          68         60      3
          SYS.DBMS_OUTPUT          14: GET_LINE                            8          8      3
        .                          19: __dyn_sql_exec_line12             226        226      1
        .                          20: __sql_fetch_line13            726,713    726,713      1
        .                          21: __static_sql_exec_line8        14,418        260      1
          .                        3: __plsql_vm@1                    14,158         11      1
            NET.HPROF_TEST         8: DBFUNC                          14,147     14,147      1
      NET.HPROF_TEST               8: DBFUNC                          18,482     18,482      1
      NET.HPROF_TEST               6: B_CALLS_A                       57,890     14,161      1
        NET.HPROF_TEST             5: A_CALLS_B@1                     43,729     13,663      1
          NET.HPROF_TEST           7: B_CALLS_A@1                     30,066     30,066      1
    
    24 rows selected.

    This is better, but we can identify some further issues.

    Missing Roots
    The true root results are missing: For example, A_CALLS_B is missing. This arises because the query is traversing the link records (DBMSHP_PARENT_CHILD_INFO), while the root information is stored in the nodes (DBMSHP_FUNCTION_INFO). This suggests a change from the CONNECT BY syntax to Oracle's v11.2 recursive subquery factoring syntax, which allows you easily to start from the nodes, then traverse recursively via the links. (Incidentally, moving the start of profiling to its own block would result in A_CALLS_B appearing under __anonymous_block, but I prefer to retain the current structure in order to deal with the general case in which multiple roots are possible.)

    Duplicate Links
    Notice that function PUT_LINE is reported separately under R_CALLS_R and R_CALLS_R@1, and the timings differ. Also, its own child calls appear under each of its instances, but in those cases the timings are identical. The reason for this is that in the first case, there are separate records of the times used in each call, whereas in the second, the child calls have only a single record giving the total times across both instances of the parent call. The call from R_CALLS_R shows (9 - 4 = ) 5µs used in child calls, while the call from R_CALLS_R@1 shows 14µs. The child calls show totals of (3 + 16 = ) 19µs, equalling the sum across the parent calls.

    At this point it is worth looking at this from the more general perspective of a hierarchical data structure where parents can have multiple children and children multiple parents, with one or more roots. If a network diagram were constructed there would be loops apparent indicating multiple routes between nodes. In these situations, Oracle's hierarchical queries effectively traverse all routes, and this is what causes the link duplication (in other scenarios this behaviour can cause big performance problems, but probably not here). Oracle's cycle detection mechanism does not trigger because the loops do not result in any node being a descendant of itself (as noted above, extra nodes are generated by the profiler to avoid this).

    It seems to me better to avoid this duplication, and also to signal those cases where times are not aggregated up the tree. We can achieve this by the use of analytic functions. Note that, although the query below refers to the specific tables and attributes for this problem, the proposed solution could be used for any member of this general class of problem. The new query, which orders sibling records by descending subtree elapsed time, is:

    WITH last_run AS (
    SELECT Max (runid) runid FROM dbmshp_runs
    ), full_tree (runid, lev, node_id, sub_t, fun_t, calls, link_id) AS (
    SELECT fni.runid, 0, fni.symbolid, fni.subtree_elapsed_time, fni.function_elapsed_time, fni.calls, 'root' || ROWNUM
      FROM dbmshp_function_info fni
      JOIN last_run lrn
        ON lrn.runid = fni.runid
     WHERE NOT EXISTS (SELECT 1 FROM dbmshp_parent_child_info pci WHERE pci.childsymid = fni.symbolid AND pci.runid = fni.runid)
     UNION ALL
    SELECT ftr.runid, 
           ftr.lev + 1, 
           pci.childsymid, 
           pci.subtree_elapsed_time, 
           pci.function_elapsed_time, 
           pci.calls,
           pci.parentsymid || '-' || pci.childsymid
      FROM full_tree ftr
      JOIN dbmshp_parent_child_info pci
        ON pci.parentsymid = ftr.node_id
       AND pci.runid = ftr.runid
    ) SEARCH DEPTH FIRST BY sub_t DESC, fun_t DESC, calls DESC, node_id SET rn
    , tree_ranked AS (
    SELECT runid, node_id, lev, rn, 
           sub_t, fun_t, calls, 
           Row_Number () OVER (PARTITION BY node_id ORDER BY rn) node_rn,
           Count (*) OVER (PARTITION BY node_id) node_cnt,
           Row_Number () OVER (PARTITION BY link_id ORDER BY rn) link_rn
      FROM full_tree
    )
    SELECT RPad (' ', trr.lev*2, ' ') || fni.function "Function tree",
           fni.symbolid sy, fni.owner, fni.module,
           CASE WHEN trr.node_cnt > 1 THEN trr.node_rn || ' of ' || trr.node_cnt END "Inst.",
           trr.sub_t, trr.fun_t, trr.calls, 
           trr.rn "Row"
      FROM tree_ranked trr
      JOIN dbmshp_function_info fni
        ON fni.symbolid = trr.node_id
       AND fni.runid = trr.runid
     WHERE trr.link_rn = 1
     ORDER BY trr.rn

    Query Structure Diagram
    HProf - QSD

    The results are then:

    Function tree                        SY OWNER MODULE               Inst.         SUB_T      FUN_T  CALLS  Row
    ----------------------------------- --- ----- -------------------- -------- ---------- ---------- ------ ----
    __plsql_vm                            2                                        828,379         58      6    1
      __anonymous_block                   1                                        809,839        521      5    2
        __sql_fetch_line13               20                                        726,713    726,713      1    3
        TABLE_COUNT_TYPE                 12 NET   TABLE_COUNT_TYPE                  55,049         82      1    4
          __static_sql_exec_line6        22 NET   TABLE_COUNT_TYPE                  54,967     54,967      1    5
        __static_sql_exec_line8          21                                         14,418        260      1    6
          __plsql_vm@1                    3                                         14,158         11      1    7
            DBFUNC                        8 NET   HPROF_TEST           1 of 2       14,147     14,147      1    8
        R_CALLS_R                         9 NET   HPROF_TEST                        12,823      4,159      1    9
          R_CALLS_R@1                    10 NET   HPROF_TEST                         8,633      8,618      1   10
            PUT_LINE                     18 SYS   DBMS_OUTPUT          1 of 2           15          6      1   11
              PUT                        17 SYS   DBMS_OUTPUT          1 of 2           28         28      2   12
              NEW_LINE                   16 SYS   DBMS_OUTPUT          1 of 2            7          7      2   13
          PUT_LINE                       18 SYS   DBMS_OUTPUT          2 of 2           31          5      1   14
        __dyn_sql_exec_line12            19                                            226        226      1   17
        GET_LINES                        15 SYS   DBMS_OUTPUT                           68         60      3   18
          GET_LINE                       14 SYS   DBMS_OUTPUT                            8          8      3   19
        STOP_PROFILING                   11 NET   HPROF_TEST                            21         21      1   20
          STOP_PROFILING                 13 SYS   DBMS_HPROF                             0          0      1   21
      DBFUNC                              8 NET   HPROF_TEST           2 of 2       18,482     18,482      1   22
    A_CALLS_B                             4 NET   HPROF_TEST                        62,340      4,450      1   23
      B_CALLS_A                           6 NET   HPROF_TEST                        57,890     14,161      1   24
        A_CALLS_B@1                       5 NET   HPROF_TEST                        43,729     13,663      1   25
          B_CALLS_A@1                     7 NET   HPROF_TEST                        30,066     30,066      1   26
    
    24 rows selected.

    Notice that we now have a single record for each of the 22 links, plus the two root nodes. Also, the "Inst." column lists the instance number of a function having more than one instance, and the children of any such function are only listed once with the gaps in the "Row" column indicating where duplicates have been suppressed.

    Network Diagrams
    It may be interesting to display the call tree in two diagrams, one for each root.
    Root __plsql_vm
    HProf - Net

    Root A_CALLS_B
    HProf - Net2

    Notes on Tree Output
    Anonymous Block (__anonymous_block)
    This function seems to correspond to invocations of anonymous blocks, obviously enough. However, there is an apparent anomaly in the number of calls listed, 6, because the driving program has only three such blocks, and there are none in the called PL/SQL code. I would surmise that the apparent discrepancy arises from the enabling of SERVEROUTPUT, which appears to result in a secondary block being associated with each explicit SQL*Plus block, that issues a call to GET_LINES to process buffered output.

    PL/SQL Engine (__plsql_vm)
    This function seems to correspond to external invocations of PL/SQL such as from a SQL*Plus session. There are 7 calls, 6 of them presumably being linked with the external anonymous blocks, and the seventh with DBFUNC, where a PL/SQL function is called from a SQL statement from SQL*Plus.

    Notice that the SQL statement calling a database function from within PL/SQL generates the recursive call to the engine, __plsql_vm@1

    Second Root (A_CALLS_B)
    The above function does not have the __plsql_vm/__anonymous_block ancestry that might be expected because profiling only started within the enclosing block.

    Inlined Procedure (Rest_a_While)
    I wrote a small procedure, Rest_a_While, to generate some elapsed time in the recursive procedures, but preceded it with the INLINE pragma, a new optimisation feature in 11g. This had the desired effect of removing the calls from the profiling output and including the times in the calling procedures. Rest_a_While does not make the obvious call to DBMS_Lock.Sleep because that procedure cannot be inlined. subprogram inlining in 11g provides some analysis of the inlining feature.

    Sibling Ordering
    We have ordered siblings by descending subtree elapsed time, using the SEARCH clause. It would be nice to have the option to order the siblings by initial invocation time, but Oracle does not provide the data to do this.

    Loops and Hierarchies
    The first diagram shows two loops, where there are two routes between the loop start and end points, indicated by different colours. The second loop has two child nodes coming from the end point, and hierarchical queries (both CONNECT BY and recursive subquery factors in Oracle) cause the links to be duplicated. Our query has filtered out the duplicates by analytic functions.

    It's worth remembering this because it's a general feature of SQL for querying hierarchies, and judging by Oracle forums, not one that's widely understood. For larger hierarchies it can cause serious performance problems, and may justify a PL/SQL programmed solution that need not suffer the same problem.

    Manual Instrumentation
    Oracle's hierarchical profiler clearly provides extremely useful information on both performance and structure of PL/SQL programs with very little effort. However, it does have the limitation of only providing information down to the subprogram level (which includes embedded SQL statements in this context). It is also often considered good practice to implement timing and other instrumentation permanently in production code, sometimes in a switchable fashion. In the test program, one of the called procedures, A_Calls_B, makes two calls to the inlined procedure, Rest_a_While, the second doing about twice as much work as the first. The profiler reports total within-function times of 4,450µs and 13,663µs on first and second calls, respectively (the work is scaled by a call number parameter, equal to 1, then 3).

    I created a second instance of the package and driver script (suffix _TS) to illustrate manual instrumentation. This uses an 'object-oriented' timing package that I wrote a couple of years ago Code Timing and Object Orientation and Zombies (November, 2010) to instrument at procedure and section level. I multiplied the work in Rest_a_While by a factor of ten to get larger times. This produced the output:

    Timer Set: HProf, Constructed at 05 Mar 2013 10:21:27, written at 10:21:30
    ==========================================================================
    [Timer timed: Elapsed (per call): 0.04 (0.000044), CPU (per call): 0.05 (0.000050), calls: 1000, '***' denotes corrected line below]
    
    Timer                       Elapsed          CPU          Calls        Ela/Call        CPU/Call
    ----------------------   ----------   ----------   ------------   -------------   -------------
    A_Calls_B, section one         0.06         0.05              2         0.03150         0.02500
    A_Calls_B, section two         0.12         0.12              2         0.06050         0.06000
    B_Calls_A: 2                   0.15         0.16              1         0.15400         0.16000
    B_Calls_A: 4                   0.31         0.30              1         0.30700         0.30000
    DBFunc                         0.32         0.31              2         0.15950         0.15500
    Open cursor                    0.69         0.69              1         0.68900         0.69000
    Fetch from cursor              0.70         0.70              1         0.69600         0.70000
    Close cursor                   0.00         0.00              1         0.00000         0.00000
    Construct object               0.06         0.04              1         0.05500         0.04000
    R_Calls_R                      0.14         0.14              2         0.07000         0.07000
    (Other)                        0.00         0.00              1         0.00000         0.00000
    ----------------------   ----------   ----------   ------------   -------------   -------------
    Total                          2.54         2.51             15         0.16960         0.16733
    ----------------------   ----------   ----------   ------------   -------------   -------------
    

    Notes on Code Timing

    • Calls, CPU and elapsed times have been captured at the section level for A_Calls_B
    • Observe that, while R_Calls_R and A_Calls_B aggregate over all calls, B_Calls_A records values by call; this is implemented simply by including a value that changes with call in the timer name
    • The timing set object is designed to be very low footprint; here 9 statements (calls to Increment_Time), plus a small global overhead, produced 10 result lines, plus associated information
    • The 'object-oriented' approach allows multiple programs to be be timed at multiple levels, without interference between timings
    • There are Perl and Java implementations of this timing set object included in the Scribd article mentioned

    Oracle's Flat Profiler (DBMS_Profiler)
    The hierarchical profiler was introduced in v11.1, while prior to this there was a non-hierarchical profiler, DBMS_Profiler. This package still exists in v11: It is omitted from the advanced application developer's guide for v11, but is described in the packages and types manual (Oracle® Database PL/SQL Packages and Types Reference, 11g Release 2 (11.2)); also, SQL*Developer appears to support only the newer hierarchical verion (via right-click on a package). I thought it interesting to run the older version on the same test program (package Old_Test_Prof, driver script Test_Rep_p_Old.sql and reporting script Test_Rep_h_Old.sql). The output from the first three queries is:

    Run header (PLSQL_PROFILER_RUNS)
    
         RUNID RUN_DATE        MICRO_S    SECONDS
    ---------- ------------ ---------- ----------
             3 11:03:13        2164000       2.16
    
    Profiler data summary (PLSQL_PROFILER_DATA)
    
       MICRO_S SECONDS    CALLS
    ---------- ------- --------
       2126949    2.13       72
    
    Profiler data by time (PLSQL_PROFILER_DATA)
    
       MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
    ---------- ------- -------- -------------------- ----------- ------
        729932    0.73        1                                5     13
        569563    0.57        2 OLD_PROF_TEST                  1     56
        377880    0.38        2 OLD_PROF_TEST                  1     82
        166019    0.17        2 OLD_PROF_TEST                  1     70
        150117    0.15        2 OLD_PROF_TEST                  1     43
         72742    0.07        2 OLD_PROF_TEST                  1     40
         56473    0.06        1 TABLE_COUNT_TYPE               6      6
          3338    0.00        1                                5      8
           258    0.00        1                                5     12
           109    0.00        1                                5     16
            68    0.00        2 OLD_PROF_TEST                  1     67
            66    0.00        2                                4      1
            60    0.00        2                                7      1
            60    0.00        2                                3      1
            44    0.00        1                                5     14
            42    0.00        0                                2      5
            31    0.00        1 OLD_PROF_TEST                  1     18
            26    0.00        1                                8      5
            13    0.00        0                                5      1
             9    0.00        1 TABLE_COUNT_TYPE               6     11
             9    0.00        2 OLD_PROF_TEST                  1     86
             8    0.00        0 OLD_PROF_TEST                  1     51
             8    0.00        1 TABLE_COUNT_TYPE               6     13
             7    0.00        1                                5     18
             6    0.00        0 OLD_PROF_TEST                  1     78
             6    0.00        0 OLD_PROF_TEST                  1     64
             6    0.00        0                                8      1
             6    0.00        1 TABLE_COUNT_TYPE               6      3
             5    0.00        0 OLD_PROF_TEST                  1     35
             5    0.00        0 OLD_PROF_TEST                  1     15
             4    0.00        1                                8      7
             4    0.00        1 OLD_PROF_TEST                  1     76
             3    0.00        1                                2      8
             2    0.00        1 OLD_PROF_TEST                  1     62
             2    0.00        1 OLD_PROF_TEST                  1     13
             2    0.00        1 TABLE_COUNT_TYPE               6      5
             2    0.00        2 OLD_PROF_TEST                  1     72
             2    0.00        2 OLD_PROF_TEST                  1     45
             2    0.00        2 OLD_PROF_TEST                  1     49
             2    0.00        2 OLD_PROF_TEST                  1     46
             2    0.00        2 OLD_PROF_TEST                  1     58
             1    0.00        1 OLD_PROF_TEST                  1     73
             1    0.00        1                                2      6
             1    0.00        1 OLD_PROF_TEST                  1     59
             1    0.00        1 OLD_PROF_TEST                  1     11
             1    0.00        2 OLD_PROF_TEST                  1     54
             1    0.00        2 OLD_PROF_TEST                  1     84
             0    0.00        0 OLD_PROF_TEST                  1      1
             0    0.00        0 OLD_PROF_TEST                  1     88
             0    0.00        0                                8      9
             0    0.00        0                                2      1
             0    0.00        0                                2      2
             0    0.00        0 OLD_PROF_TEST                  1      3
             0    0.00        0 OLD_PROF_TEST                  1      5
             0    0.00        0 OLD_PROF_TEST                  1      9
             0    0.00        0 OLD_PROF_TEST                  1     20
             0    0.00        1 TABLE_COUNT_TYPE               6      4
             0    0.00        1                                8      2
             0    0.00        2 OLD_PROF_TEST                  1     39
             0    0.00        2 OLD_PROF_TEST                  1     55
             0    0.00        2 OLD_PROF_TEST                  1     69
             0    0.00        2 OLD_PROF_TEST                  1     38
             0    0.00        2 OLD_PROF_TEST                  1     81
             0    0.00        2 OLD_PROF_TEST                  1     42
             0    0.00        2 OLD_PROF_TEST                  1     68
    
    65 rows selected.
    
    

    Referring to the package, type and anonymous blocks, I assigned labels to all the lines having more than 10µs, as follows:

       MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
    ---------- ------- -------- -------------------- ----------- ------
        729932    0.73        1                                5     13  B2: FETCH
        569563    0.57        2 OLD_PROF_TEST                  1     56  B_Calls_A (Rest_a_While)
        377880    0.38        2 OLD_PROF_TEST                  1     82  DBFunc (Rest_a_While)
        166019    0.17        2 OLD_PROF_TEST                  1     70  R_Calls_R (Rest_a_While)
        150117    0.15        2 OLD_PROF_TEST                  1     43  A_Calls_B (Rest_a_While, section 2)
         72742    0.07        2 OLD_PROF_TEST                  1     40  A_Calls_B (Rest_a_While, section 1)
         56473    0.06        1 TABLE_COUNT_TYPE               6      6  SELECT
          3338    0.00        1                                5      8  B2: SELECT DBFunc
           258    0.00        1                                5     12  B2: OPEN
           109    0.00        1                                5     16  B2: Assign Table_Count_Type
            68    0.00        2 OLD_PROF_TEST                  1     67  Put_Line
            66    0.00        2                                4      1  Auxiliary SERVEROUTPUT block for B2 (surmised)
            60    0.00        2                                7      1  Auxiliary SERVEROUTPUT block for B3 (surmised)
            60    0.00        2                                3      1  Auxiliary SERVEROUTPUT block for B1 (surmised)
            44    0.00        1                                5     14  B2: CLOSE
            42    0.00        0                                2      5  B1: Call to Start_Profiling 
            31    0.00        1 OLD_PROF_TEST                  1     18  RETURN DBMS_Profiler.Stop_Profiler;
            26    0.00        1                                8      5  B3: Call R_Calls_R
            13    0.00        0                                5      1  B2: DECLARE
    

    Notes on Output of Flat Profiler
    There were six units with no linked information in DBMS_PROFILER_UNITS. By examining the data, I was able to associate unit numbers 2, 5 and 8 with my anonymous blocks B1, B2 and B3. That left three unassigned, and I have surmised that these correspond to the auxiliary blocks associated with processing server output that we earlier surmised when examining the output from the hierarchical profiler.

    • The useful call tree structure is not present in the data from the old profiler
    • However, the results are at a line level, which the hierarchical profiler does not provide; for example, the two sections of A_Calls_B are reported separately
    • Deciphering the output requires significantly more manual effort than with the hierarchical profiler
    • Both old and new profiler have their own advantages, and so both should be considered of value
    • Manual code timing offers more flexibility in terms of aggregating lines and call instances, but requires more effort

    Conclusions

    • Running Oracle's hierarchical profiler would seem to be the default first step in tuning PL/SQL programs from v11.1
    • Some care is needed in interpreting the output data; I've provided a query for displaying the hierarchies
    • Performance is recorded only down to function level, so it will still often be worthwhile to use the old flat profiler in addition
    • Manually timing code sections also still has a part to play, in terms of instrumentation and greater flexibility where necessary






  • Database function called from SQL in SQL*Plus (DBFUNC)
  • Database function called from SQL in PL/SQL (DBFUNC)
  • Object constructor call (TABLE_COUNT_TYPE)

Call Structure Diagram
HProf - CSD

Raw Results
The attached script Test_Rep_h.sql was used to report on the results. The record produced in the run table, DBMSHP_RUNS, was:

     RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
---------- ---------------------------- ---------- ---------- ------------------------------------------------------------
        11 04-MAR-13 07.07.36.803000        890719        .89 Profile for small test program with recursion

The records produced in the functions table, DBMSHP_FUNCTION_INFO, were:

OWNER MODULE               FUNCTION                         ID  LINE#      SUB_T      FUN_T  CALLS
----- -------------------- ------------------------------ ---- ------ ---------- ---------- ------
NET   HPROF_TEST           A_CALLS_B                         4     40      62340       4450      1
NET   HPROF_TEST           A_CALLS_B@1                       5     40      43729      13663      1
NET   HPROF_TEST           B_CALLS_A                         6     38      57890      14161      1
NET   HPROF_TEST           B_CALLS_A@1                       7     38      30066      30066      1
NET   HPROF_TEST           DBFUNC                            8     84      32629      32629      2
NET   HPROF_TEST           R_CALLS_R                         9     70      12823       4159      1
NET   HPROF_TEST           R_CALLS_R@1                      10     70       8633       8618      1
NET   HPROF_TEST           STOP_PROFILING                   11     16         21         21      1
NET   TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                 12      3      55049         82      1
NET   TABLE_COUNT_TYPE     __static_sql_exec_line6          22      6      54967      54967      1
SYS   DBMS_HPROF           STOP_PROFILING                   13     59          0          0      1
SYS   DBMS_OUTPUT          GET_LINE                         14    129          8          8      3
SYS   DBMS_OUTPUT          GET_LINES                        15    160         68         60      3
SYS   DBMS_OUTPUT          NEW_LINE                         16    117          7          7      2
SYS   DBMS_OUTPUT          PUT                              17     77         28         28      2
SYS   DBMS_OUTPUT          PUT_LINE                         18    109         46         11      2
                           __anonymous_block                 1      0     809839        521      5
                           __dyn_sql_exec_line12            19     12        226        226      1
                           __plsql_vm                        2      0     828379         58      6
                           __plsql_vm@1                      3      0      14158         11      1
                           __sql_fetch_line13               20     13     726713     726713      1
                           __static_sql_exec_line8          21      8      14418        260      1

22 rows selected.

The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing.

The records produced in the functions parent-child table, DBMSHP_PARENT_CHILD_INFO, were:

OWNER_P MODULE_P             FUNCTION_P                     OWNER_C MODULE_C             FUNCTION_C                          SUB_T      FUN_T  CALLS
------- -------------------- ------------------------------ ------- -------------------- ------------------------------ ---------- ---------- ------
NET     HPROF_TEST           STOP_PROFILING                 SYS     DBMS_HPROF           STOP_PROFILING                          0          0      1
NET     HPROF_TEST           R_CALLS_R@1                    SYS     DBMS_OUTPUT          PUT_LINE                               15          6      1
NET     HPROF_TEST           R_CALLS_R                      SYS     DBMS_OUTPUT          PUT_LINE                               31          5      1
NET     HPROF_TEST           R_CALLS_R                      NET     HPROF_TEST           R_CALLS_R@1                          8633       8618      1
NET     HPROF_TEST           B_CALLS_A                      NET     HPROF_TEST           A_CALLS_B@1                         43729      13663      1
NET     HPROF_TEST           A_CALLS_B@1                    NET     HPROF_TEST           B_CALLS_A@1                         30066      30066      1
NET     HPROF_TEST           A_CALLS_B                      NET     HPROF_TEST           B_CALLS_A                           57890      14161      1
NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE               NET     TABLE_COUNT_TYPE     __static_sql_exec_line6             54967      54967      1
SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          NEW_LINE                                7          7      2
SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          PUT                                    28         28      2
SYS     DBMS_OUTPUT          GET_LINES                      SYS     DBMS_OUTPUT          GET_LINE                                8          8      3
                             __anonymous_block              NET     HPROF_TEST           STOP_PROFILING                         21         21      1
                             __anonymous_block              SYS     DBMS_OUTPUT          GET_LINES                              68         60      3
                             __anonymous_block                                           __dyn_sql_exec_line12                 226        226      1
                             __anonymous_block              NET     HPROF_TEST           R_CALLS_R                           12823       4159      1
                             __anonymous_block                                           __static_sql_exec_line8             14418        260      1
                             __plsql_vm                     NET     HPROF_TEST           DBFUNC                              18482      18482      1
                             __anonymous_block                                           __sql_fetch_line13                 726713     726713      1
                             __static_sql_exec_line8                                     __plsql_vm@1                        14158         11      1
                             __plsql_vm                                                  __anonymous_block                  809839        521      5
                             __plsql_vm@1                   NET     HPROF_TEST           DBFUNC                              14147      14147      1
                             __anonymous_block              NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                    55049         82      1

22 rows selected.

The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing, respectively, for the child function while called from all instances of the parent.

Function Call Tree
The raw data above can be used to identify processing bottlenecks at a function level, but it's also useful to process the data in order to display the function hierarchies, both for performance tuning and also for understanding the program structure. This is not quite as trivial as it may seem. The oracle-base article provides an SQL statement that attempts to do this:

SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
       fi.function,
       pci.subtree_elapsed_time,
       pci.function_elapsed_time,
       pci.calls
FROM   dbmshp_parent_child_info pci
       JOIN dbmshp_function_info fi ON pci.runid = fi.runid AND pci.childsymid = fi.symbolid
WHERE  pci.runid = :RUN_ID
CONNECT BY PRIOR childsymid = parentsymid
START WITH pci.parentsymid = :START_ID

Here, bind variables replace the original hard-coded values. On running this query I often got the following result:

ERROR at line 1:
ORA-01436: CONNECT BY loop in user data

On the run used in this article, the query returned 157 records, which is obviously incorrect. There is of course a NOCYCLE keyword that can be used to return results in the case of loops. However, it is not worth adding in this case, because there are in fact no loops in the data (at least no cyclic loops - apparent loops are discussed later). Oracle avoids loops by treating a function call that is a descendant of itself as a call to a new function, identified by suffices @1, @2 etc. as we can see from the recursive procedures above (eg R_CALLS_R@1 is the second call of R_CALLS_R, this one from itself). The problem here is that the query is incorrect in its handling of runid, with the result that the tree-walk traverses records from other runs as well as the intended one. A further problem is that there may be several roots, and it would be best to calculate these within a subquery. We can correct these problems by the following query:

SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
       fi.symbolid || ': ' || fi.function function,
       pci.subtree_elapsed_time sub_t,
       pci.function_elapsed_time fun_t,
       pci.calls
  FROM dbmshp_parent_child_info	pci
  JOIN dbmshp_function_info		fi 
    ON pci.runid	              = fi.runid 
   AND pci.childsymid	       = fi.symbolid
 WHERE pci.runid                   = :RUN_ID
CONNECT BY PRIOR pci.childsymid    = pci.parentsymid 
       AND pci.runid	              = :RUN_ID
START WITH pci.parentsymid         IN (SELECT f.symbolid FROM dbmshp_function_info f WHERE NOT EXISTS 
       (SELECT 1 FROM dbmshp_parent_child_info i WHERE i.childsymid = f.symbolid AND i.runid = :RUN_ID) AND f.runid = :RUN_ID)
       AND pci.runid	              = :RUN_ID

This query returns the results:

NAME                           FUNCTION                            SUB_T      FUN_T  CALLS
------------------------------ ------------------------------ ---------- ---------- ------
  .                            1: __anonymous_block              809,839        521      5
    NET.HPROF_TEST             9: R_CALLS_R                       12,823      4,159      1
      NET.HPROF_TEST           10: R_CALLS_R@1                     8,633      8,618      1
        SYS.DBMS_OUTPUT        18: PUT_LINE                           15          6      1
          SYS.DBMS_OUTPUT      16: NEW_LINE                            7          7      2
          SYS.DBMS_OUTPUT      17: PUT                                28         28      2
      SYS.DBMS_OUTPUT          18: PUT_LINE                           31          5      1
        SYS.DBMS_OUTPUT        16: NEW_LINE                            7          7      2
        SYS.DBMS_OUTPUT        17: PUT                                28         28      2
    NET.HPROF_TEST             11: STOP_PROFILING                     21         21      1
      SYS.DBMS_HPROF           13: STOP_PROFILING                      0          0      1
    NET.TABLE_COUNT_TYPE       12: TABLE_COUNT_TYPE               55,049         82      1
      NET.TABLE_COUNT_TYPE     22: __static_sql_exec_line6        54,967     54,967      1
    SYS.DBMS_OUTPUT            15: GET_LINES                          68         60      3
      SYS.DBMS_OUTPUT          14: GET_LINE                            8          8      3
    .                          19: __dyn_sql_exec_line12             226        226      1
    .                          20: __sql_fetch_line13            726,713    726,713      1
    .                          21: __static_sql_exec_line8        14,418        260      1
      .                        3: __plsql_vm@1                    14,158         11      1
        NET.HPROF_TEST         8: DBFUNC                          14,147     14,147      1
  NET.HPROF_TEST               8: DBFUNC                          18,482     18,482      1
  NET.HPROF_TEST               6: B_CALLS_A                       57,890     14,161      1
    NET.HPROF_TEST             5: A_CALLS_B@1                     43,729     13,663      1
      NET.HPROF_TEST           7: B_CALLS_A@1                     30,066     30,066      1

24 rows selected.

This is better, but we can identify some further issues.

Missing Roots
The true root results are missing: For example, A_CALLS_B is missing. This arises because the query is traversing the link records (DBMSHP_PARENT_CHILD_INFO), while the root information is stored in the nodes (DBMSHP_FUNCTION_INFO). This suggests a change from the CONNECT BY syntax to Oracle's v11.2 recursive subquery factoring syntax, which allows you easily to start from the nodes, then traverse recursively via the links. (Incidentally, moving the start of profiling to its own block would result in A_CALLS_B appearing under __anonymous_block, but I prefer to retain the current structure in order to deal with the general case in which multiple roots are possible.)

Duplicate Links
Notice that function PUT_LINE is reported separately under R_CALLS_R and R_CALLS_R@1, and the timings differ. Also, its own child calls appear under each of its instances, but in those cases the timings are identical. The reason for this is that in the first case, there are separate records of the times used in each call, whereas in the second, the child calls have only a single record giving the total times across both instances of the parent call. The call from R_CALLS_R shows (9 - 4 = ) 5µs used in child calls, while the call from R_CALLS_R@1 shows 14µs. The child calls show totals of (3 + 16 = ) 19µs, equalling the sum across the parent calls.

At this point it is worth looking at this from the more general perspective of a hierarchical data structure where parents can have multiple children and children multiple parents, with one or more roots. If a network diagram were constructed there would be loops apparent indicating multiple routes between nodes. In these situations, Oracle's hierarchical queries effectively traverse all routes, and this is what causes the link duplication (in other scenarios this behaviour can cause big performance problems, but probably not here). Oracle's cycle detection mechanism does not trigger because the loops do not result in any node being a descendant of itself (as noted above, extra nodes are generated by the profiler to avoid this).

It seems to me better to avoid this duplication, and also to signal those cases where times are not aggregated up the tree. We can achieve this by the use of analytic functions. Note that, although the query below refers to the specific tables and attributes for this problem, the proposed solution could be used for any member of this general class of problem. The new query, which orders sibling records by descending subtree elapsed time, is:

WITH last_run AS (
SELECT Max (runid) runid FROM dbmshp_runs
), full_tree (runid, lev, node_id, sub_t, fun_t, calls, link_id) AS (
SELECT fni.runid, 0, fni.symbolid, fni.subtree_elapsed_time, fni.function_elapsed_time, fni.calls, 'root' || ROWNUM
  FROM dbmshp_function_info fni
  JOIN last_run lrn
    ON lrn.runid = fni.runid
 WHERE NOT EXISTS (SELECT 1 FROM dbmshp_parent_child_info pci WHERE pci.childsymid = fni.symbolid AND pci.runid = fni.runid)
 UNION ALL
SELECT ftr.runid, 
       ftr.lev + 1, 
       pci.childsymid, 
       pci.subtree_elapsed_time, 
       pci.function_elapsed_time, 
       pci.calls,
       pci.parentsymid || '-' || pci.childsymid
  FROM full_tree ftr
  JOIN dbmshp_parent_child_info pci
    ON pci.parentsymid = ftr.node_id
   AND pci.runid = ftr.runid
) SEARCH DEPTH FIRST BY sub_t DESC, fun_t DESC, calls DESC, node_id SET rn
, tree_ranked AS (
SELECT runid, node_id, lev, rn, 
       sub_t, fun_t, calls, 
       Row_Number () OVER (PARTITION BY node_id ORDER BY rn) node_rn,
       Count (*) OVER (PARTITION BY node_id) node_cnt,
       Row_Number () OVER (PARTITION BY link_id ORDER BY rn) link_rn
  FROM full_tree
)
SELECT RPad (' ', trr.lev*2, ' ') || fni.function "Function tree",
       fni.symbolid sy, fni.owner, fni.module,
       CASE WHEN trr.node_cnt > 1 THEN trr.node_rn || ' of ' || trr.node_cnt END "Inst.",
       trr.sub_t, trr.fun_t, trr.calls, 
       trr.rn "Row"
  FROM tree_ranked trr
  JOIN dbmshp_function_info fni
    ON fni.symbolid = trr.node_id
   AND fni.runid = trr.runid
 WHERE trr.link_rn = 1
 ORDER BY trr.rn

Query Structure Diagram
HProf - QSD

The results are then:

Function tree                        SY OWNER MODULE               Inst.         SUB_T      FUN_T  CALLS  Row
----------------------------------- --- ----- -------------------- -------- ---------- ---------- ------ ----
__plsql_vm                            2                                        828,379         58      6    1
  __anonymous_block                   1                                        809,839        521      5    2
    __sql_fetch_line13               20                                        726,713    726,713      1    3
    TABLE_COUNT_TYPE                 12 NET   TABLE_COUNT_TYPE                  55,049         82      1    4
      __static_sql_exec_line6        22 NET   TABLE_COUNT_TYPE                  54,967     54,967      1    5
    __static_sql_exec_line8          21                                         14,418        260      1    6
      __plsql_vm@1                    3                                         14,158         11      1    7
        DBFUNC                        8 NET   HPROF_TEST           1 of 2       14,147     14,147      1    8
    R_CALLS_R                         9 NET   HPROF_TEST                        12,823      4,159      1    9
      R_CALLS_R@1                    10 NET   HPROF_TEST                         8,633      8,618      1   10
        PUT_LINE                     18 SYS   DBMS_OUTPUT          1 of 2           15          6      1   11
          PUT                        17 SYS   DBMS_OUTPUT          1 of 2           28         28      2   12
          NEW_LINE                   16 SYS   DBMS_OUTPUT          1 of 2            7          7      2   13
      PUT_LINE                       18 SYS   DBMS_OUTPUT          2 of 2           31          5      1   14
    __dyn_sql_exec_line12            19                                            226        226      1   17
    GET_LINES                        15 SYS   DBMS_OUTPUT                           68         60      3   18
      GET_LINE                       14 SYS   DBMS_OUTPUT                            8          8      3   19
    STOP_PROFILING                   11 NET   HPROF_TEST                            21         21      1   20
      STOP_PROFILING                 13 SYS   DBMS_HPROF                             0          0      1   21
  DBFUNC                              8 NET   HPROF_TEST           2 of 2       18,482     18,482      1   22
A_CALLS_B                             4 NET   HPROF_TEST                        62,340      4,450      1   23
  B_CALLS_A                           6 NET   HPROF_TEST                        57,890     14,161      1   24
    A_CALLS_B@1                       5 NET   HPROF_TEST                        43,729     13,663      1   25
      B_CALLS_A@1                     7 NET   HPROF_TEST                        30,066     30,066      1   26

24 rows selected.

Notice that we now have a single record for each of the 22 links, plus the two root nodes. Also, the "Inst." column lists the instance number of a function having more than one instance, and the children of any such function are only listed once with the gaps in the "Row" column indicating where duplicates have been suppressed.

Network Diagrams
It may be interesting to display the call tree in two diagrams, one for each root.
Root __plsql_vm
HProf - Net

Root A_CALLS_B
HProf - Net2

Notes on Tree Output
Anonymous Block (__anonymous_block)
This function seems to correspond to invocations of anonymous blocks, obviously enough. However, there is an apparent anomaly in the number of calls listed, 6, because the driving program has only three such blocks, and there are none in the called PL/SQL code. I would surmise that the apparent discrepancy arises from the enabling of SERVEROUTPUT, which appears to result in a secondary block being associated with each explicit SQL*Plus block, that issues a call to GET_LINES to process buffered output.

PL/SQL Engine (__plsql_vm)
This function seems to correspond to external invocations of PL/SQL such as from a SQL*Plus session. There are 7 calls, 6 of them presumably being linked with the external anonymous blocks, and the seventh with DBFUNC, where a PL/SQL function is called from a SQL statement from SQL*Plus.

Notice that the SQL statement calling a database function from within PL/SQL generates the recursive call to the engine, __plsql_vm@1

Second Root (A_CALLS_B)
The above function does not have the __plsql_vm/__anonymous_block ancestry that might be expected because profiling only started within the enclosing block.

Inlined Procedure (Rest_a_While)
I wrote a small procedure, Rest_a_While, to generate some elapsed time in the recursive procedures, but preceded it with the INLINE pragma, a new optimisation feature in 11g. This had the desired effect of removing the calls from the profiling output and including the times in the calling procedures. Rest_a_While does not make the obvious call to DBMS_Lock.Sleep because that procedure cannot be inlined. subprogram inlining in 11g provides some analysis of the inlining feature.

Sibling Ordering
We have ordered siblings by descending subtree elapsed time, using the SEARCH clause. It would be nice to have the option to order the siblings by initial invocation time, but Oracle does not provide the data to do this.

Loops and Hierarchies
The first diagram shows two loops, where there are two routes between the loop start and end points, indicated by different colours. The second loop has two child nodes coming from the end point, and hierarchical queries (both CONNECT BY and recursive subquery factors in Oracle) cause the links to be duplicated. Our query has filtered out the duplicates by analytic functions.

It's worth remembering this because it's a general feature of SQL for querying hierarchies, and judging by Oracle forums, not one that's widely understood. For larger hierarchies it can cause serious performance problems, and may justify a PL/SQL programmed solution that need not suffer the same problem.

Manual Instrumentation
Oracle's hierarchical profiler clearly provides extremely useful information on both performance and structure of PL/SQL programs with very little effort. However, it does have the limitation of only providing information down to the subprogram level (which includes embedded SQL statements in this context). It is also often considered good practice to implement timing and other instrumentation permanently in production code, sometimes in a switchable fashion. In the test program, one of the called procedures, A_Calls_B, makes two calls to the inlined procedure, Rest_a_While, the second doing about twice as much work as the first. The profiler reports total within-function times of 4,450µs and 13,663µs on first and second calls, respectively (the work is scaled by a call number parameter, equal to 1, then 3).

I created a second instance of the package and driver script (suffix _TS) to illustrate manual instrumentation. This uses an 'object-oriented' timing package that I wrote a couple of years ago Code Timing and Object Orientation and Zombies (November, 2010) to instrument at procedure and section level. I multiplied the work in Rest_a_While by a factor of ten to get larger times. This produced the output:

Timer Set: HProf, Constructed at 05 Mar 2013 10:21:27, written at 10:21:30
==========================================================================
[Timer timed: Elapsed (per call): 0.04 (0.000044), CPU (per call): 0.05 (0.000050), calls: 1000, '***' denotes corrected line below]

Timer                       Elapsed          CPU          Calls        Ela/Call        CPU/Call
----------------------   ----------   ----------   ------------   -------------   -------------
A_Calls_B, section one         0.06         0.05              2         0.03150         0.02500
A_Calls_B, section two         0.12         0.12              2         0.06050         0.06000
B_Calls_A: 2                   0.15         0.16              1         0.15400         0.16000
B_Calls_A: 4                   0.31         0.30              1         0.30700         0.30000
DBFunc                         0.32         0.31              2         0.15950         0.15500
Open cursor                    0.69         0.69              1         0.68900         0.69000
Fetch from cursor              0.70         0.70              1         0.69600         0.70000
Close cursor                   0.00         0.00              1         0.00000         0.00000
Construct object               0.06         0.04              1         0.05500         0.04000
R_Calls_R                      0.14         0.14              2         0.07000         0.07000
(Other)                        0.00         0.00              1         0.00000         0.00000
----------------------   ----------   ----------   ------------   -------------   -------------
Total                          2.54         2.51             15         0.16960         0.16733
----------------------   ----------   ----------   ------------   -------------   -------------

Notes on Code Timing

  • Calls, CPU and elapsed times have been captured at the section level for A_Calls_B
  • Observe that, while R_Calls_R and A_Calls_B aggregate over all calls, B_Calls_A records values by call; this is implemented simply by including a value that changes with call in the timer name
  • The timing set object is designed to be very low footprint; here 9 statements (calls to Increment_Time), plus a small global overhead, produced 10 result lines, plus associated information
  • The 'object-oriented' approach allows multiple programs to be be timed at multiple levels, without interference between timings
  • There are Perl and Java implementations of this timing set object included in the Scribd article mentioned

Oracle's Flat Profiler (DBMS_Profiler)
The hierarchical profiler was introduced in v11.1, while prior to this there was a non-hierarchical profiler, DBMS_Profiler. This package still exists in v11: It is omitted from the advanced application developer's guide for v11, but is described in the packages and types manual (Oracle® Database PL/SQL Packages and Types Reference, 11g Release 2 (11.2)); also, SQL*Developer appears to support only the newer hierarchical verion (via right-click on a package). I thought it interesting to run the older version on the same test program (package Old_Test_Prof, driver script Test_Rep_p_Old.sql and reporting script Test_Rep_h_Old.sql). The output from the first three queries is:

Run header (PLSQL_PROFILER_RUNS)

     RUNID RUN_DATE        MICRO_S    SECONDS
---------- ------------ ---------- ----------
         3 11:03:13        2164000       2.16

Profiler data summary (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS
---------- ------- --------
   2126949    2.13       72

Profiler data by time (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
---------- ------- -------- -------------------- ----------- ------
    729932    0.73        1                                5     13
    569563    0.57        2 OLD_PROF_TEST                  1     56
    377880    0.38        2 OLD_PROF_TEST                  1     82
    166019    0.17        2 OLD_PROF_TEST                  1     70
    150117    0.15        2 OLD_PROF_TEST                  1     43
     72742    0.07        2 OLD_PROF_TEST                  1     40
     56473    0.06        1 TABLE_COUNT_TYPE               6      6
      3338    0.00        1                                5      8
       258    0.00        1                                5     12
       109    0.00        1                                5     16
        68    0.00        2 OLD_PROF_TEST                  1     67
        66    0.00        2                                4      1
        60    0.00        2                                7      1
        60    0.00        2                                3      1
        44    0.00        1                                5     14
        42    0.00        0                                2      5
        31    0.00        1 OLD_PROF_TEST                  1     18
        26    0.00        1                                8      5
        13    0.00        0                                5      1
         9    0.00        1 TABLE_COUNT_TYPE               6     11
         9    0.00        2 OLD_PROF_TEST                  1     86
         8    0.00        0 OLD_PROF_TEST                  1     51
         8    0.00        1 TABLE_COUNT_TYPE               6     13
         7    0.00        1                                5     18
         6    0.00        0 OLD_PROF_TEST                  1     78
         6    0.00        0 OLD_PROF_TEST                  1     64
         6    0.00        0                                8      1
         6    0.00        1 TABLE_COUNT_TYPE               6      3
         5    0.00        0 OLD_PROF_TEST                  1     35
         5    0.00        0 OLD_PROF_TEST                  1     15
         4    0.00        1                                8      7
         4    0.00        1 OLD_PROF_TEST                  1     76
         3    0.00        1                                2      8
         2    0.00        1 OLD_PROF_TEST                  1     62
         2    0.00        1 OLD_PROF_TEST                  1     13
         2    0.00        1 TABLE_COUNT_TYPE               6      5
         2    0.00        2 OLD_PROF_TEST                  1     72
         2    0.00        2 OLD_PROF_TEST                  1     45
         2    0.00        2 OLD_PROF_TEST                  1     49
         2    0.00        2 OLD_PROF_TEST                  1     46
         2    0.00        2 OLD_PROF_TEST                  1     58
         1    0.00        1 OLD_PROF_TEST                  1     73
         1    0.00        1                                2      6
         1    0.00        1 OLD_PROF_TEST                  1     59
         1    0.00        1 OLD_PROF_TEST                  1     11
         1    0.00        2 OLD_PROF_TEST                  1     54
         1    0.00        2 OLD_PROF_TEST                  1     84
         0    0.00        0 OLD_PROF_TEST                  1      1
         0    0.00        0 OLD_PROF_TEST                  1     88
         0    0.00        0                                8      9
         0    0.00        0                                2      1
         0    0.00        0                                2      2
         0    0.00        0 OLD_PROF_TEST                  1      3
         0    0.00        0 OLD_PROF_TEST                  1      5
         0    0.00        0 OLD_PROF_TEST                  1      9
         0    0.00        0 OLD_PROF_TEST                  1     20
         0    0.00        1 TABLE_COUNT_TYPE               6      4
         0    0.00        1                                8      2
         0    0.00        2 OLD_PROF_TEST                  1     39
         0    0.00        2 OLD_PROF_TEST                  1     55
         0    0.00        2 OLD_PROF_TEST                  1     69
         0    0.00        2 OLD_PROF_TEST                  1     38
         0    0.00        2 OLD_PROF_TEST                  1     81
         0    0.00        2 OLD_PROF_TEST                  1     42
         0    0.00        2 OLD_PROF_TEST                  1     68

65 rows selected.

Referring to the package, type and anonymous blocks, I assigned labels to all the lines having more than 10µs, as follows:

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
---------- ------- -------- -------------------- ----------- ------
    729932    0.73        1                                5     13  B2: FETCH
    569563    0.57        2 OLD_PROF_TEST                  1     56  B_Calls_A (Rest_a_While)
    377880    0.38        2 OLD_PROF_TEST                  1     82  DBFunc (Rest_a_While)
    166019    0.17        2 OLD_PROF_TEST                  1     70  R_Calls_R (Rest_a_While)
    150117    0.15        2 OLD_PROF_TEST                  1     43  A_Calls_B (Rest_a_While, section 2)
     72742    0.07        2 OLD_PROF_TEST                  1     40  A_Calls_B (Rest_a_While, section 1)
     56473    0.06        1 TABLE_COUNT_TYPE               6      6  SELECT
      3338    0.00        1                                5      8  B2: SELECT DBFunc
       258    0.00        1                                5     12  B2: OPEN
       109    0.00        1                                5     16  B2: Assign Table_Count_Type
        68    0.00        2 OLD_PROF_TEST                  1     67  Put_Line
        66    0.00        2                                4      1  Auxiliary SERVEROUTPUT block for B2 (surmised)
        60    0.00        2                                7      1  Auxiliary SERVEROUTPUT block for B3 (surmised)
        60    0.00        2                                3      1  Auxiliary SERVEROUTPUT block for B1 (surmised)
        44    0.00        1                                5     14  B2: CLOSE
        42    0.00        0                                2      5  B1: Call to Start_Profiling 
        31    0.00        1 OLD_PROF_TEST                  1     18  RETURN DBMS_Profiler.Stop_Profiler;
        26    0.00        1                                8      5  B3: Call R_Calls_R
        13    0.00        0                                5      1  B2: DECLARE

Notes on Output of Flat Profiler
There were six units with no linked information in DBMS_PROFILER_UNITS. By examining the data, I was able to associate unit numbers 2, 5 and 8 with my anonymous blocks B1, B2 and B3. That left three unassigned, and I have surmised that these correspond to the auxiliary blocks associated with processing server output that we earlier surmised when examining the output from the hierarchical profiler.

  • The useful call tree structure is not present in the data from the old profiler
  • However, the results are at a line level, which the hierarchical profiler does not provide; for example, the two sections of A_Calls_B are reported separately
  • Deciphering the output requires significantly more manual effort than with the hierarchical profiler
  • Both old and new profiler have their own advantages, and so both should be considered of value
  • Manual code timing offers more flexibility in terms of aggregating lines and call instances, but requires more effort...
  • ...but not as much as I thought. As noted later on the second example, after reading another article on the profiler, I realised that I could join the system table ALL_SOURCE to see the text of the line (where available)

Second example: Flat profiler omits some detail timings
After posting the first draft of this article, which was about the newer hierarchical profiler only, I noticed a new post on an old AskTom thread on the older flat profiler. The post concerned a discrepancy between reported times at the aggregate level and detail levels. I suggested using the hierarchical profiler might resolve the problem Try the hierarchical profiler..., and then added sections on the old profiler and on manual timing to this article for comparison. However, my example programs above do not include the AskTom scenario, so I later decided to add a new small scenario to illustrate it and now report the results here. The new test code consists of a PL/SQL block with two calls to DBMS_Lock.Sleep, for 3 and 6 seconds, with the profiling code around them. The driving scripts and output files are included in the zip file attached, and I list summary results below:

I later came upon another artilce on the flat profiler, Profiling PL/SQL with dbms_profiler where the author has joined the system table ALL_SOURCE to get the text of the line profiled, which makes interpretation easier. I have then updated the line-level query as follows:

PROMPT Profiler data by time (PLSQL_PROFILER_DATA)
SELECT Round (dat.total_time/1000, 0)  micro_s,
       Round (dat.total_time/1000000000, 2) seconds,
       dat.total_occur calls,
       unt.unit_name,
       dat.unit_number,
       dat.line#,
       Trim (src.text) text
  FROM plsql_profiler_data dat
  LEFT JOIN plsql_profiler_units unt
    ON unt.runid            = dat.runid
   AND unt.unit_number      = dat.unit_number
  LEFT JOIN all_source      src
    ON src.type             IN ('PACKAGE BODY','FUNCTION','PROCEDURE','TRIGGER')
   AND src.name             = unt.unit_name 
   AND src.line             = dat.line# 
   AND src.owner            = unt.unit_owner 
   AND src.type             = unt.unit_type
 WHERE dat.runid            = :runid
   AND dat.total_time       > 0
 ORDER BY 1 DESC, 2, 3

Of course the text is only available for stored source, so excludes lines from anonymous blocks.

Flat Profiler

Run header (PLSQL_PROFILER_RUNS)

     RUNID RUN_DATE        MICRO_S    SECONDS
---------- ------------ ---------- ----------
         5 20:34:45        9220000       9.22

Profiler data by unit (PLSQL_PROFILER_DATA)

UNIT_NAME            UNIT_NUMBER    MICRO_S SECONDS    CALLS
-------------------- ----------- ---------- ------- --------
                               2        200    0.00        3
UTILS                          1         30    0.00        3

Profiler data by time (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE# TEXT
---------- ------- -------- -------------------- ----------- ------ ------------------------------------------------
       136    0.00        1                     2      7
        21    0.00        1                     2     10
        19    0.00        1                     2     14
        14    0.00        0 UTILS                          1    343 FUNCTION Stop_D_Profiling RETURN PLS_INTEGER IS
        13    0.00        1 UTILS                          1    346 RETURN DBMS_Profiler.Stop_Profiler;
         6    0.00        1 UTILS                          1    341 END Start_D_Profiling;
         2    0.00        1 UTILS                          1    339 RETURN l_run_number;

7 rows selected.

Hierarchical Profiler

Run header (DBMSHP_RUNS)

     RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
---------- ---------------------------- ---------- ---------- -----------------------------
        16 19-MAR-13 21.37.00.571000       9000292          9 Profile for DBMS_Lock.Sleep

Functions called (DBMSHP_FUNCTION_INFO)

OWNER      MODULE               FUNCTION                LINE#      SUB_T      FUN_T  CALLS
---------- -------------------- ---------------------- ------ ---------- ---------- ------
BRENDAN    UTILS                1: STOP_H_PROFILING       322          8          8      1
SYS        DBMS_HPROF           2: STOP_PROFILING          59          0          0      1
SYS        DBMS_LOCK            3: SLEEP                  197    9000279    9000279      2
SYS        DBMS_LOCK            4: __pkg_init               0          5          5      1

Function call parent-child links (DBMSHP_PARENT_CHILD_INFO)

OWNER_P MODULE_P  FUNCTION_P            OWNER_C MODULE_C    FUNCTION_C         SUB_T FUN_T  CALLS
------- --------- --------------------- ------- ----------- ------------------ ----- ----- ------
BRENDAN UTILS     1: STOP_H_PROFILING   SYS     DBMS_HPROF  2: STOP_PROFILING      0     0      1

BPF Recursive Subquery Factor Tree Query

Function tree                              SUB_T      FUN_T  CALLS  Row
------------------------------------- ---------- ---------- ------ ----
3: SYS.DBMS_LOCK.SLEEP                   9000279    9000279      2    1
1: BRENDAN.UTILS.STOP_H_PROFILING              8          8      1    2
  2: SYS.DBMS_HPROF.STOP_PROFILING             0          0      1    3
4: SYS.DBMS_LOCK.__pkg_init                    5          5      1    4

Manual Profiler

Timer Set: Profiling DBMS_Lock.Sleep, Constructed at 19 Mar 2013 21:38:54, written at 21:39:03
==============================================================================================
[Timer timed: Elapsed (per call): 0.05 (0.000045), CPU (per call): 0.04 (0.000040), calls: 1000, '***' denotes corrected line below]

Timer               Elapsed          CPU          Calls        Ela/Call        CPU/Call
--------------   ----------   ----------   ------------   -------------   -------------
3 second sleep         3.00         0.00              1         3.00000         0.00000
6 second sleep         6.00         0.00              1         6.00100         0.00000
(Other)                0.00         0.00              1         0.00000         0.00000
--------------   ----------   ----------   ------------   -------------   -------------
Total                  9.00         0.00              3         3.00033         0.00000
--------------   ----------   ----------   ------------   -------------   -------------

Notes on Results for Second Example

  • The flat profiler shows 9s at header level but only 230µs at detail level because DBMS_Lock.Sleep does not permit profiling by the user running the script
  • The hierarchical profiler shows 9s at header level and a total of 9s in 2 calls to DBMS_Lock.Sleep
  • Manual profiling shows the two calls to DBMS_Lock.Sleep taking 3 and 6 seconds

Conclusions

  • Running Oracle's hierarchical profiler would seem to be the default first step in tuning PL/SQL programs from v11.1
  • Some care is needed in interpreting the output data; I've provided a query for displaying the hierarchies
  • Performance is recorded only down to function level, so it will still often be worthwhile to use the old flat profiler in addition
  • Manually timing code sections also still has a part to play, in terms of instrumentation and greater flexibility where necessary

Brendan HProf Code
Example 2






An SQL Solution for the Multiple Knapsack Problem (SKP-m)

In my last article, A Simple SQL Solution for the Knapsack Problem (SKP-1), I presented an SQL solution for the well known knapsack problem in its simpler 1-knapsack form (and it is advisable to read the first article before this one). Here I present an SQL solution for the problem in its more difficult multiple-knapsack form. The solution is a modified version of one I posted on OTN, SQL Query for mapping a set of batches to a class rooms group, and I describe two versions of it, one in pure SQL, and another that includes a database function. The earlier article provided the solutions as comma-separated strings of item identifiers, and in this article also the solutions are first obtained as delimited strings. However, as there are now containers as well as items, we extend the SQL to provide solutions with item and container names in separate fields within records for each container-item pair. The solution was presented here initially, as before, more for its theoretical interest than for practical applicability. Much research has been done on procedural algorithms for this important, but computationally difficult class of problems.

Update, 26 November 2017: My GitHub repo: Brendan's repo for interesting SQL has simple installation and query scripts for this problem. I should also note that after this post I went on to use similar techniques on other combinatorial problems, such as SQL for the Fantasy Football Knapsack Problem; I extended the idea there to allow for fast approximate solutions making it viable for larger problems, and have also used a similar idea here, SQL for the Travelling Salesman Problem (and in other articles).

We will consider the same simple example problem as in the earlier article, having four items, but now with two containers with individual weight limits of 8 and 10. As noted in the earlier article, the problem can be considered as that of assigning each item to one of the containers, or to none, leading directly to the expression for the number of not necessarily feasible assignment sets for the example. We can again depict the 24 possible item combinations in a diagram, with the container limits added.

Multi, v1.1 - CombisWe can see that there is one optimal solution in this case, in which items 1 and 3 are assigned to container 1, while items 2 and 4 are assigned to container 2, with a profit of 100. How to find it using SQL?

SQL Solution
The solution to the single knapsack problem worked by joining items recursively in increasing order of item id, accumulating the total weights and profits, and terminating a sequence when no more items can be added within the weight limit. The item sequences were accumulated as comma-separated strings, and the optimal solutions obtained by analytic ranking of the profits.

For the multiple knapsack problem, it's not quite as simple, but a similar approach may be a good starting point. Previously our anchor branch in the recursion selected all items below the single maximum weight, but we now have containers with individual weights. If we now join the containers table we can find all items falling within the maximum weights by container. The recursion can then proceed to find all feasible item combinations by container. Here is the SQL for this:

WITH rsf_itm (con_id, max_weight, itm_id, lev, tot_weight, tot_profit, path) AS (
SELECT c.id, 
       c.max_weight,
       i.id,
       0,
       i.item_weight,
       i.item_profit, 
       ',' || i.id || ','
  FROM items i
  JOIN containers c
    ON i.item_weight <= c.max_weight
 UNION ALL
SELECT r.con_id,
       r.max_weight,
       i.id, 
       r.lev + 1, 
       r.tot_weight + i.item_weight,
       r.tot_profit + i.item_profit,
       r.path || i.id || ','
  FROM rsf_itm r
  JOIN items i
    ON i.id > r.itm_id
   AND r.tot_weight + i.item_weight <= r.max_weight
 ORDER BY 1, 2
) SEARCH DEPTH FIRST BY con_id, itm_id SET line_no
SELECT con_id,
       max_weight,
       LPad (To_Char(itm_id), 2*lev + 1, ' ') itm_id, 
       path itm_path,
       tot_weight, tot_profit
  FROM rsf_itm
 ORDER BY line_no

and here is the resulting output:

CON_ID MAX_WEIGHT ITM_ID ITM_PATH    TOT_WEIGHT TOT_PROFIT
------ ---------- ------ ----------- ---------- ----------
     1          8 1      ,1,                  3         10
                    2    ,1,2,                7         30
                    3    ,1,3,                8         40
                  2      ,2,                  4         20
                  3      ,3,                  5         30
                  4      ,4,                  6         40
     2         10 1      ,1,                  3         10
                    2    ,1,2,                7         30
                    3    ,1,3,                8         40
                    4    ,1,4,                9         50
                  2      ,2,                  4         20
                    3    ,2,3,                9         50
                    4    ,2,4,               10         60
                  3      ,3,                  5         30
                  4      ,4,                  6         40

15 rows selected.

Looking at this, we can see that the overall solution will comprise one feasible combination of items for each container, with the constraint that no item appears in more than one container. This suggests that we could perform a second recursion in a similar way to the first, but this time using the results of the first as input, and joining the feasible combinations of containers of higher id only. If we again accumulate the sequence in a delimited string, regular expression functionality could be used to avoid joining combinations with items already included. The following SQL does this recursion:

WITH rsf_itm (con_id, max_weight, itm_id, tot_weight, tot_profit, path) AS (
SELECT c.id, 
       c.max_weight,
       i.id, 
       i.item_weight,
       i.item_profit, 
       ',' || i.id || ','
  FROM items i
  JOIN containers c
    ON i.item_weight <= c.max_weight
 UNION ALL
SELECT r.con_id,
       r.max_weight,
       i.id, 
       r.tot_weight + i.item_weight,
       r.tot_profit + i.item_profit,
       r.path || i.id || ','
  FROM rsf_itm r
  JOIN items i
    ON i.id > r.itm_id
   AND r.tot_weight + i.item_weight <= r.max_weight
)
, rsf_con (con_id, con_itm_set, con_itm_path, lev, tot_weight, tot_profit) AS (
SELECT con_id,
       ':' || con_id || ':' || path,
       ':' || con_id || ':' || path,
       0,
       tot_weight,
       tot_profit
  FROM rsf_itm
 UNION ALL
SELECT r_i.con_id,
       ':' || r_i.con_id || ':' || r_i.path,
       r_c.con_itm_path ||  ':' || r_i.con_id || ':' || r_i.path,
       r_c.lev + 1, 
       r_c.tot_weight + r_i.tot_weight,
       r_c.tot_profit + r_i.tot_profit
  FROM rsf_con r_c
  JOIN rsf_itm r_i
    ON r_i.con_id > r_c.con_id
 WHERE RegExp_Instr (r_c.con_itm_path || r_i.path, ',(\d+),.*?,\1,') = 0
) SEARCH DEPTH FIRST BY con_id SET line_no
SELECT
       LPad (' ', 2*lev, ' ') || con_itm_set con_itm_set,
       con_itm_path,
       tot_weight, tot_profit
  FROM rsf_con
 ORDER BY line_no

Notice the use of RegExp_Instr, which takes the current sequence with potential new combination appended as its source string, and looks for a match against the search string ',(\d+),.*?,\1,'. The function returns 0 if no match is found, meaning no duplicate item was found. The sequence includes the container id using a different delimiter, a colon, at the start of each combination. The search string can be explained as follows:

,(\d+), = a sequence of one or more digits with a comma either side, and the digit sequence saved for referencing
.*?,\1, = a sequence of any characters, followed by the saved digit sequence within commas. The ? specifies a non-greedy search, meaning stop searching as soon as a match is found

The result of the query is:

CON_ITM_SET          CON_ITM_PATH         TOT_WEIGHT TOT_PROFIT
-------------------- -------------------- ---------- ----------
:1:,1,               :1:,1,                        3         10
  :2:,2,             :1:,1,:2:,2,                  7         30
  :2:,3,             :1:,1,:2:,3,                  8         40
  :2:,4,             :1:,1,:2:,4,                  9         50
  :2:,2,3,           :1:,1,:2:,2,3,               12         60
  :2:,2,4,           :1:,1,:2:,2,4,               13         70
:1:,2,               :1:,2,                        4         20
  :2:,1,             :1:,2,:2:,1,                  7         30
  :2:,3,             :1:,2,:2:,3,                  9         50
  :2:,1,3,           :1:,2,:2:,1,3,               12         60
  :2:,4,             :1:,2,:2:,4,                 10         60
  :2:,1,4,           :1:,2,:2:,1,4,               13         70
:1:,1,2,             :1:,1,2,                      7         30
  :2:,3,             :1:,1,2,:2:,3,               12         60
  :2:,4,             :1:,1,2,:2:,4,               13         70
:1:,3,               :1:,3,                        5         30
  :2:,1,             :1:,3,:2:,1,                  8         40
  :2:,2,             :1:,3,:2:,2,                  9         50
  :2:,1,2,           :1:,3,:2:,1,2,               12         60
  :2:,4,             :1:,3,:2:,4,                 11         70
  :2:,1,4,           :1:,3,:2:,1,4,               14         80
  :2:,2,4,           :1:,3,:2:,2,4,               15         90
:1:,1,3,             :1:,1,3,                      8         40
  :2:,2,             :1:,1,3,:2:,2,               12         60
  :2:,4,             :1:,1,3,:2:,4,               14         80
  :2:,2,4,           :1:,1,3,:2:,2,4,             18        100
:1:,4,               :1:,4,                        6         40
  :2:,1,             :1:,4,:2:,1,                  9         50
  :2:,2,             :1:,4,:2:,2,                 10         60
  :2:,1,2,           :1:,4,:2:,1,2,               13         70
  :2:,3,             :1:,4,:2:,3,                 11         70
  :2:,1,3,           :1:,4,:2:,1,3,               14         80
  :2:,2,3,           :1:,4,:2:,2,3,               15         90
:2:,1,               :2:,1,                        3         10
:2:,2,               :2:,2,                        4         20
:2:,1,2,             :2:,1,2,                      7         30
:2:,3,               :2:,3,                        5         30
:2:,1,3,             :2:,1,3,                      8         40
:2:,4,               :2:,4,                        6         40
:2:,1,4,             :2:,1,4,                      9         50
:2:,2,3,             :2:,2,3,                      9         50
:2:,2,4,             :2:,2,4,                     10         60

42 rows selected.

We can see that the optimal solutions can be obtained from the output again using analytic ranking by profit, and in this case the solution with a profit of 100 is the optimal one, with sequence ':1:,1,3,:2:,2,4,'. In the full solution, as well as selecting out the top-ranking solutions, we have extended the query to output the items and containers by name, in distinct fields with a record for every solution/container/item combination. For the example problem above, the output is:

    SOL_ID S_WT  S_PR  C_ID C_NAME          M_WT C_WT  I_ID I_NAME     I_WT I_PR
---------- ---- ----- ----- --------------- ---- ---- ----- ---------- ---- ----
         1   18   100     1 Item 1             8    8     1 Item 1        3   10
                                                          3 Item 3        5   30
                          2 Item 2            10   10     2 Item 2        4   20
                                                          4 Item 4        6   40

SQL-Only Solution - XSQL
There are various techniques in SQL for splitting string columns into multiple rows and columns. We will take one of the more straightforward ones that uses the DUAL table with CONNECT BY to generate rows against which to anchor the string-parsing.

WITH rsf_itm (con_id, max_weight, itm_id, lev, tot_weight, tot_profit, path) AS (
SELECT c.id, 
       c.max_weight,
       i.id, 
       0, 
       i.item_weight,
       i.item_profit, 
       ',' || i.id || ','
  FROM items i
  JOIN containers c
    ON i.item_weight <= c.max_weight
 UNION ALL
SELECT r.con_id,
       r.max_weight,
       i.id, 
       r.lev + 1, 
       r.tot_weight + i.item_weight,
       r.tot_profit + i.item_profit,
       r.path || i.id || ','
  FROM rsf_itm r
  JOIN items i
    ON i.id > r.itm_id
   AND r.tot_weight + i.item_weight <= r.max_weight
)
, rsf_con (con_id, con_path, itm_path, tot_weight, tot_profit, lev) AS (
SELECT con_id,
       To_Char(con_id),
       ':' || con_id || '-' || (lev + 1) || ':' || path,
       tot_weight,
       tot_profit,
       0
  FROM rsf_itm
 UNION ALL
SELECT r_i.con_id,
       r_c.con_path || ',' || r_i.con_id,
       r_c.itm_path ||  ':' || r_i.con_id || '-' || (r_i.lev + 1) || ':' || r_i.path,
       r_c.tot_weight + r_i.tot_weight,
       r_c.tot_profit + r_i.tot_profit,
       r_c.lev + 1
  FROM rsf_con r_c
  JOIN rsf_itm r_i
    ON r_i.con_id > r_c.con_id
   AND RegExp_Instr (r_c.itm_path || r_i.path, ',(\d+),.*?,\1,') = 0
)
, paths_ranked AS (
SELECT itm_path || ':' itm_path, tot_weight, tot_profit, lev + 1 n_cons,
       Rank () OVER (ORDER BY tot_profit DESC) rnk
  FROM rsf_con
), best_paths AS (
SELECT itm_path, tot_weight, tot_profit, n_cons,
       Row_Number () OVER (ORDER BY tot_weight DESC) sol_id
  FROM paths_ranked
 WHERE rnk = 1
), row_gen AS (
SELECT LEVEL lev
  FROM DUAL
CONNECT BY LEVEL <= (SELECT Count(*) FROM items)
), con_v AS (
SELECT  r.lev con_ind, b.sol_id, b.tot_weight, b.tot_profit,
        Substr (b.itm_path, Instr (b.itm_path, ':', 1, 2*r.lev - 1) + 1, 
                            Instr (b.itm_path, ':', 1, 2*r.lev) - Instr (b.itm_path, ':', 1, 2*r.lev - 1) - 1)
           con_nit_id,
        Substr (b.itm_path, Instr (b.itm_path, ':', 1, 2*r.lev) + 1, 
                            Instr (b.itm_path, ':', 1, 2*r.lev + 1) - Instr (b.itm_path, ':', 1, 2*r.lev) - 1)
           itm_str
  FROM best_paths b
  JOIN row_gen r
    ON r.lev <= b.n_cons
), con_split AS (
SELECT sol_id, tot_weight, tot_profit,
       Substr (con_nit_id, 1, Instr (con_nit_id, '-', 1) - 1) con_id,
       Substr (con_nit_id, Instr (con_nit_id, '-', 1) + 1) n_items,
       itm_str
  FROM con_v
), itm_v AS (
SELECT  c.sol_id, c.con_id, c.tot_weight, c.tot_profit,
        Substr (c.itm_str, Instr (c.itm_str, ',', 1, r.lev) + 1, 
                            Instr (c.itm_str, ',', 1, r.lev + 1) - Instr (c.itm_str, ',', 1, r.lev) - 1)
           itm_id
  FROM con_split c
  JOIN row_gen r
    ON r.lev <= c.n_items
)
SELECT v.sol_id sol_id,
       v.tot_weight s_wt, 
       v.tot_profit s_pr, 
       c.id c_id, 
       c.name c_name, 
       c.max_weight m_wt,
       Sum (i.item_weight) OVER (PARTITION BY v.sol_id, c.id) c_wt,
       i.id i_id, 
       i.name i_name, 
       i.item_weight i_wt, 
       i.item_profit i_pr
  FROM itm_v v
  JOIN containers c
    ON c.id = To_Number (v.con_id)
  JOIN items i
    ON i.id = To_Number (v.itm_id)
 ORDER BY sol_id, con_id, itm_id

SQL with Function Solution - XFUN
The SQL techniques for string-splitting are quite cumbersome, and a better approach may be the use of a pipelined function that allows the string-parsing to be done in PL/SQL, a procedural language that is better suited to the task.

WITH rsf_itm (con_id, max_weight, itm_id, tot_weight, tot_profit, path) AS (
SELECT c.id, 
       c.max_weight,
       i.id, 
       i.item_weight,
       i.item_profit, 
       ',' || i.id || ','
  FROM items i
  JOIN containers c
    ON i.item_weight <= c.max_weight
 UNION ALL
SELECT r.con_id,
       r.max_weight,
       i.id, 
       r.tot_weight + i.item_weight,
       r.tot_profit + i.item_profit,
       r.path || i.id || ','
  FROM rsf_itm r
  JOIN items i
    ON i.id > r.itm_id
   AND r.tot_weight + i.item_weight <= r.max_weight
 ORDER BY 1, 2
)
, rsf_con (con_id, itm_path, tot_weight, tot_profit) AS (
SELECT con_id,
       ':' || con_id || ':' || path,
       tot_weight,
       tot_profit
  FROM rsf_itm
 UNION ALL
SELECT r_i.con_id,
       r_c.itm_path ||  ':' || r_i.con_id || ':' || r_i.path,
       r_c.tot_weight + r_i.tot_weight,
       r_c.tot_profit + r_i.tot_profit
  FROM rsf_con r_c
  JOIN rsf_itm r_i
    ON r_i.con_id > r_c.con_id
   AND RegExp_Instr (r_c.itm_path || r_i.path, ',(\d+),.*?,\1,') = 0
)
, paths_ranked AS (
SELECT itm_path || ':' itm_path, tot_weight, tot_profit, Rank () OVER (ORDER BY tot_profit DESC) rn,
       Row_Number () OVER (ORDER BY tot_profit DESC, tot_weight DESC) sol_id
  FROM rsf_con
), itm_v AS (
SELECT s.con_id, s.itm_id, p.itm_path, p.tot_weight, p.tot_profit, p.sol_id
  FROM paths_ranked p
 CROSS JOIN TABLE (Multi.Split_String (p.itm_path)) s
 WHERE rn = 1
)
SELECT v.sol_id sol_id,
       v.tot_weight s_wt, 
       v.tot_profit s_pr, 
       c.id c_id, 
       c.name c_name, 
       c.max_weight m_wt,
       Sum (i.item_weight) OVER (PARTITION BY v.sol_id, c.id) c_wt,
       i.id i_id, 
       i.name i_name, 
       i.item_weight i_wt, 
       i.item_profit i_pr
  FROM itm_v v
  JOIN containers c
    ON c.id = To_Number (v.con_id)
  JOIN items i
    ON i.id = To_Number (v.itm_id)
 ORDER BY sol_id, con_id, itm_id

Pipelined Database Function

CREATE OR REPLACE TYPE con_itm_type AS OBJECT (con_id NUMBER, itm_id NUMBER);
/
CREATE OR REPLACE TYPE con_itm_list_type AS VARRAY(100) OF con_itm_type;
/
CREATE OR REPLACE PACKAGE BODY Multi IS

FUNCTION Split_String (p_string VARCHAR2) RETURN con_itm_list_type PIPELINED IS

  l_pos_colon_1           PLS_INTEGER := 1;
  l_pos_colon_2           PLS_INTEGER;
  l_pos_comma_1           PLS_INTEGER;
  l_pos_comma_2           PLS_INTEGER;
  l_con                   PLS_INTEGER;
  l_itm                   PLS_INTEGER;

BEGIN

  LOOP

    l_pos_colon_2 := Instr (p_string, ':', l_pos_colon_1 + 1, 1);
    EXIT WHEN l_pos_colon_2 = 0;

    l_con := To_Number (Substr (p_string, l_pos_colon_1 + 1, l_pos_colon_2 - l_pos_colon_1 - 1));
    l_pos_colon_1 := Instr (p_string, ':', l_pos_colon_2 + 1, 1);
    l_pos_comma_1 := l_pos_colon_2 + 1;

    LOOP

      l_pos_comma_2 := Instr (p_string, ',', l_pos_comma_1 + 1, 1);
      EXIT WHEN l_pos_comma_2 = 0 OR l_pos_comma_2 > l_pos_colon_1;

      l_itm := To_Number (Substr (p_string, l_pos_comma_1 + 1, l_pos_comma_2 - l_pos_comma_1 - 1));
      PIPE ROW (con_itm_type (l_con, l_itm));
      l_pos_comma_1 := l_pos_comma_2;
 
    END LOOP;

  END LOOP;

END Split_String;

END Multi;

Query Structure Diagram (embedded directly)
The QSD shows both queries in a single diagram as the early query blocks are almost the same (the main difference is that the strings contain a bit more information for XSQL to facilitate the later splitting). The directly-embedded version shows the whole query, but it may be hard to read the detail, so it is followed by a larger, scrollable version within Excel.
QSD shwoing both versions of SQL

Query Structure Diagram (embedded via Excel)
This is the larger, scrollable version.

Performance Analysis
As in the previous article, we will see how the solution methods perform as problem size varies, using my own performance benchmarking framework.

Test Data Sets
Test data sets are generated as follows, in terms of two integer parameters, w and d:

  • Insert w containers with sequential ids and random maximum weights between 1 and 100
  • Insert d items with sequential ids and random weights and profits in the ranges 1-60 and 1-10000, respectively, via Oracle's function DBMS_Random.Value

Test Results
The embedded Excel file below summarises the results obtained over a grid of data points, with w in (1, 2, 3) and d in (8, 10, 12, 14, 16, 18).

The graphs tab below shows 3-d graphs of the number of rows processed and the CPU time for XFUN.

Notes

  • There is not much difference in performance between the two query versions, no doubt because the number of solution records is generally small compared with rows processed in the recursions
  • Notice that the timings correlate well with the rows processed, but not so well with the numbers of base records. The nature of the problem means that some of the randomised data sets turn out to be much harder to solve than others
  • Notice the estimated rows on step 36 of the execution plan for the pipelined function solution. The value of 8168 is a fixed value that Oracle assumes since it has no statistics to go on. We could improve this by using the (undocumented) cardinality hint to provide a smaller estimate
  • I extended my benchmarking framework for this article to report the intermediate numbers of rows processed, as well as the cardinality estimates and derived errors in these estimates (maximum for each plan). It is obvious from the nature of the problem that Oracle's Cost Based Optimiser (CBO) is not going to be able to make good cardinality estimates

Conclusions
Oracle's v11.2 implementation of the Ansii SQL feature recursive subquery factoring provides a means for solving the knapsack problem, in its multiple knapsack form, in SQL. The solution is not practical for large problems, for which procedural techniques that have been extensively researched should be considered. However, the techniques used may be of interest for combinatorial problems that are small enough to be handled in SQL, and for other types of problem in general.






A Simple SQL Solution for the Knapsack Problem (SKP-1)

A poster on OTN (Combination using pl/sql) recently asked for an SQL solution to a problem that turned out to be an example of the well known Knapsack Problem, for the case of a single knapsack. I posted an SQL query as a solution, and also a solution in PL/SQL because the SQL solution uses a feature only available in Oracle v11.2. In this article I explain how the solutions work and provide the results of a performance analysis that involved randomised test problems of varying computational difficulty. I have taken a more general form of problem than the original poster described, and the solutions here have been improved.

Update, 14 July 2013: I used the technique in response to another OTN post here, SQL for the Fantasy Football Knapsack Problem. I have extended the idea there to allow for fast approximate solutions making it viable for larger problems, and have also used a similar idea here, SQL for the Travelling Salesman Problem (and in other articles).

Update, 26 November 2017: My GitHub repo: Brendan's repo for interesting SQL has simple installation and query scripts for this problem.

Knapsack Problem (1-Knapsack)
The various forms of knapsack problem have been studied extensively. The problems are known to be computationally difficult and many algorithms have been proposed for both exact and approximate solutions (see reference above). The SQL solution in this article is quite simple and will not be competitive in performance for larger problems in the form described here, but may be interesting for being implemented in pure SQL (and without using Oracle's Model clause, or a purely brute force approach). However, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems (see links at the top).

The problem can be stated informally, as follows: Given a set of items, each having positive weight and profit attributes, and a weight limit, find the combinations of items that maximise profit within the weight limit. Variant versions include the addition of multiple constraints (easy to handle), and inclusion of multiple knapsacks (more difficult). I also have a solution for the multiple knapsacks version described here (An SQL Solution for the Multiple Knapsack Problem (SKP-m)).

The difficulty of the problem arises from the number of possible combinations increasing exponentially with problem size. The number of these (not necessarily feasible) combinations, N(n,1), can be expressed in terms of the number of items, n, in two ways. First, we can use the well known binomial expression for the number of combinations of r items, summed from r=0 to r=n:

Second, and more simply, we can observe that including an item in the combination, or not, is a binary choice, leading to:


This generalises easily to the expression for the multiple knapsack problem, with m knapsacks:


This can also be expressed using a binomial series as


Here, represents the number of combinations of r items from n, with being the number of assignments of the r items to m containers.

Let's look at a simple example problem having four items, with a weight limit of 9, as shown below:

Items

There are 16 possible combinations of these items, having from 0 to 4 items. These are depicted below:

Combinations

We can see that there are two optimal solutions in this case. How to find them using SQL?

SQL Solution

Oracle's v11.2 implementation of the Ansii standard Recursive Subquery Factoring can be used as the basis for an SQL solution. This would works as follows: Starting from each item in turn, add items recursively while remaining within the weight limit, and considering only items of id greater than the current id. The SQL looks like this, where a marker is added for leaf nodes, following an approach from the Amis technology blog:

WITH rsf (nxt_id, lev, tot_weight, tot_profit, path) AS (
SELECT id nxt_id, 0 lev, item_weight tot_weight, item_profit tot_profit, To_Char (id) path
  FROM items
 UNION ALL
SELECT n.id, 
       r.lev + 1, 
       r.tot_weight + n.item_weight,
       r.tot_profit + n.item_profit,
       r.path || ',' || To_Char (n.id)
  FROM rsf r
  JOIN items n
    ON n.id > r.nxt_id
   AND r.tot_weight + n.item_weight <= 9
) SEARCH DEPTH FIRST BY nxt_id SET line_no 
SELECT LPad (To_Char(nxt_id), lev + 1, '*') node,tot_weight, tot_profit,
       CASE WHEN lev >= Lead (lev, 1, lev) OVER (ORDER BY line_no) THEN 'Y' END is_leaf,
       path
  FROM rsf
 ORDER BY line_no

and the solution like this:

NODE       TOT_WEIGHT TOT_PROFIT I PATH
---------- ---------- ---------- - ------------------------------
1                   3         10   1
*2                  7         30 Y 1,2
*3                  8         40 Y 1,3
*4                  9         50 Y 1,4
2                   4         20   2
*3                  9         50 Y 2,3
3                   5         30 Y 3
4                   6         40 Y 4

8 rows selected.

The output contains 8 records, as opposed to the total of 15 non-null combinations, because only feasible items are joined, and permutations are avoided by the constraint that item ids increase along the path. Given positivity of weight and profit, we know that all solutions must be leaves, and we can represent the tree structure above in the following diagram:
Leaves
We can now use the recursive subquery factor as an input to a main query that selects one of the most profitable solutions, or alternatively to a further subquery factor that ranks the solutions in order of descending profit. In the latter case, the main query can select all the most profitable solutions.

In the solution I posted on the OTN thread, I included a subquery factor to restrict the final query section to leaf nodes only. This was because we know that the solutions must be leaf nodes, and usually it is more efficient to filter out non-solution records as early as possible. However, I later realised that the work involved in the filtering might outweigh the saving for the final section, and this turned out to be the case here, as shown in the performance analysis section below. Here are the two queries, without the leaf node filtering:

Query - KEEP

WITH rsf (id, lev, tot_weight, tot_profit, path) AS (
SELECT id, 0, item_weight, item_profit, To_Char (id)
  FROM items
 UNION ALL
SELECT n.id, 
       r.lev + 1, 
       r.tot_weight + n.item_weight,
       r.tot_profit + n.item_profit,
       r.path || ',' || To_Char (n.id)
  FROM rsf r
  JOIN items n
    ON n.id > r.id
   AND r.tot_weight + n.item_weight <= 100
)
SELECT Max (tot_weight) KEEP (DENSE_RANK LAST ORDER BY tot_profit) tot_weight,
       Max (tot_profit) KEEP (DENSE_RANK LAST ORDER BY tot_profit) tot_profit,
       Max (path) KEEP (DENSE_RANK LAST ORDER BY tot_profit) path,
       (Max (lev) KEEP (DENSE_RANK LAST ORDER BY tot_profit) + 1) n_items
  FROM rsf

Query - RANK

WITH rsf (id, lev, tot_weight, tot_profit, path) AS (
SELECT id, 0, item_weight, item_profit, To_Char (id)
  FROM items
 UNION ALL
SELECT n.id, 
       r.lev + 1, 
       r.tot_weight + n.item_weight,
       r.tot_profit + n.item_profit,
       r.path || ',' || To_Char (n.id)
  FROM rsf r
  JOIN items n
    ON n.id > r.id
   AND r.tot_weight + n.item_weight <= 100
)
, paths_ranked AS (
SELECT tot_weight, tot_profit, path, 
       Dense_Rank () OVER (ORDER BY tot_profit DESC) rnk_profit,
       lev
  FROM rsf
)
SELECT tot_weight tot_weight, 
       tot_profit tot_profit, 
       path path, 
       (lev + 1) n_items
  FROM paths_ranked
 WHERE rnk_profit = 1
 ORDER BY tot_weight DESC

Query Structure Diagram
QSD
It's worth noting that Oracle's proprietary recursive syntax, Connect By, cannot be used in this way because of the need to accumulate weights forward through the recursion. The new Ansii syntax is only available from v11.2 though, and I thought it might be interesting to implement a solution in PL/SQL that would work in earlier versions, following a similar algorithm, again with recursion.

PL/SQL Recursive Solution

This is a version in the form of a pipelined function, as I wanted to compare it with the SQL solutions, and be callable from SQL.
SQL

SELECT COLUMN_VALUE sol
  FROM TABLE (Packing_PLF.Best_Fits (100))
 ORDER BY COLUMN_VALUE

Package

CREATE OR REPLACE PACKAGE BODY Packing_PLF IS

FUNCTION Best_Fits (p_weight_limit NUMBER) RETURN SYS.ODCIVarchar2List PIPELINED IS

  TYPE item_type IS RECORD (
                        item_id                 PLS_INTEGER,
                        item_index_parent       PLS_INTEGER,
                        weight_to_node          NUMBER);
  TYPE item_tree_type IS        TABLE OF item_type;
  g_solution_list               SYS.ODCINumberList;

  g_timer                       PLS_INTEGER := Timer_Set.Construct ('Pipelined Recursion');

  i                             PLS_INTEGER := 0;
  j                             PLS_INTEGER := 0;
  g_item_tree                   item_tree_type;
  g_item                        item_type;
  l_weight                      PLS_INTEGER;
  l_weight_new                  PLS_INTEGER;
  l_best_profit                 PLS_INTEGER := -1;
  l_sol                         VARCHAR2(4000);
  l_sol_cnt                     PLS_INTEGER := 0;

  FUNCTION Add_Node (  p_item_id               PLS_INTEGER,
                       p_item_index_parent     PLS_INTEGER, 
                       p_weight_to_node        NUMBER) RETURN PLS_INTEGER IS
  BEGIN

    g_item.item_id := p_item_id;
    g_item.item_index_parent := p_item_index_parent;
    g_item.weight_to_node := p_weight_to_node;
    IF g_item_tree IS NULL THEN

      g_item_tree := item_tree_type (g_item);

    ELSE

      g_item_tree.Extend;
      g_item_tree (g_item_tree.COUNT) := g_item;

    END IF;
    RETURN g_item_tree.COUNT;

  END Add_Node;

  PROCEDURE Do_One_Level (p_tree_index PLS_INTEGER, p_item_id PLS_INTEGER, p_tot_weight PLS_INTEGER, p_tot_profit PLS_INTEGER) IS

    CURSOR c_nxt IS
    SELECT id, item_weight, item_profit
      FROM items
     WHERE id > p_item_id
       AND item_weight + p_tot_weight <= p_weight_limit;
    l_is_leaf           BOOLEAN := TRUE;
    l_index_list        SYS.ODCINumberList;

  BEGIN

    FOR r_nxt IN c_nxt LOOP
      Timer_Set.Increment_Time (g_timer,  'Do_One_Level/r_nxt');

      l_is_leaf := FALSE;
      Do_One_Level (Add_Node (r_nxt.id, p_tree_index, r_nxt.item_weight + p_tot_weight), r_nxt.id, p_tot_weight + r_nxt.item_weight, p_tot_profit + r_nxt.item_profit);
      Timer_Set.Increment_Time (g_timer,  'Do_One_Level/Do_One_Level');

    END LOOP;

    IF l_is_leaf THEN

      IF p_tot_profit > l_best_profit THEN

        g_solution_list := SYS.ODCINumberList (p_tree_index);
        l_best_profit := p_tot_profit;

      ELSIF p_tot_profit = l_best_profit THEN

        g_solution_list.Extend;
        g_solution_list (g_solution_list.COUNT) := p_tree_index;

      END IF;

    END IF;
    Timer_Set.Increment_Time (g_timer,  'Do_One_Level/leaves');

  END Do_One_Level;

BEGIN

  FOR r_itm IN (SELECT id, item_weight, item_profit FROM items) LOOP

    Timer_Set.Increment_Time (g_timer,  'Root fetches');
    Do_One_Level (Add_Node (r_itm.id, 0, r_itm.item_weight), r_itm.id, r_itm.item_weight, r_itm.item_profit);

  END LOOP;

  FOR i IN 1..g_solution_list.COUNT LOOP

    j := g_solution_list(i);
    l_sol := NULL;
    l_weight := g_item_tree (j).weight_to_node;
    WHILE j != 0 LOOP

      l_sol := l_sol || g_item_tree (j).item_id || ', ';
      j :=  g_item_tree (j).item_index_parent;

    END LOOP;
    l_sol_cnt := l_sol_cnt + 1;
    PIPE ROW ('Solution ' || l_sol_cnt || ' (profit ' || l_best_profit || ', weight ' || l_weight || ') : ' || RTrim (l_sol, ', '));

  END LOOP;

  Timer_Set.Increment_Time (g_timer,  'Write output');
  Timer_Set.Write_Times (g_timer);

EXCEPTION
  WHEN OTHERS THEN
    Timer_Set.Write_Times (g_timer);
    RAISE;

END Best_Fits;

END Packing_PLF;

Performance Analysis

It will be interesting to see how the solution methods perform as problem size varies, and we will use my own performance benchmarking framework to do this. As the framework is designed to compare performance of SQL queries, I have converted the PL/SQL solution to operate as a pipelined function, and thus be callable from SQL, as noted above. I included a version of the SQL solution, with the leaf filtering mentioned above, XKPLV - this was based on XKEEP, with filtering as in the OTN thread.

Test Data Sets

Test data sets are generated as follows, in terms of two integer parameters, w and d:

Insert w items with sequential ids, and random weights and profits in the ranges 0-d and 0-1000, respectively, via Oracle's function DBMS_Random.Value. The maximum weight is fixed at 100.

Test Results

The embedded Excel file below summarises the results obtained over a grid of data points, with w in (12, 14, 16, 18, 20) and d in (16, 18, 20).

Notes

  • The two versions of the non-leaf SQL solution take pretty much the same time to execute, and are faster than the others
  • The leaf version of the SQL solution (XKPLV) is slower than the non-leaf versions, and becomes much worse in terms of elapsed time for the more difficult problems; the step-change in performance can be seen to be due to its greater memory usage, which spills to disk above a certain level
  • The pipelined function solution is significantly slower than the other solutions in terms of CPU time, and elapsed time, except in the case of the leaf SQL solution when that solution's memory usage spills to disk. The pipelined function continues to use more memory as the problem difficulty rises, until all available memory is consumed, when it throws an error (but this case is not included in the result set above)

Conclusions

Oracle's v11.2 implementation of the Ansii SQL feature recursive subquery factoring provides a simple solution for the knapsack problem, that cannot be achieved with Oracle's older Connect By syntax alone.

The method has been described here in its exact form that is viable only for small problems; however, I have later extended the approach to allow for search limiting and have shown this to be viable for larger problems.






Master-Detail Transaction Reconciliation in SQL (MDTM3)

This is the final article in a sequence of three on the subject of master-detail transaction matching. In the first article, Master-Detail Transaction Matching in SQL (MDTM1), I described the problem and divided it into two subproblems, the first being to identify all pairs of matching transactions and the second being to reconcile the pairs so that one transaction matches against at most one other transaction. The underlying motivation here comes from the problem of reconciling intra-company credit and debit transactions where fields may need to match directly, or may need to match after some mapping function is applied, including inversion (contra-matching). We have taken a simple prototype problem defined on Oracle's system tables where only matching conditions are specified. It should be straightforward to extend the techniques demonstrated to more general matching conditions (I have done so myself on the real business problem that prompted this analysis).

The first article developed a series of queries to solve the first subproblem using the idea of pre-aggregation of the detail records as a key performance-enhancing feature. This resulted in a best query that used a temporary table and achieved a time variation that was quadratic in the number of master transactions (we kept the numbers of details per master fixed).

The second article, Holographic Set Matching in SQL (MDTM2), took the aggregation a step further, using list aggregation to bypass direct detail set matching altogether, and this enabled linear time variation.

In this third article, we take the last, linear-time query and extend it to solve the second subproblem, providing a solution to the overall problem in a single query that shows the same linear-time variation property in our test results. The sample problem will be the same as in the previous article.

Output Specification
The output will be ordered first by section, then by group, then by transaction unique identifier, with paired records appearing together using the first transaction for ordering within the group. The sections are defined thus:

  • Reconciled pairs - two records for each matching pair, with no transaction appearing in more than one pair
  • Matched but unreconciled transactions - transactions that match others but could not be reconciled because their matching partners are all paired off against other transactions
  • Unmatched transactions - transactions not matching any other transaction

Queries
We'll include the best query from the last article (L2_SQF), as well as the new query (RECON) that extends it to solve the overall problem.

  • L2_SQF - solves first subproblem by list aggregation without direct detil matching
  • RECON - extends L2_SQF to solve the overall problem using a sequence of query subfactors

The first query will not be described below, as it appeared in the previous article but will be included in the results section for comparison purposes.

Query Structure Diagram (QSD)

Query Text

WITH rns AS (
SELECT r_owner,
       r_constraint_name,
       Row_Number () OVER (ORDER BY r_owner, r_constraint_name) - 1 rn
  FROM con_cp
 WHERE constraint_type	    = 'R'
 GROUP BY 
       r_owner,
       r_constraint_name
), rch AS ( 
SELECT r_owner,
       r_constraint_name,
       Chr (Floor (rn / 128)) ||
       Chr ((rn - 128 * Floor (rn / 128))) chr_rank
  FROM rns
), tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Count(c.ROWID)            n_det,
       Listagg (r.chr_rank, '') WITHIN GROUP (ORDER BY r.chr_rank) lagg
  FROM tab_cp                    t
  JOIN con_cp                    c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
   AND c.constraint_type         = 'R'
  JOIN rch                       r
    ON r.r_owner                 = c.r_owner
   AND r.r_constraint_name       = c.r_constraint_name
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
), dup as (
SELECT t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
), btw AS (
SELECT owner_1       owner_1_0,
       table_name_1  table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det
  FROM dup
 UNION
SELECT owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       owner_1,
       table_name_1,
       n_det
  FROM dup
), grp AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       Least (owner_1 || '/' || table_name_1, Min (owner_2 || '/' || table_name_2)
         OVER (PARTITION BY owner_1, table_name_1)) grp_uid
  FROM btw
), rnk AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       Dense_Rank () OVER  (PARTITION BY grp_uid ORDER BY owner_1, table_name_1) r1,
       owner_2,
       table_name_2,
       Dense_Rank () OVER  (PARTITION BY grp_uid ORDER BY owner_2, table_name_2) r2,
       n_det,
       grp_uid
  FROM grp
), rec AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rnk
 WHERE (r2 = r1 + 1 AND Mod (r1, 2) = 1) OR (r1 = r2 + 1 AND Mod (r2, 2) = 1)
), rcu AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rec
 UNION  
SELECT owner_1,
       table_name_1,
       owner_1,
       table_name_1,
       NULL,
       NULL,
       n_det,
       grp_uid
  FROM grp
 WHERE (owner_1, table_name_1) NOT IN (SELECT owner_1, table_name_1 FROM rec)
 UNION  
SELECT owner,
       table_name,
       owner,
       table_name,
       NULL,
       NULL,
       n_det,
       NULL
  FROM tab
 WHERE (owner, table_name) NOT IN (SELECT owner_1, table_name_1 FROM btw)
)
SELECT
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rcu
 ORDER BY grp_uid,
       CASE WHEN grp_uid IS NULL THEN 3 
               WHEN owner_2 IS NULL THEN 2
               ELSE 1
          END,
       owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1

How it Works
The query proceeds by ten stages using query subfactors. The first three stages correspond to query L2_SQF which then has a main query, whereas we now have another six stages before the main query, as shown:

  1. Rank the distinct details [Group by matching fields, then use Row_Number to rank by same]
  2. Convert ranks to base 128 [Use Floor() and Chr() functions; uses 2 characters here]
  3. Aggregate detail ranks for master [Use Listagg on the rank fixed-length string]
  4. Get all matching pairs one-way [Self-join on matching aggregate and second rowid greater]
  5. Union in the reverse pairs, with sorting column [Add in records with uids swapped, but keep uid 1 separately for sorting]
  6. Assign grouping field to each pair [Take minimum of uid 2 over uid 1, or uid 1 if smaller]
  7. Rank each side of pair within its group [Use Dense_Rank() over grouping, ordering by uids]
  8. Retain odd-even sequentially ranked pairs [Retain pairs with ranks (1,2) or (3,4) etc. and the reverses]
  9. Add in unmatched and matched but unreconciled [3-way union: first the reconciled; then the matched but unreconciled; then unmatched]
  10. Sort by the source uid 1, then the current uid 1 [Sort key ensures that reconciled pairs stay together within their matching group]

Notes:

  • In our matching-only sample problem, matching transactions form mutually matching sets, whereas for contra-matching, there are pairs of contra-matching sets as discussed in the first article. The grouping subqueries will therefore differ in the latter case, and for example, pairing could be by matching numerical rank within the respective sets
  • The final subquery factor could be incorporated in the main query, but I retain it because the benchmarking framework does not support unions in the main query, and CBO optimises it away in any case

Results
Both queries were run on the same sample problem as in the previous article. The output comparison gives the output listings for width parameter of 1, which corresponds to the tables and constraints on my v11.2 system copied to the test tables with owner prefix '_0' added. The timings and statistics are for widths from 1 to 6.

Output Comparison

The output file has tabs for the output from both queries, and a tab with examples from each of the three sections for the second. Notice that OEHR_EMPLOYEES and OEHR_JOB_HISTORY form a reconciled pair, with three detail records, while EMPLOYEES and JOB_HISTORY are unmatched, with four detail records. This is because, as I mentioned in the first article, I have added an extra record to each of the latter tables' detail tables (i.e. foreign key constraints), the extra record being a duplicate of one of the other three (in terms of the matching fields), but a different duplicate in each case. This tests correct handling of duplicates within the detail sets.

Performance Comparison
Click on the query name in the file below to jump to the execution plan for the largest data point, and click the tabs to see different views on the performance obtained.

  • The timings in the file above are roughly consistent with linear-time variation with problem size; if anything L2_SQF appears sublinear, but the times are fairly small and there was some other activity on the PC at the time
  • At the largest data point, RECON takes 5 times as much CPU time as L2_SQF, and 9 times as much elapsed time
  • The differences between elapsed and CPU times for RECON are indicative of significant file I/O activity. This shows up in the disk reads and writes summaries on the statistics tab, and in more detail in the Plans tab, and is caused mainly by reading and writing of the subquery factors to and from disk
  • The other main significant factor in the increase in times for the RECON query is the additional sorting; see, for example, steps 31 and 32 in the plan. These effects are the result of the additional subquery factors that were needed to achieve the final result

Conclusions

  • This three-part sequence of articles has focussed on a special category of problem within SQL, but has highlighted a range of SQL techniques that are useful generally, including:
    • Subquery factors
    • Temporary tables
    • Analytic functions
    • Set matching by list aggregation
    • Compact storage of unique identifiers by ranking and base-conversion via the Chr() function
  • We have also noted different ways of matching sets, including use of the MINUS set operator and the NOT EXISTS construct, and discussed ways of handling issues such as duplication within a set, and directionality of the set operators
  • The importance of polynomial order of solution performance for efficiency has been illustrated dramatically
  • The final SQL solution provides a good illustration of the power of modern SQL to solve complex problems using set-based logic combined with sequence in a simpler and faster way than the more conventional procedural approach
  • The subquery-sequence approach to SQL is well suited to diagrammatic design techniques
  • It is often better to solve complex real-world problems by first working with simpler prototypes