Design Patterns for Database Reports with XML Publisher and Email Bursting

In a recent article, A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher, I presented a report that I developed within Oracle eBusiness 12.1, using Oracle XML Publisher. The report was for displaying audit trail data for a particular table, and I proposed that it could be used as a design pattern for audit trail reporting on any eBusiness table. The article focussed on the rather complex SQL, with associated layout structures, needed to render the audit data into a readable format.

In this article, I present a pair of XML Publisher reports intended to serve as design patterns for more general reporting. The reports were again developed within the framework of Oracle's eBusiness embedded version of XML Publisher, of which there is also a stand-alone version,
Oracle Business Intelligence Publisher, which describes the product (rather ungrammatically) thus:

Oracle BI Publisher is the reporting solution to author, manage, and deliver all your reports and documents easier and faster than traditional reporting tools.

Reports Overview

The reports were developed specifically to serve as models for other reports, from which program constructs could be copied and pasted. For this reason I considered taking for the data sources universally available Oracle metadata such as all_tables or all_objects etc., but found problems with that idea, including:

  • They are not generally ordinary tables, and tend to produce very complex execution plans
  • The possible group structures I found were not quite general enough for my purposes

On the other hand, within Oracle eBusiness I had a number of queries available against the (FND application) metadata tables for concurrent programs that seemed to fit the purpose pretty well, and so decided to use these. As a compromise I created views on the tables with generic column names, so that the reports could be ported to other data sources relatively easily.

Concurrent Programs Data Model
In Oracle eBusiness applications concurrent (i.e. batch) programs have a logical metadata record for the program itself, and (potentially) a set of records for each of the following associated entities:

  • Parameter
  • Request group
  • XML layout template

All of these can have zero records, and for the third entity, only programs of specific type can have any records defined, a structure providing a reasonable degree of generality.

Email Bursting
Business application reports are often used to generate documents for sending to customers, suppliers etc., and increasingly companies prefer to email these out to save costs compared with mailing printed documents. In the past report developers had to code the emailing functionality themselves, usually by means of a co-ordinating program that would loop over the master records and call the reporting tool individually for each one, creating an attachment file that would then be emailed, perhaps by the Unix mailx program. As well as the development effort involved, this approach had the drawback that it scales badly as the number of emails rises owing to the individual calls to the reporting tool (such as Oracle Reports). A printed report will frequently create 10,000 records in little more time than for 1,000 records, whereas for the hand-coded emailing process it would likely take 10 times longer.

The bursting functionality in Oracle XML Publisher solves these problems by allowing the emailing processing to be specified by an XML configuration file, and by creating the files to be emailed in batch just as with printed reports. The data files are created initially in XML format from the data model, and are then merged with layout templates that are emailed by a second program.

The design pattern reports will consist of an email bursting version with a printed version, which would generally be required as a fallback for records with missing or invalid email addresses. We'll address the obvious security issue with automated emailing programs as part of our design pattern below.

Report Outputs

Examples of complete report outputs for the printed report in .pdf format are included in the attached zip file. Here three pages are given, two showing the listing for the example report programs themselves, and a third showing a non-XML program.

Example Report - XX Example XML CP
This is page 8 of the printed report run with the only the first parameter set, the application.

  • Only the parameters that were actually set are listed after the title
  • The templates region appears because it is an XML report
  • There is only one template for this printed report, with XSL conditionality to include or exclude regions or columns

XX_ERPXMLCP

Example Report - XX Example XML CP (Email)
This is page 9 of the printed report run with the only the first parameter set, the application.

  • Notice that the bursting file column appears because this report has the file attached, using XSL conditionality
  • This email version has two layout templates that are used conditionally for each record depending on whether it's of XML type or not
  • This record-level choice of template is implemented in the bursting XML file, and is not available in this way for the printed version

XX_ERPXMLCP_EM

Advanced Pricing Report - QP: Java Pricing Engine Warm Up
This is page 44 of the printed report run for the Advanced Pricing application.

  • It shows a non-XML program, and the templates region consequently does not appear
  • Notice that only the summary line appears for the parameters as there are no parameters defined for the program.

QPXJCCWU-2

XML Data Templates and XML Generators

XML Publisher reports require at least one layout template in one of several formats, and also require an executable program to generate the XML data file to merge with the template(s). Oracle supplies a Java executable to generate the data file from an XML data template containing the SQL queries, group structure etc. that the programmer attaches to the concurrent program. This is usually the best approach since it mimimizes the amount of programming required.

It is also possible to use Oracle Reports to generate XML. This can be a reasonable approach for converting legacy reports to XML Publisher to avail of some of the additional features, such as Excel output and email bursting. However, if the data model needs changing it is probably best to extract the SQL and create an XML data template from it.

It's a really bad idea to write a PL/SQL program to generate the XML data file, being a serious case of 'reinventing the wheel' - use Oracle's Java executable with XML data template instead!

Data Model: Groups, Queries, Parameters, Summaries

As described above, the example report has one master data group and three detail groups. As discussed in my last article Query Query Query, the minimum number of queries required for a group structure hierarchy is the number of groups minus the number of parent groups, which equals three here. The group structure with detail queries linked by bind variables could be depicted as below.

Group Structure Diagram
XML Publisher Model - Groups

A possible query structure for the three queries might then look like the following, where I have included summaries for two of the detail groups, and allowed for report constraints via lexical parameters, to be discussed later.

Query Structure Diagram
XML Publisher Model - Queries

Query Links and Bind Variables
It is possible to link detail queries to their master by XML query links, but according to the manual Oracle® XML Publisher Administration and Developer's Guide, Release 12:

XML Publisher tests have shown that using bind variables is more efficient than using the link tag.

Linking a detail query to its master by bind variables involves simply referencing the master variable link columns within the detail query preceded by a colon.

Constraints and Parameters
Reports often have constraints depending on input parameters, which may be mandatory or optional. Input parameters are defined in a parameters section within the data template.

Mandatory parameters can be referenced directly as bind variables within the queries (i.e. by preceding the parameter name with a colon).

Optional parameters should normally be referenced indirectly using additional lexical parameters. This allows the exact query required to be executed rather than a composite query with Nvls to represent all possible query combinations. The latter all-purpose queries tend to be more complex and to perform poorly. Lexical parameters are declared and processed in the associated database package, and referenced by preceding them with an ampersand (&).

Note that, confusingly, these lexical parameters are not the same as the lexical tags applicable only within eBusiness that refer to eBusiness flexfields. Like the corresponding user-exits in eBusiness Oracle Reports the lexical tags allow flexfields to be included without their structure being known to the report developer. This is necessary for the standard eBusiness reports but developers of custom reports normally know the structures, and so can avoid these tags altogether (at least for new reports).

Summaries
There are various ways of computing summaries in XML Publisher reports:

  • Within the SQL
  • within the XML group elements
  • within the layout template

SQL Summaries
Often the best way to do the summaries is in the SQL itself. This makes testing simpler because the queries can be run and debugged in SQL Developer or Toad, and also facilitates production support where developers can often run queries in a read-only schema, but can't change the production code.

In the simple case of summarising detail groups that are defined against a main query, the summaries can be done using analytic functions partitioning by the master key for first level details, and as desired for any subsequent levels. In my example reports, Example Line One is summarised in this way; a subquery factor was used but this is optional here.

Groups defined against additional queries cannot be summarised quite so easily in the SQL. I have summarised the Example Line Two group by adding a subquery factor in the main query purely to do the summaries: Because the line detail is not present in the query we can't do this by analytic functions, so a grouping summary in a separate subquery is necessary.

XML Group Summaries
Where possible SQL summaries via analytic functions appears best, but in other cases we may wish to consider the non-SQL alternatives. One of these is to define the summaries within the appropriate group as elements in the XML template setting the value attribute to the element in the lower level group to be summarised, and setting the function attribute as desired within a limited set of functions. I have not implemented this method in my examples.

XSL Layout Summaries
A further non-SQL alternative for summaries is to define them within the layout template using the XSL language, and I have implemented the summaries of Example Line Three using this method.

Database Package and Report Triggers

Each XML Publisher has a default package specified in the XML template, that handles parameter processing and implements any triggers required.

Package Spec
The spec declares both input parameters and lexical parameters, as well as the procedures that implement the report triggers (usually three).

Triggers

Before Report

  • This trigger is used to process the lexical parameters, constructing the appropriate where condition depending on which optional parameters have values
  • The example packages show how ranges of character, date and number are processed. I have found it more convenient to pass dates as string parameters.

After Report

  • This trigger can be used to write column headings to the log file for fields that are logged in the group filter

Group Filter

  • This trigger can be used to filter out records depending on some condition, by returning FALSE.
  • I use it in the examples to log selected fields for each record returned.
  • This logging is particularly important for email bursting reports as it enables checking that emails to be sent externally are appropriate before sending in a second step.

Printed Example Report Details

Report Parameters
The report parameters were designed to illustrate the implementation of character, date and number ranges, with an equality join parameter that fits well with the report content, all optional:

  • Application - the eBusiness application id
  • Program name From and To parameters - character range
  • Creation date From and To parameters - date range
  • Number of parameters From and To parameters - number range

Where there is an associated email version it may be worth having an additional Yes/No parameter determining whether to include records that have an email address defined.

Layout Template
There is a single layout template, of .rtf format.

Layout Template Structure
XML Publisher Model - Layout

Layout Template File

Loading...

Notes on Layout

  • Body and Margins
    • The page numbers are in the bottom margin (or footer)
    • The title, parameters and header fields are above the body tag, and repeat on each page of a given header record
  • Conditional Blocks and Columns
    • XSL if-blocks are used to conditionally print blocks such as input parameters, or detail tables depending on data values
    • XSL if-blocks with column qualifiers are used to conditionally print table columns: If there is no bursting file, the entire bursting file column will not appear
  • Sections and Page Breaks
    • The XSL for-each field for the header group uses the section qualifier, which, in conjunction with the page break field, specifies a new page after each header record except the last
    • By default, the above section qualifier would specify that page numbers reset with each header record, but this is overriden here by the initial-page-number field in the footer
  • XSL Summary
    • As discussed in the data model section, the line three summary is implemented in XSL - by the XSL count field

Email Bursting Example Report

Email Address Parameters
In addition to the printed report constraint parameters, three email address parameters have been added.

  • Override email address - setting a value here overrides the record level email address and is useful for testing
  • From and CC email addresses are parameters, which can be defaulted, for flexibility
  • The email address data source is hard-coded in this demo report: normally it would be something like a supplier email address at master record level

Two Step Bursting Process
The first step is to run the concurrent program for the report, which results in an XML file containing the report data to be created on the server. In the second step, a standard concurrent program 'XML Publisher Report Bursting Program' is run, passing the request id of the earlier custom report, in order to merge the XML data file with the layout template(s) and send the individual reports by email.

It's sometimes recommended to trigger the running of the second program within the custom report itself, but it's probably better not to do this, in order to allow validation of the reports and email addresses before sending them out.

Email Report Logging
The example report logs the email address and other fields in the concurrent program log file, including whether an override email address was specified, where the user can check them before executing the second step that sends out the emails.

Layout Template
There are two layout templates, of .rtf format, which are used conditionally depending on the type of record. The structure of the one used for XML type programs (shown below) has conditional header fields and the third lines block that are absent from the other (not shown).

Layout Template Structure
XML Publisher Model - Layout EM

Layout Template File

Loading...

Notes on Layout

  • The master group is excluded from the template, although its fields are present in the header. The XML Publisher bursting program performs the looping over this group

Email Bursting XML File

<xapi:requestset xmlns:xapi="http://xmlns.oracle.com/oxp/xapi" type="bursting">
<xapi:request select="/XX_ERPXMLCP_EM/LIST_G_HDR/G_HDR">
<xapi:delivery>
<xapi:email server="127.0.0.1" port="25" from="${FROM_EMAIL}" reply-to ="">
<xapi:message id="123" to="${EMAIL_ADDRESS}" cc="${CC_EMAIL}" 
attachment="true" subject="${SUB}">Dear Sir/Madam

Pleae find attached your concurrent program details. 

[Alternatively, leave blank and use output_type=html in document to have attachment go to body instead]</xapi:message>
</xapi:email>
</xapi:delivery>
<xapi:document output="CP_${PROG_NAME}.pdf" output-type="pdf" delivery="123">
<xapi:template type="rtf" location="XDO://CUSTOM.XX_ERPXMLCP_EM.en.US/?getSource=true" filter="//G_HDR[OUTPUT_TYPE!='XML']"></xapi:template>
<xapi:template type="rtf" location="XDO://CUSTOM.XX_ERPXMLCP_EM_XML.en.US/?getSource=true" filter="//G_HDR[OUTPUT_TYPE='XML']"></xapi:template>
</xapi:document>
</xapi:request>
</xapi:requestset>

Notes on Bursting File

  • The email server is specified in the bursting XML configuration file, along with the email addresses to be used, which can be variables from the selected records or hard-coded
  • The layout template is specified, and more than one can be included, as here, with filter conditions depending on selected data fields
  • Here the output is specified to be sent as a .pdf attachment; changing to output type html results in the the output appearing as body text
  • It can be useful to store some or all of the subject (or body) text on the database; the table fnd_messages is used to store the subject here, as records from the fnd tables can be installed automatically via fndload

Code to Download

The zip file Brendan_Model_XML_Reports contains a root folder and three subfolders, as follows:

  • Root - MD120 installation document for the email version. It references a generic Unix script that installs all objects. I will publish this script at a later date
  • Output - examples of log files for both printed and email versions and pdf outputs for printed version
  • XX_ERPXMLCP - complete code and metadata for the printed version
  • XX_ERPXMLCP_EM - complete code and metadata for the email version (except requires some printed version objects to be installed first)






Query Query Query

In my last post, A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher, I described a database report module developed in Oracle's XML Publisher tool. Of the report structure I wrote:

It has a master entity with two independent detail entities, and therefore requires a minimum of two queries.

But why does such a structure require two queries? And can we determine the minimum number of queries for reports in general?

To start with, let's define a report in this context as being a hierarchy of record groups, where:

  • a record group is a set of records having the same columns with (possibly) differing values
  • each group is linked to a single record in its (single) parent group by values in the parent record, except the top level (or root) group

For example, in the earlier post the root group is a set of bank accounts, with the two detail (or child) groups being the set of owners of the bank account and the set of audit records for the bank account parent record. Corresponding to this group structure, each bank account record is the root of the data hierarchies, comprising two sets of records below the bank account record, one for the owners and one for the audit records linked to the root record by the bank account id.

A (relational) query always returns a flat record set, and it's this fact that determines the minimum number of queries required for a given group structure. A master-detail group structure can be flattened in the query by simply copying master fields on to the child record sets. The set cardinality is then the cardinality of the child set. The report designer uses their chosen reporting tool to specify display of the queried data in either flat, or in master-detail format.

In fact this approach works for any number of child levels, with the query cardinality being the number of bottom level descendants (using null records for potential parents that are in fact childless). It's clear though that the approach will not work for any second child at the same level because there would be two cardinalities and no meaningful single record for both child groups could be constructed within a flat query.

This reasoning leads to the conclusion that the minimum number of queries required in general is equal to the number of groups minus the number of parent groups.

In the earlier post I also stated:

This minimum number of queries is usually the best choice...

There are two main reasons for this:

  • each child query fires for every record returned by its parent, with associated performance impact
  • maintenance tends to be more difficult with extra queries; this is much worse when the individual groups, which should almost always be implemented by a maximum of one query each, are split, and then need to be joined back together procedurally

On thinking about this, it occurred to me that if the group structure were defined in a metadata table we might be able to return minimum query structures using an SQL query. Just one, obviously :) . To save effort we could use Oracle's handy HR demo schema with the employee hierarchy representing groups.

The remainder of this article describes the query I came up with. As it's about hierarchies, recursion is the technique to use, and this is one of those cases where Oracle's old tree-walk syntax is too limited, so I am using the Oracle 11.2 recursive subquery factoring feature.

The query isn't going to be of practical value for report group structures since these are always quite small in size, but I expect there are different applications where this kind of Primogeniture Recursion would be useful.

Query Groups Query - Primogeniture Recursion
Query Structure Diagram

Query Query Query

SQL

WITH rsf (last_name, employee_id, lev, part_id, manager_id) AS (
SELECT last_name, employee_id, 0, employee_id, To_Number(NULL)
  FROM employees
 WHERE manager_id IS NULL
UNION ALL
SELECT e.last_name, e.employee_id, r.lev + 1, 
       CASE WHEN Row_Number() OVER (PARTITION BY r.employee_id ORDER BY e.last_name) = 1 THEN r.part_id ELSE e.employee_id END,
       e.manager_id
  FROM rsf r
  JOIN employees e
    ON e.manager_id = r.employee_id
)
SELECT part_id, LPad ('.', lev) || last_name last_name, employee_id, 
       Count(DISTINCT part_id) OVER () "#Partitions",
       Count(DISTINCT manager_id) OVER () "+ #Parents",
       Count(*) OVER () "= #Records"
  FROM rsf
 ORDER BY part_id, lev, last_name

Query Output

   PART_ID LAST_NAME            EMPLOYEE_ID #Partitions + #Parents = #Records
---------- -------------------- ----------- ----------- ---------- ----------
       100 King                         100          89         18        107
           .Cambrault                   148          89         18        107
            .Bates                      172          89         18        107
       101 .Kochhar                     101          89         18        107
            .Baer                       204          89         18        107
       102 .De Haan                     102          89         18        107
            .Hunold                     103          89         18        107
             .Austin                    105          89         18        107
       104   .Ernst                     104          89         18        107
       106   .Pataballa                 106          89         18        107
       107   .Lorentz                   107          89         18        107
       108  .Greenberg                  108          89         18        107
             .Chen                      110          89         18        107
       109   .Faviet                    109          89         18        107
       111   .Sciarra                   111          89         18        107
       112   .Urman                     112          89         18        107
       113   .Popp                      113          89         18        107
       114 .Raphaely                    114          89         18        107
            .Baida                      116          89         18        107
       115  .Khoo                       115          89         18        107
       117  .Tobias                     117          89         18        107
       118  .Himuro                     118          89         18        107
       119  .Colmenares                 119          89         18        107
       120 .Weiss                       120          89         18        107
            .Fleaur                     181          89         18        107
       121 .Fripp                       121          89         18        107
            .Atkinson                   130          89         18        107
       122 .Kaufling                    122          89         18        107
            .Chung                      188          89         18        107
       123 .Vollman                     123          89         18        107
            .Bell                       192          89         18        107
       124 .Mourgos                     124          89         18        107
            .Davies                     142          89         18        107
       125  .Nayer                      125          89         18        107
       126  .Mikkilineni                126          89         18        107
       127  .Landry                     127          89         18        107
       128  .Markle                     128          89         18        107
       129  .Bissot                     129          89         18        107
       131  .Marlow                     131          89         18        107
       132  .Olson                      132          89         18        107
       133  .Mallin                     133          89         18        107
       134  .Rogers                     134          89         18        107
       135  .Gee                        135          89         18        107
       136  .Philtanker                 136          89         18        107
       137  .Ladwig                     137          89         18        107
       138  .Stiles                     138          89         18        107
       139  .Seo                        139          89         18        107
       140  .Patel                      140          89         18        107
       141  .Rajs                       141          89         18        107
       143  .Matos                      143          89         18        107
       144  .Vargas                     144          89         18        107
       145 .Russell                     145          89         18        107
            .Bernstein                  151          89         18        107
       146 .Partners                    146          89         18        107
            .Doran                      160          89         18        107
       147 .Errazuriz                   147          89         18        107
            .Ande                       166          89         18        107
       149 .Zlotkey                     149          89         18        107
            .Abel                       174          89         18        107
       150  .Tucker                     150          89         18        107
       152  .Hall                       152          89         18        107
       153  .Olsen                      153          89         18        107
       154  .Cambrault                  154          89         18        107
       155  .Tuvault                    155          89         18        107
       156  .King                       156          89         18        107
       157  .Sully                      157          89         18        107
       158  .McEwen                     158          89         18        107
       159  .Smith                      159          89         18        107
       161  .Sewall                     161          89         18        107
       162  .Vishney                    162          89         18        107
       163  .Greene                     163          89         18        107
       164  .Marvins                    164          89         18        107
       165  .Lee                        165          89         18        107
       167  .Banda                      167          89         18        107
       168  .Ozer                       168          89         18        107
       169  .Bloom                      169          89         18        107
       170  .Fox                        170          89         18        107
       171  .Smith                      171          89         18        107
       173  .Kumar                      173          89         18        107
       175  .Hutton                     175          89         18        107
       176  .Taylor                     176          89         18        107
       177  .Livingston                 177          89         18        107
       178  .Grant                      178          89         18        107
       179  .Johnson                    179          89         18        107
       180  .Taylor                     180          89         18        107
       182  .Sullivan                   182          89         18        107
       183  .Geoni                      183          89         18        107
       184  .Sarchand                   184          89         18        107
       185  .Bull                       185          89         18        107
       186  .Dellinger                  186          89         18        107
       187  .Cabrio                     187          89         18        107
       189  .Dilly                      189          89         18        107
       190  .Gates                      190          89         18        107
       191  .Perkins                    191          89         18        107
       193  .Everett                    193          89         18        107
       194  .McCain                     194          89         18        107
       195  .Jones                      195          89         18        107
       196  .Walsh                      196          89         18        107
       197  .Feeney                     197          89         18        107
       198  .OConnell                   198          89         18        107
       199  .Grant                      199          89         18        107
       200  .Whalen                     200          89         18        107
       201 .Hartstein                   201          89         18        107
            .Fay                        202          89         18        107
       203  .Mavris                     203          89         18        107
       205  .Higgins                    205          89         18        107
             .Gietz                     206          89         18        107

107 rows selected.

 






A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher

Oracle eBusiness applications allow audit history records to be automatically maintained on database tables, as explained in the release 12 System Administrator's guide, Reporting On AuditTrail Data.

Oracle E-Business Suite provides an auditing mechanism based on Oracle database triggers. AuditTrail stores change information in a "shadow table" of the audited table. This mechanism saves audit data in an uncompressed but "sparse" format, and you enable auditing for particular tables and groups of tables ("audit groups").

Oracle provides an Audit Query Navigator page where it is possible to search for changes by primary key values. For reporting purposes, the manual says:

You should write audit reports as needed. Audit Trail provides the views of your shadow tables to make audit reporting easier; you can write your reports to use these views.

In fact the views are of little practical use, and it is quite hard to develop reports that are user-friendly, efficient and not over-complex, owing, amongst other things, to the "sparse" data format. However, once you have developed one you can use that as a design pattern and starting point for audit reporting on any of the eBusiness tables.

In this article I provide such a report for auditing external bank account changes on Oracle eBusiness 12.1. The report displays the current record, with lookup information, followed by a list of the changes within an input date range. Only records that have changes within that range are included, and for each change only the fields that were changed are listed. The lookup information includes a list of detail records, making the report overall pretty general in structure: It has a master entity with two independent detail entities, and therefore requires a minimum of two queries. This minimum number of queries is usually the best choice and is what I have implemented (it's fine to have an extra query for global data, but I don't have any here). The main query makes extensive use of analytic functions, case expressions and subquery factors to achieve the necessary data transformations as simply and efficiently as possible.

The report is implemented in XML (or BI) Publisher, which is the main batch reporting tool for Oracle eBusiness.

I start by showing sample output from the report, followed by the RTF template. The queries are then documented, starting with query structure diagrams with annotations explaining the logic. A link is included to a zip file with all the code and templates needed to install the report. Oracle provides extensive documentation on the setup of Auditing and developing in XML Publisher, so I will not cover this.

Report Layout
Example Output in Excel Format

  • There are three regions
    • Bank Account Current Record - the master record
    • Owners - first detail block, listing the owners of the bank account
    • Bank Account Changes - the second detail block, listing the audit history. Note that unchanged fields are omitted
  • Note that some audit fields are displayed directly, such as account number, while for others, such as branch number, the display value is on a referenced table

XML Publisher RTF Tempate

  • Note that each audit field has its own row in the table, but the if-block excludes it if both old and new values are null

Audit Query
Query Structure Diagram
Audit Query QSD
Subquery Tabulation

SQL

WITH audit_range AS (
SELECT DISTINCT ext_bank_account_id
  FROM iby_ext_bank_accounts_a aup
 WHERE 1=1
&lp_beg_dat
&lp_end_dat
), audit_union AS (
SELECT ext_bank_account_id acc_id,
       CASE WHEN Substr (audit_true_nulls, 2, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_name END acc_name,
       CASE WHEN Substr (audit_true_nulls, 3, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_num END acc_num,
       CASE WHEN Substr (audit_true_nulls, 8, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE iban END iban,
       CASE WHEN Substr (audit_true_nulls, 7, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE currency_code END curr,
       CASE WHEN Substr (audit_true_nulls, 6, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE country_code END coun,
       CASE WHEN Substr (audit_true_nulls, 4, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE bank_id END bank_id,
       CASE WHEN Substr (audit_true_nulls, 5, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE branch_id END branch_id,
       audit_sequence_id seq_id,
       audit_timestamp,
       audit_user_name a_user,
       CASE WHEN audit_transaction_type = 'I' THEN 'INSERT' ELSE 'UPDATE' END a_type
  FROM iby_ext_bank_accounts_a aup
 WHERE aup.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
&lp_beg_dat
 UNION
SELECT bac.ext_bank_account_id,
       bac.bank_account_name,
       bac.bank_account_num,
       bac.iban,
       bac.currency_code,
       bac.country_code,
       bac.bank_id,
       bac.branch_id,
       NULL,
       bac.last_update_date,
       usr.user_name,
       NULL
  FROM iby_ext_bank_accounts            bac
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.last_updated_by
 WHERE bac.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
), audit_pairs AS (
SELECT acc_id,
       acc_name,
       bank_id,
       First_Value (bank_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) bank_id_n,
       branch_id,
       First_Value (branch_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) branch_id_n,
       First_Value (acc_name IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_name_n,
       acc_num,
       First_Value (acc_num IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_num_n,
       iban,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) iban_n,
       curr,
       First_Value (curr IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) curr_n,
       coun,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) coun_n,
       seq_id,
       audit_timestamp,
       a_user,
       a_type
  FROM audit_union
)
SELECT aup.acc_id,
       par_bnk.party_name bank_name,
       par_brn.party_name bra_name,
       orp.bank_or_branch_number bra_num,
       bac.bank_account_name acc_name,
       bac.bank_account_num acc_num,
       bac.iban,
       bac.country_code coun,
       bac.currency_code curr,
       To_Char (bac.creation_date, 'DD-MON-YYYY HH24:MI:SS') creation_date,
       usr.user_name created_by,
       To_Char (aup.audit_timestamp, 'DD-MON-YYYY HH24:MI:SS') a_time,
       aup.a_user,
       aup.a_type,
/*
attr: 1. NULL -> no change; 2. 0/'*NULL*' -> change from null; 3. 'other'-> change from not null
        old: 1 and 2 both return NULL; 3 returns old not null value
        new: only return value for 2 and 3, meaning some change
*/
       CASE WHEN aup.bank_id != 0 THEN par_bnk_o.party_name END bank_name_o,
       CASE WHEN aup.bank_id IS NOT NULL THEN CASE WHEN aup.bank_id_n != 0 THEN par_bnk_o.party_name END END bank_name_n,
       CASE WHEN aup.branch_id != 0 THEN par_brn_o.party_name END bra_name_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN par_brn_n.party_name END END bra_name_n,
       CASE WHEN aup.branch_id != 0 THEN orp_o.bank_or_branch_number END bra_num_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN orp_n.bank_or_branch_number END END bra_num_n,
       CASE WHEN aup.acc_name != '*NULL*' THEN aup.acc_name END acc_name_o,
       CASE WHEN aup.acc_name IS NOT NULL THEN CASE WHEN aup.acc_name_n != '*NULL*' THEN aup.acc_name_n END END acc_name_n,
       CASE WHEN aup.acc_num != '*NULL*' THEN aup.acc_num END acc_num_o,
       CASE WHEN aup.acc_num IS NOT NULL THEN CASE WHEN aup.acc_num_n != '*NULL*' THEN aup.acc_num_n END END acc_num_n,
       CASE WHEN aup.iban != '*NULL*' THEN aup.iban END iban_o,
       CASE WHEN aup.iban IS NOT NULL THEN CASE WHEN aup.iban_n != '*NULL*' THEN aup.iban_n END END iban_n,
       CASE WHEN aup.curr != '*NULL*' THEN aup.curr END curr_o,
       CASE WHEN aup.curr IS NOT NULL THEN CASE WHEN aup.curr_n != '*NULL*' THEN aup.curr_n END END curr_n,
       CASE WHEN aup.coun != '*NULL*' THEN aup.coun END coun_o,
       CASE WHEN aup.coun IS NOT NULL THEN CASE WHEN aup.coun_n != '*NULL*' THEN aup.coun_n END END coun_n
  FROM audit_pairs                      aup
  JOIN iby_ext_bank_accounts            bac
    ON bac.ext_bank_account_id          = aup.acc_id
  LEFT JOIN hz_parties                  par_bnk
    ON par_bnk.party_id                 = bac.bank_id
  LEFT JOIN hz_parties                  par_bnk_o
    ON par_bnk_o.party_id               = aup.bank_id
  LEFT JOIN hz_parties                  par_bnk_n
    ON par_bnk_n.party_id               = aup.bank_id_n
  LEFT JOIN hz_parties                  par_brn
    ON par_brn.party_id                 = bac.branch_id
  LEFT JOIN hz_organization_profiles    orp
    ON orp.party_id                     = par_brn.party_id
   AND SYSDATE BETWEEN Trunc (orp.effective_start_date) AND Nvl (Trunc (orp.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_o
    ON par_brn_o.party_id               = aup.branch_id
  LEFT JOIN hz_organization_profiles    orp_o
    ON orp_o.party_id                   = par_brn_o.party_id
   AND SYSDATE BETWEEN Trunc (orp_o.effective_start_date) AND Nvl (Trunc (orp_o.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_n
    ON par_brn_n.party_id               = aup.branch_id_n
  LEFT JOIN hz_organization_profiles    orp_n
    ON orp_n.party_id                   = par_brn_n.party_id
   AND SYSDATE BETWEEN Trunc (orp_n.effective_start_date) AND Nvl (Trunc (orp_n.effective_end_date), SYSDATE+1)
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.created_by
 WHERE aup.seq_id                       IS NOT NULL
&lp_beg_dat
&lp_end_dat
 ORDER BY aup.acc_id, aup.audit_timestamp DESC, aup.seq_id DESC

Owners Query
Query Structure Diagram
Owners Query
Subquery Tabulation

SQL

SELECT par.party_name owner,
       sup.vendor_name,
       sup.segment1
  FROM iby_account_owners               own
  JOIN hz_parties                       par
    ON par.party_id                     = own.account_owner_party_id
  LEFT JOIN ap_suppliers                sup
    ON sup.party_id                     = par.party_id
 WHERE own.ext_bank_account_id          = :ACC_ID

Code for Bank Account Auditing XML Publisher Report

XX_IBYBNKAUDIT






NoCOUG SQL Challenge 2014 Illustrated

An SQL challenge was posted recently on the blog of the Northern California Oracle user group, SQL Mini-challenge. A query was given against Oracle's demo HR schema, with description:

It lists the locations containing a department that either contains an employee named Steven King or an employee who holds the title of President or an employee who has previously held the title of President.

The challenge was to rewrite the query avoiding the relatively expensive existence subqueries, and minimising the number of consistent gets reported on the small demo data set. Oracle's Cost-Based Optimiser will itself transform queries within its parsing phase, but at a relatively low level; for example, an 'OR' condition might be changed to a union if the CBO thinks that will aid performance. The solutions to the challenge present a nice illustration of how more extensive query transfomations can improve performance and modularity characteristics.

In this article I will list four equivalent queries for the problem, three based on solutions provided on the blog, using Ansi syntax here for consistency. For each query I give the output from DBMS_XPlan, and include a query structure diagram following my own diagramming notation. The example query provided in the challenge is not in fact the most literal translation of the requirement into SQL, and I think it will be interesting to start with my idea of what that would be.

Query 1: Literal
This my attempt at the most literal translation of the stated requirement into SQL. The three conditions are all separately expressed as existence subqueries.

QSD Literal

NCOUG-Literal

Query Literal

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
    WHERE d.location_id = l.location_id
      AND e.first_name = 'Steven' AND e.last_name = 'King'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND j.job_title = 'President'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN job_history h
       ON h.employee_id = e.employee_id
     JOIN jobs j2
       ON j2.job_id = h.job_id
    WHERE j2.job_title   = 'President'
  )

XPlan Literal

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |      1 |        |      1 |00:00:00.01 |     279 |       |       |          |
|*  1 |  FILTER                         |                   |      1 |        |      1 |00:00:00.01 |     279 |       |       |          |
|   2 |   VIEW                          | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       7 |       |       |          |
|*  3 |    HASH JOIN                    |                   |      1 |        |     23 |00:00:00.01 |       7 |  1023K|  1023K| 1402K (0)|
|   4 |     INDEX FAST FULL SCAN        | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN        | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       4 |       |       |          |
|   6 |   NESTED LOOPS                  |                   |     23 |        |      1 |00:00:00.01 |      92 |       |       |          |
|   7 |    NESTED LOOPS                 |                   |     23 |      1 |     23 |00:00:00.01 |      69 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID | EMPLOYEES         |     23 |      1 |     23 |00:00:00.01 |      46 |       |       |          |
|*  9 |      INDEX RANGE SCAN           | EMP_NAME_IX       |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 10 |     INDEX UNIQUE SCAN           | DEPT_ID_PK        |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 11 |    TABLE ACCESS BY INDEX ROWID  | DEPARTMENTS       |     23 |      1 |      1 |00:00:00.01 |      23 |       |       |          |
|  12 |   NESTED LOOPS                  |                   |     22 |        |      0 |00:00:00.01 |     173 |       |       |          |
|  13 |    NESTED LOOPS                 |                   |     22 |      1 |     88 |00:00:00.01 |     166 |       |       |          |
|  14 |     NESTED LOOPS                |                   |     22 |      2 |      6 |00:00:00.01 |     160 |       |       |          |
|* 15 |      TABLE ACCESS FULL          | JOBS              |     22 |      1 |     22 |00:00:00.01 |     132 |       |       |          |
|  16 |      TABLE ACCESS BY INDEX ROWID| DEPARTMENTS       |     22 |      2 |      6 |00:00:00.01 |      28 |       |       |          |
|* 17 |       INDEX RANGE SCAN          | DEPT_LOCATION_IX  |     22 |      2 |      6 |00:00:00.01 |      22 |       |       |          |
|* 18 |     INDEX RANGE SCAN            | EMP_DEPARTMENT_IX |      6 |     10 |     88 |00:00:00.01 |       6 |       |       |          |
|* 19 |    TABLE ACCESS BY INDEX ROWID  | EMPLOYEES         |     88 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  20 |   NESTED LOOPS                  |                   |      1 |        |      0 |00:00:00.01 |       7 |       |       |          |
|  21 |    NESTED LOOPS                 |                   |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  22 |     NESTED LOOPS                |                   |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|* 23 |      TABLE ACCESS FULL          | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|  24 |      TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 25 |       INDEX RANGE SCAN          | JHIST_JOB_IX      |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 26 |     INDEX UNIQUE SCAN           | EMP_EMP_ID_PK     |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 27 |    TABLE ACCESS BY INDEX ROWID  | EMPLOYEES         |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(( IS NOT NULL OR  IS NOT NULL OR  IS NOT NULL))
   3 - access(ROWID=ROWID)
   9 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  10 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  11 - filter("D"."LOCATION_ID"=:B1)
  15 - filter("J"."JOB_TITLE"='President')
  17 - access("D"."LOCATION_ID"=:B1)
  18 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  19 - filter("J"."JOB_ID"="E"."JOB_ID")
  23 - filter("J2"."JOB_TITLE"='President')
  25 - access("J2"."JOB_ID"="H"."JOB_ID")
  26 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
  27 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

Query 2: NoCOUG Example
This is the example in the original challenge article, translated into Ansi syntax. It nests the job history existence subquery within an outer existence subquery, and references the departments and employees tables only once.

QSD NoCOUG Example

NCOUG-Example

Query NoCOUG Example

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND ( 
            (e.first_name = 'Steven' AND e.last_name = 'King')
         OR j.job_title = 'President'
         OR EXISTS
            (SELECT *
               FROM job_history h
               JOIN jobs j2
                 ON j2.job_id = h.job_id
              WHERE h.employee_id = e.employee_id
                AND j2.job_title   = 'President'
            ) 
          )
  )

XPlan NoCOUG Example

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |      1 |        |      1 |00:00:00.01 |     152 |       |       |          |
|*  1 |  HASH JOIN SEMI                  |                   |      1 |      7 |      1 |00:00:00.01 |     152 |  1156K|  1156K| 1120K (0)|
|   2 |   VIEW                           | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|*  3 |    HASH JOIN                     |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1443K (0)|
|   4 |     INDEX FAST FULL SCAN         | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   6 |   VIEW                           | VW_SQ_1           |      1 |     11 |      1 |00:00:00.01 |     146 |       |       |          |
|*  7 |    FILTER                        |                   |      1 |        |      1 |00:00:00.01 |     146 |       |       |          |
|*  8 |     HASH JOIN                    |                   |      1 |    106 |    106 |00:00:00.01 |      15 |   876K|   876K|  895K (0)|
|   9 |      MERGE JOIN                  |                   |      1 |    107 |    107 |00:00:00.01 |       8 |       |       |          |
|  10 |       TABLE ACCESS BY INDEX ROWID| JOBS              |      1 |     19 |     19 |00:00:00.01 |       2 |       |       |          |
|  11 |        INDEX FULL SCAN           | JOB_ID_PK         |      1 |     19 |     19 |00:00:00.01 |       1 |       |       |          |
|* 12 |       SORT JOIN                  |                   |     19 |    107 |    107 |00:00:00.01 |       6 | 15360 | 15360 |14336  (0)|
|  13 |        TABLE ACCESS FULL         | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|  14 |      TABLE ACCESS FULL           | DEPARTMENTS       |      1 |     27 |     27 |00:00:00.01 |       7 |       |       |          |
|  15 |     NESTED LOOPS                 |                   |    105 |        |      0 |00:00:00.01 |     131 |       |       |          |
|  16 |      NESTED LOOPS                |                   |    105 |      1 |     10 |00:00:00.01 |     121 |       |       |          |
|  17 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |    105 |      1 |     10 |00:00:00.01 |     112 |       |       |          |
|* 18 |        INDEX RANGE SCAN          | JHIST_EMPLOYEE_IX |    105 |      1 |     10 |00:00:00.01 |     105 |       |       |          |
|* 19 |       INDEX UNIQUE SCAN          | JOB_ID_PK         |     10 |      1 |     10 |00:00:00.01 |       9 |       |       |          |
|* 20 |      TABLE ACCESS BY INDEX ROWID | JOBS              |     10 |      1 |      0 |00:00:00.01 |      10 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_1"="L"."LOCATION_ID")
   3 - access(ROWID=ROWID)
   7 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_TITLE"='President' OR  IS NOT NULL))
   8 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  12 - access("J"."JOB_ID"="E"."JOB_ID")
       filter("J"."JOB_ID"="E"."JOB_ID")
  18 - access("H"."EMPLOYEE_ID"=:B1)
  19 - access("J2"."JOB_ID"="H"."JOB_ID")
  20 - filter("J2"."JOB_TITLE"='President')

Query 3: Subquery Factor Union
This converts the 'OR' conditions into a union of three driving subqueries that return the matching department ids from a subquery factor (which could equally be an inline view), and then joins departments and locations.

QSD Subquery Factor Union

NCOUG-SQF

Query Subquery Factor Union

WITH driving_union AS (
SELECT e.department_id
  FROM employees e
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN employees e
    ON e.job_id        = j.job_id
 WHERE j.job_title     = 'President'
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN job_history h
    ON j.job_id        = h.job_id
  JOIN employees e
    ON e.employee_id    = h.employee_id
 WHERE j.job_title     = 'President'
)
SELECT DISTINCT l.location_id, l.city
  FROM driving_union u
  JOIN departments d
    ON d.department_id 	= u.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id

XPlan Subquery Factor Union

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |      1 |        |      1 |00:00:00.01 |      29 |       |       |          |
|   1 |  HASH UNIQUE                         |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1156K|  1156K|  464K (0)|
|*  2 |   HASH JOIN                          |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1517K|  1517K|  366K (0)|
|*  3 |    HASH JOIN                         |                  |      1 |      8 |      1 |00:00:00.01 |      23 |  1517K|  1517K|  382K (0)|
|   4 |     VIEW                             |                  |      1 |      8 |      1 |00:00:00.01 |      17 |       |       |          |
|   5 |      SORT UNIQUE                     |                  |      1 |      8 |      1 |00:00:00.01 |      17 |  2048 |  2048 | 2048  (0)|
|   6 |       UNION-ALL                      |                  |      1 |        |      2 |00:00:00.01 |      17 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID   | EMPLOYEES        |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX RANGE SCAN             | EMP_NAME_IX      |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|   9 |        NESTED LOOPS                  |                  |      1 |        |      1 |00:00:00.01 |       8 |       |       |          |
|  10 |         NESTED LOOPS                 |                  |      1 |      6 |      1 |00:00:00.01 |       7 |       |       |          |
|* 11 |          TABLE ACCESS FULL           | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|* 12 |          INDEX RANGE SCAN            | EMP_JOB_IX       |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  13 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  14 |        NESTED LOOPS                  |                  |      1 |        |      0 |00:00:00.01 |       7 |       |       |          |
|  15 |         NESTED LOOPS                 |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  16 |          NESTED LOOPS                |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|* 17 |           TABLE ACCESS FULL          | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|  18 |           TABLE ACCESS BY INDEX ROWID| JOB_HISTORY      |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 19 |            INDEX RANGE SCAN          | JHIST_JOB_IX     |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 20 |          INDEX UNIQUE SCAN           | EMP_EMP_ID_PK    |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  21 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  22 |     VIEW                             | index$_join$_011 |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 23 |      HASH JOIN                       |                  |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  24 |       INDEX FAST FULL SCAN           | DEPT_ID_PK       |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  25 |       INDEX FAST FULL SCAN           | DEPT_LOCATION_IX |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  26 |    VIEW                              | index$_join$_013 |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 27 |     HASH JOIN                        |                  |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1426K (0)|
|  28 |      INDEX FAST FULL SCAN            | LOC_CITY_IX      |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  29 |      INDEX FAST FULL SCAN            | LOC_ID_PK        |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
   3 - access("D"."DEPARTMENT_ID"="U"."DEPARTMENT_ID")
   8 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  11 - filter("J"."JOB_TITLE"='President')
  12 - access("E"."JOB_ID"="J"."JOB_ID")
  17 - filter("J"."JOB_TITLE"='President')
  19 - access("J"."JOB_ID"="H"."JOB_ID")
  20 - access("E"."EMPLOYEE_ID"="H"."EMPLOYEE_ID")
  23 - access(ROWID=ROWID)
  27 - access(ROWID=ROWID)

Query 4: Outer Joins
This avoids existence subqueries using the idea that an outer join with a constraint that the joined record is not null, with a distinct qualifier to eliminate duplicates, can serve as a logical equivalent.

QSD Outer Joins

NCOUG-OJ

Query Outer Joins

SELECT DISTINCT l.location_id, l.city
  FROM employees e
  LEFT JOIN jobs j
    ON j.job_id        	= e.job_id
   AND j.job_title      = 'President'  
  LEFT JOIN job_history h
    ON h.employee_id    = e.employee_id
  LEFT JOIN jobs j2
    ON j2.job_id        = h.job_id
   AND j2.job_title     = 'President' 
  JOIN departments d
    ON d.department_id 	= e.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_id IS NOT NULL
    OR j2.job_id IS NOT NULL

XPlan Outer Joins

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|   1 |  HASH UNIQUE                  |                   |      1 |    106 |      1 |00:00:00.01 |      36 |  1156K|  1156K|  464K (0)|
|*  2 |   FILTER                      |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|*  3 |    HASH JOIN RIGHT OUTER      |                   |      1 |    106 |    109 |00:00:00.01 |      36 |  1269K|  1269K|  369K (0)|
|*  4 |     TABLE ACCESS FULL         | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|*  5 |     HASH JOIN RIGHT OUTER     |                   |      1 |    106 |    109 |00:00:00.01 |      30 |  1134K|  1134K|  751K (0)|
|   6 |      VIEW                     | index$_join$_004  |      1 |     10 |     10 |00:00:00.01 |       6 |       |       |          |
|*  7 |       HASH JOIN               |                   |      1 |        |     10 |00:00:00.01 |       6 |  1096K|  1096K| 1331K (0)|
|   8 |        INDEX FAST FULL SCAN   | JHIST_EMPLOYEE_IX |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|   9 |        INDEX FAST FULL SCAN   | JHIST_JOB_IX      |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|* 10 |      HASH JOIN OUTER          |                   |      1 |    106 |    106 |00:00:00.01 |      24 |   858K|   858K| 1270K (0)|
|* 11 |       HASH JOIN               |                   |      1 |    106 |    106 |00:00:00.01 |      18 |  1063K|  1063K| 1252K (0)|
|* 12 |        HASH JOIN              |                   |      1 |     27 |     27 |00:00:00.01 |      12 |  1156K|  1156K| 1133K (0)|
|  13 |         VIEW                  | index$_join$_010  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 14 |          HASH JOIN            |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1450K (0)|
|  15 |           INDEX FAST FULL SCAN| LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  16 |           INDEX FAST FULL SCAN| LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  17 |         VIEW                  | index$_join$_008  |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 18 |          HASH JOIN            |                   |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  19 |           INDEX FAST FULL SCAN| DEPT_ID_PK        |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  20 |           INDEX FAST FULL SCAN| DEPT_LOCATION_IX  |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  21 |        TABLE ACCESS FULL      | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|* 22 |       TABLE ACCESS FULL       | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_ID" IS NOT NULL OR "J2"."JOB_ID" IS NOT NULL))
   3 - access("J2"."JOB_ID"="H"."JOB_ID")
   4 - filter("J2"."JOB_TITLE"='President')
   5 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
   7 - access(ROWID=ROWID)
  10 - access("J"."JOB_ID"="E"."JOB_ID")
  11 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")
  12 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
  14 - access(ROWID=ROWID)
  18 - access(ROWID=ROWID)
  22 - filter("J"."JOB_TITLE"='President')

Query Summary Table

Here is a summary of some statistics on the queries, run on an Oracle 11.2 XE instance. Query lines depends on formatting of course.

Query Buffers Table Instances XPlan Steps Query Lines
Literal 279 10 27 29
NoCOUG Example 152 6 20 23
Subquery Factor Union 29 6 29 25
Outer Joins 36 6 22 17






Low-Hanging Fruit: The Classic Performance Tuning Anti-pattern

The July/August 2014 edition of Oracle Magazine contains an interesting PL/SQL performance tuning article by Steve Feuerstein, The Joy of Low-Hanging Fruit.

Low-Hanging Fruit

Low-Hanging Cherries

The article is a case study in which SF finds a "cursor FOR loop that contained two nonquery DML statements" and which he calls a classic anti-pattern. As he says, "the inserts and updates are changing the tables on a row-by-row basis, which maximizes the number of context switches". He also identifies an inefficiency in the code that is more problem-specific: An update process occurs within the loop that only needs to be performed at the end of the loop. (Although this is a generic kind of performance 'bug' I would resist the temptation to call it an anti-pattern, preferring to reserve the term for more deliberate strategies). The article goes on to replace the row-by-row DML using bulk processing, and to move the in-loop update outside the loop. SF claims a two to three-fold performance improvement in testing.

The performance improvement sounds good, but at the same time it occurs to me that the article itself could be seen as an illustration of the classic performance-tuning anti-pattern. This is when the tuner looks for things to optimise before he or she knows what is actually taking up the bulk of the time (and the article's title summarises it nicely - so I will use it to name the anti-pattern :) ). The main thrust of the article, as indicated in the subtitle, is the performance benefits of BULK COLLECT, but there is actually no indication that this is where the performance gains were made - it's possible, likely even, that all significant gains came from the other, more problem-specific, improvement.

The first paragraph of the final section of the article reads:

"It might be lots of fun to completely reorganize one's program in hopes of improving performance, but we can't just assume that the resulting code actually does run quickly."

This is exactly right, and is why the first step in any performance tuning process should be to identify the hot-spots. Tuning should then focus on those, ignoring areas where theoretical improvements will make no practical difference. If one might be permitted a second sylvan metaphor, it's important to perceive wood as well as trees.

William Blake illustration of Dante's Divine Comedy

The Wood of the Self-Murderers, by William Blake

There are many ways to identify the hot-spots, and I would recommend Oracle's PL/SQL profiling features, which I wrote about here, Notes on Profiling Oracle PL/SQL.

Notes on the Metaphors

Low-Hanging Fruit

For such a ubiquitous expression, this seems to be of surprisingly recent origin, according to: What Is the World’s Actual Lowest Hanging Fruit?.

The metaphorical usage doesn’t appear to have a very long history, though. According to Liberman, the Oxford English Dictionary’s first partial reference is from a 1968 Guardian article: “His rare images are picked aptly, easily, like low-hanging fruit.”

Not Seeing the Wood for the Trees

This expression is apparently much older, being traced back to (at least) 1546 here, Can't see the Economist for the Trees, where it is attributed to "The Proverbs of John Heywood" and the lines:

Plentie is no deyntie. ye see not your owne ease.
I see, ye can not see the wood for trees.

The article is ostensibly about an incorrect usage of the expression by The Economist, but is really more interesting than that, and includes equivalents in other languages. For example, Americans say:

Can't see the forest for the trees

while the French say:

L'arbre qui cache la forêt

.
The Wood of the Self-Murderers

This intriguing painting hangs at the Tate Gallery in London, and you can read more about it here, The Wood of the Self-Murderers: The Harpies and the Suicides

The work was completed between 1824 and 1827 and illustrates a passage from the Inferno canticle of the Divine Comedy by Dante Alighieri (1265–1321)

The passage concerned is:

Here the repellent harpies make their nests,
Who drove the Trojans from the Strophades
With dire announcements of the coming woe.
They have broad wings, a human neck and face,
Clawed feet and swollen, feathered bellies; they caw
Their lamentations in the eerie trees






Brendan's 2-Page Oracle Programming Standards

I recently had some discussions on PL/SQL coding standards centred around some documents I’d come across on the internet, as well as some internal company standards documents.

Steve Feuerstein has a useful page PL/SQL Standards linking to his own document and to two others:

Steven's Naming Conventions and Coding Standards, by Steve Feuerstein (12 page pdf).
PLSQL Standards Developed for the PLSQL Starter Framework-1.pdf, by Bill Coulam (34 page pdf).
Trivadis PL/SQL & SQL Coding Guidelines, by Roger Troller (50 page pdf).

Another interesting document is Naming and Coding Standards for SQL and PL/SQL by William Robertson (html), which emphasizes common sense in application of standards.

These are all valuable documents, but it was thought that many developers would not read and remain familiar with long documents of this kind, and a 2-page summary standards might be useful. Here is my attempt, as a Word document on Scribd:

Brendan's 2-Page Oracle Programming Standards by Brendan Furey







SQL for Continuum and Contiguity Grouping

A question was asked some time ago on Oracle's SQL&PL/SQL forum (Solution design and performance - SQL or PL/SQL?), the gist of which the poster specified thus:

I have a set of records which keep track of a status at a given date for a range of points on a given road. Basically, I need the current bits to "shine through".

The thread, despite the open-minded title, gives a nice illustration of the odd preference that people often have for complicated PL/SQL solutions to problems amenable to simpler SQL. I provided an SQL solution in that thread for this problem, and in this article I have taken a simpler, more general form of the problem defined there, and provide SQL solutions with diagrams illustrating how they work.

The class of problem can be characterised as follows:

  • Records have range fields, and we want to aggregate separately at each point within the 1-dimensional range domains (or 'continuum'), independently by grouping key
  • Within the groups we want to aggregate only the first or last records, ordering by some non-key fields

The first point above can be seen as equivalent to adding in to the main grouping key an implicit field specifying the domain leg, i.e. the region between the break points defined by the record ranges, and splitting the records by leg. We might term this form of aggregation continuum aggregation, or perhaps vertical aggregation if we view the range as distance, the internal ordering as by time, and visualise it graphically as in my diagrams below. Once this vertical aggregation is established, it is natural to think of adding a second aggregation, that might be termed contiguity aggregation, or with the same visualisation, horizontal aggregation:

  • Contiguous legs having the same values for the aggregated attributes will be grouped together

Here is an extract on grouping from Oracle® Database SQL Language Reference:

Oracle applies the aggregate functions to each group of rows and returns a single result row for each group.
...
The aggregate functions MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV, when followed by the KEEP keyword, can be used in conjunction with the FIRST or LAST function to operate on a set of values from a set of rows that rank as the FIRST or LAST with respect to a given sorting specification. Refer to FIRST for more information.

A simple use-case for the KEEP form of grouping is to list contacts with their most recent telephone number when multiple numbers are stored in a separate table. A few years ago, I realised that in most examples that I see developers write their own code instead of using the built-in constructs, and I wrote an article comparing the performance and complexity of the various alternatives for this requirement (as well as alternatives for SQL pivoting), SQL Pivot and Prune Queries - Keeping an Eye on Performance.

Update, 20 November 2013: Added a Postgres version of the SQL solution.

Test Problem

Data Model

We take for our example problem a very simple model consisting of roads and road events where the events occur over a stretch of road at a given time, and event type forms the horizontal grouping attribute.

Road - ERD2

Test Data

Input data - Roads

ROAD_ID ROAD_DESC       START_POINT  END_POINT
------- --------------- ----------- ----------
      1 The Strand                0        200
      2 Piccadilly                0        100

Input data - road_events

ROAD_ID     REV_ID E_TYPE START_POINT  END_POINT E_DATE
------- ---------- ------ ----------- ---------- ---------
      1          1 OPEN             0         10 01-JAN-07
                 2 OPEN            20         50 01-JAN-08
                 3 CLOSED         130        160 01-FEB-08
                 4 CLOSED          55         85 05-JUN-08
                 5 OTHER           45        115 01-JAN-09
                 6 OPEN            60        100 12-FEB-11
                 7 CLOSED         115        145 12-FEB-12
      2          8 CLOSED          10         30 01-JAN-10
                 9 OPEN            40         50 01-JAN-11
                10 OPEN            50         70 01-JAN-12

10 rows selected.

The following diagrams display the data and solutions for our test problem.

Road - Road 1

Road - Road 2

SQL Solution for 'Point' Problem

To start with, here is a standard SQL solution for the 'zero-dimensional' problem, where the most recent record is required for each road:

Zero-Dimensional Solution

ROAD_DESC          R_START      R_END    E_START      E_END E_DATE    E_TYPE
--------------- ---------- ---------- ---------- ---------- --------- ------
Piccadilly               0        100         50         70 01-JAN-12 OPEN
The Strand               0        200        115        145 12-FEB-12 CLOSED

  1  SELECT r.road_desc, r.start_point r_start, r.end_point r_end,
  2         Max (e.start_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_start,
  3         Max (e.end_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_end,
  4         Max (e.event_date) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_date,
  5         Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_type
  6    FROM road_events e
  7    JOIN roads r
  8      ON r.id = e.road_id
  9   GROUP BY r.road_desc, r.start_point, r.end_point
 10*  ORDER BY 1, 2, 3

The SQL is more complicated for the continuum problem, and we provide two versions, that differ in the final stage, of horizontal grouping by contiguous event type. The first uses a differencing method popular in Oracle 11g and earlier versions for this common type of grouping problem; the second uses a new feature from Oracle 12c, row pattern matching. [I have also used another technique for this type of grouping, in an article on contiguity and other range-based grouping problems in June 2011, Forming Range-Based Break Groups With Advanced SQL.]

SQL Solution for 'Continuum' Problem, Contiguity Grouping by Differences

SQL (Contiguity by Differences)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, l.leg_start, l.leg_end
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM latest_events
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE   GROUP_NO
------- --------------- ---------- ---------- ------ ----------
      1 The Strand               0         10 OPEN            0
                                10         20 (none)          1
                                20         45 OPEN            1
                                45         60 OTHER           3
                                60        100 OPEN            4
                               100        115 OTHER           5
                               115        160 CLOSED          9
                               160        200 (none)         11
      2 Piccadilly               0         10 (none)          0
                                10         30 CLOSED          1
                                30         40 (none)          1
                                40         70 OPEN            3
                                70        100 (none)          3

13 rows selected.

Query Structure Diagram (Contiguity by Differences)Road, V1.5 - QSD-DiffExecution Plan (Contiguity by Differences)

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                  |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                 |               |      1 |      2 |     13 |00:00:00.01 |      25 |   900K|   900K| 1343K (0)|
|   3 |    MERGE JOIN                   |               |      1 |     24 |     19 |00:00:00.01 |      25 |       |       |          |
|   4 |     TABLE ACCESS BY INDEX ROWID | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   5 |      INDEX FULL SCAN            | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|*  6 |     SORT JOIN                   |               |      2 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   7 |      VIEW                       |               |      1 |     24 |     19 |00:00:00.01 |      23 |       |       |          |
|   8 |       WINDOW SORT               |               |      1 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   9 |        WINDOW NOSORT            |               |      1 |     24 |     19 |00:00:00.01 |      23 | 73728 | 73728 |          |
|  10 |         SORT GROUP BY           |               |      1 |     24 |     19 |00:00:00.01 |      23 |  4096 |  4096 | 4096  (0)|
|* 11 |          HASH JOIN OUTER        |               |      1 |     24 |     25 |00:00:00.01 |      23 |  1696K|  1696K|  540K (0)|
|* 12 |           VIEW                  |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  13 |            WINDOW SORT          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  14 |             VIEW                |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  15 |              SORT UNIQUE        |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  16 |               UNION-ALL         |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  17 |                INDEX FULL SCAN  | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |                INDEX FULL SCAN  | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  19 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  21 |           TABLE ACCESS FULL     | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  12 - filter("L"."LEG_END" IS NOT NULL)

SQL Solution for 'Continuum' Problem, Contiguity Grouping by 12c Row Pattern Matching

SQL (Contiguity by Pattern Matching)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, r.road_desc, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          JOIN roads r
            ON r.id = l.road_id
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, r.road_desc, l.leg_start, l.leg_end
)
SELECT m.road_id, m.road_desc, m.sec_start, m.sec_end, m.event_type e_type
  FROM latest_events
 MATCH_RECOGNIZE (
   PARTITION BY road_id, road_desc
   ORDER BY leg_start, leg_end
   MEASURES FIRST (leg_start) sec_start,
            LAST (leg_end) sec_end,
            LAST (event_type) event_type
   PATTERN (strt sm*)
   DEFINE sm AS PREV(sm.event_type) = sm.event_type
 ) m
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE
------- --------------- ---------- ---------- ------
      1 The Strand               0         10 OPEN
                                10         20 (none)
                                20         45 OPEN
                                45         60 OTHER
                                60        100 OPEN
                               100        115 OTHER
                               115        160 CLOSED
                               160        200 (none)
      2 Piccadilly               0         10 (none)
                                10         30 CLOSED
                                30         40 (none)
                                40         70 OPEN
                                70        100 (none)

13 rows selected.

Query Structure Diagram (Contiguity by Pattern Matching)Road, V1.5 - QSD-MRExecution Plan (Contiguity by Pattern Matching)

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                                   |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                           |               |      1 |      2 |     13 |00:00:00.01 |      25 |       |       |          |
|   3 |    MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                                         |               |      1 |      2 |     19 |00:00:00.01 |      25 |       |       |          |
|   5 |      SORT GROUP BY                               |               |      1 |      2 |     19 |00:00:00.01 |      25 |  4096 |  4096 | 4096  (0)|
|*  6 |       HASH JOIN OUTER                            |               |      1 |     24 |     25 |00:00:00.01 |      25 |   987K|   987K|  525K (0)|
|   7 |        MERGE JOIN                                |               |      1 |     24 |     19 |00:00:00.01 |      18 |       |       |          |
|   8 |         TABLE ACCESS BY INDEX ROWID              | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |          INDEX FULL SCAN                         | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|* 10 |         SORT JOIN                                |               |      2 |     24 |     19 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|* 11 |          VIEW                                    |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  12 |           WINDOW SORT                            |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  13 |            VIEW                                  |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  14 |             SORT UNIQUE                          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  15 |              UNION-ALL                           |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  16 |               INDEX FULL SCAN                    | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  17 |               INDEX FULL SCAN                    | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  19 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |        TABLE ACCESS FULL                         | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  10 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - filter("L"."LEG_END" IS NOT NULL)

Note
-----
   - this is an adaptive plan

SQL Solution for 'Continuum' Problem, Contiguity Grouping - Postgres

The Oracle v11 (and earlier versions) solution, with contiguity by differences, can be converted to work in Postgres, as shown below.

SQL (Continuum by Row_Number, Contiguity by Differences - Postgres)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), ranked_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               e.id event_id, Coalesce (e.event_type, '(none)') event_type,
               Row_Number() OVER (PARTITION BY l.road_id, l.leg_start ORDER BY e.event_date DESC) rnk
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start            AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM ranked_events
         WHERE rnk = 1
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3;

Continuum/contiguity Solution with Row_Number...
 road_id | road_desc  | sec_start | sec_end | e_type | group_no 
---------+------------+-----------+---------+--------+----------
       1 | The Strand |         0 |      10 | OPEN   |        0
       1 | The Strand |        10 |      20 | (none) |        1
       1 | The Strand |        20 |      45 | OPEN   |        1
       1 | The Strand |        45 |      60 | OTHER  |        3
       1 | The Strand |        60 |     100 | OPEN   |        4
       1 | The Strand |       100 |     115 | OTHER  |        5
       1 | The Strand |       115 |     160 | CLOSED |        9
       1 | The Strand |       160 |     200 | (none) |       11
       2 | Piccadilly |         0 |      10 | (none) |        0
       2 | Piccadilly |        10 |      30 | CLOSED |        1
       2 | Piccadilly |        30 |      40 | (none) |        1
       2 | Piccadilly |        40 |      70 | OPEN   |        3
       2 | Piccadilly |        70 |     100 | (none) |        3
(13 rows)

SELECT Version();

Postgres version...
                           version                           
-------------------------------------------------------------
 PostgreSQL 9.3.1, compiled by Visual C++ build 1600, 64-bit
(1 row)

Notes on Conversion of SQL to Postgres

  • Postgres does not have an exact equivalent of Oracle's KEEP/FIRST grouping functionality, but it can be emulated via the analytic function Row_Number within a subquery
  • Coalesce, which is also available in Oracle, replaces Nvl, which does not exist in Postgres
  • Oracle's execution plans were obtained using DBMS_XPlan after running the queries, while the Postgres version was obtained by running the query prefaced by 'EXPLAIN ANALYZE '
  • There is no Postgres equivalent of Oracle v12's row pattern matching

Query Structure Diagram (Continuum by Row_Number, Contiguity by Differences - Postgres) Road, V1.5 - QSD-PGExecution Plan (Continuum by Row_Number, Contiguity by Differences - Postgres)

Prefacing the query with 'EXPLAIN ANALYZE ' gives:

Explaining Continuum/contiguity Solution with Row_Number...
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=619.45..619.47 rows=8 width=270) (actual time=0.235..0.235 rows=13 loops=1)
   Sort Key: l.road_id, r.road_desc, (min(l.leg_start))
   Sort Method: quicksort  Memory: 26kB
   CTE breaks
     ->  HashAggregate  (cost=79.50..95.30 rows=1580 width=8) (actual time=0.013..0.020 rows=21 loops=1)
           ->  Append  (cost=0.00..71.60 rows=1580 width=8) (actual time=0.002..0.006 rows=24 loops=1)
                 ->  Seq Scan on road_events  (cost=0.00..14.80 rows=480 width=8) (actual time=0.001..0.003 rows=10 loops=1)
                 ->  Seq Scan on road_events road_events_1  (cost=0.00..14.80 rows=480 width=8) (actual time=0.000..0.002 rows=10 loops=1)
                 ->  Seq Scan on roads  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.001 rows=2 loops=1)
                 ->  Seq Scan on roads roads_1  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.000 rows=2 loops=1)
   CTE legs
     ->  WindowAgg  (cost=115.54..147.14 rows=1580 width=8) (actual time=0.030..0.041 rows=21 loops=1)
           ->  Sort  (cost=115.54..119.49 rows=1580 width=8) (actual time=0.028..0.029 rows=21 loops=1)
                 Sort Key: breaks.road_id, breaks.bp
                 Sort Method: quicksort  Memory: 25kB
                 ->  CTE Scan on breaks  (cost=0.00..31.60 rows=1580 width=8) (actual time=0.014..0.023 rows=21 loops=1)
   CTE ranked_events
     ->  WindowAgg  (cost=290.71..326.08 rows=1572 width=138) (actual time=0.089..0.104 rows=25 loops=1)
           ->  Sort  (cost=290.71..294.64 rows=1572 width=138) (actual time=0.088..0.089 rows=25 loops=1)
                 Sort Key: l_1.road_id, l_1.leg_start, e.event_date
                 Sort Method: quicksort  Memory: 26kB
                 ->  Hash Left Join  (cost=20.80..207.25 rows=1572 width=138) (actual time=0.044..0.079 rows=25 loops=1)
                       Hash Cond: (l_1.road_id = e.road_id)
                       Join Filter: ((e.start_point <= l_1.leg_start) AND (e.end_point >= l_1.leg_end))
                       Rows Removed by Join Filter: 89
                       ->  CTE Scan on legs l_1  (cost=0.00..31.60 rows=1572 width=12) (actual time=0.031..0.048 rows=19 loops=1)
                             Filter: (leg_end IS NOT NULL)
                             Rows Removed by Filter: 2
                       ->  Hash  (cost=14.80..14.80 rows=480 width=138) (actual time=0.005..0.005 rows=10 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 1kB
                             ->  Seq Scan on road_events e  (cost=0.00..14.80 rows=480 width=138) (actual time=0.001..0.002 rows=10 loops=1)
   CTE latest_events_group
     ->  WindowAgg  (cost=35.79..36.01 rows=8 width=48) (actual time=0.165..0.181 rows=19 loops=1)
           ->  Sort  (cost=35.79..35.81 rows=8 width=48) (actual time=0.164..0.165 rows=19 loops=1)
                 Sort Key: ranked_events.road_id, ranked_events.event_type, ranked_events.leg_start, ranked_events.leg_end
                 Sort Method: quicksort  Memory: 26kB
                 ->  WindowAgg  (cost=35.49..35.67 rows=8 width=48) (actual time=0.124..0.138 rows=19 loops=1)
                       ->  Sort  (cost=35.49..35.51 rows=8 width=48) (actual time=0.123..0.124 rows=19 loops=1)
                             Sort Key: ranked_events.road_id, ranked_events.leg_start, ranked_events.leg_end
                             Sort Method: quicksort  Memory: 26kB
                             ->  CTE Scan on ranked_events  (cost=0.00..35.37 rows=8 width=48) (actual time=0.090..0.117 rows=19 loops=1)
                                   Filter: (rnk = 1)
                                   Rows Removed by Filter: 6
   ->  HashAggregate  (cost=14.72..14.80 rows=8 width=270) (actual time=0.218..0.221 rows=13 loops=1)
         ->  Hash Join  (cost=0.26..14.60 rows=8 width=270) (actual time=0.203..0.207 rows=19 loops=1)
               Hash Cond: (r.id = l.road_id)
               ->  Seq Scan on roads r  (cost=0.00..13.10 rows=310 width=222) (actual time=0.002..0.002 rows=2 loops=1)
               ->  Hash  (cost=0.16..0.16 rows=8 width=52) (actual time=0.192..0.192 rows=19 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 2kB
                     ->  CTE Scan on latest_events_group l  (cost=0.00..0.16 rows=8 width=52) (actual time=0.167..0.190 rows=19 loops=1)
 Total runtime: 0.432 ms
(51 rows)

Code

Here is the code: Road-SQL






Messages from Outer Space

 

I began my career in IT in 1984, before the internet and even before widespread email availability. In those days one learnt about technology and other subjects largely through books and papers, training courses, and one's immediate colleagues. Since then the rise of the internet has transformed the learning process, vastly expanding the material available.

In its earlier years users could be roughly divided into a large group of consumers and a much smaller group of producers; later, and increasingly in recent years, the emergence of what is sometimes called 'Web 2' has enabled many consumers to become producers as well. As Wikipedia (itself a prime example of the phenomenon), Web 2.0, puts it:

"A Web 2.0 site may allow users to interact and collaborate with each other in a social media dialogue as creators of user-generated content in a virtual community..."

For IT developers and other technical people this is hugely important because it expands the pool of available expertise in one's field from a few colleagues and a relatively small group of publishers to potentially everyone working in the field. This means that good ideas and best practices quickly become available to all. Where in the past bad practices or antipatterns could become entrenched in a company through the influence of a small number of people sharing a bad idea, today we can check what the rest of the world thinks. It's a bit like the old scifi motif in which an advanced civilisation comes to share its ideas with Planet Earth :).

et

However, great though this is, it's necessary to make the effort to keep up with best practice in one's own field, and not all developers do that. In my main field, Oracle database development, the resources I find most useful are:

As well as keeping up with the general Oracle community, developers should ideally run their own blog to maximise learning.

The remainder of this article consists of useful links concerning best (or worst) practices that I deem significant in database development, largely in areas where people commonly get it wrong. This article will be an ongoing work in progress.

Database Design

Database Design and the Leaning Tower of Pisa

This article by Rodger Lepinsky notes that database table designs are usually done very quickly and tend not to be re-worked as the application is developed to avoid impacting developers. The problem is that this apparently justifiable reluctance to re-work often results in much additional code to work around inevitable deficiencies in the initial data model, and ultimately a much more complex system: It's short-sighted in other words.

Here's a nice cartoon illustrating a common database design antipattern taken to its logical conclusion :)

The EAV Data Model

Design Consistency

Rodger refers to database designs becoming frozen too early in the article mentioned in the previous section. I believe this is a design antipattern that occurs very widely in general. Often a new technology is applied for a new project, one that the developers may not know very well at first; similarly, custom frameworks are often developed, in Unix or Perl, or whatever, and the initial production versions are rarely perfect. However, once something works, the tendency is to freeze it, with all its limitations, for fear of unforeseen impacts, or even purely from the notion that consistency is more important than quality.

I think that this excessive emphasis on 'consistency' is one of the main reasons that so many over-complex, poor quality systems persist indefinitely. It might be better to focus on 'consistently' applying best practice as currently understood.

Data Access Layers

Considering SQL as a Service

The idea of 'modularising' SQL through data access layers is one that comes up frequently on Oracle forums, including the Ask Tom thread above, and is well known to be an antipattern. I deal with it in a wider context here:

SQL and Modularity: Patterns, Anti-Patterns and the Kitchen Sink

Here is another relevant AskTom thread:

Multi-Level Views

"I prefer a SINGLE LEVEL of views. Yes, there will be some repetition in their definition but I don't care about that. You can document that. You can maintain that"

ETL vs ELT

ETL - Using the wrong tool for the job

Companies often use a mix of ETL and ELT with Informatica and Oracle, among other tools. Here is Tom Kyte's pithy summation:

"elt = extract, load and then transform (forget any tools, if you have hundreds of gb's or tb's of data - you'll be doing this down to the wire, not with pretty pictures and push buttons)

etl = extract, transform and then load - without using the database to transform

elt = going light speed

etl = going by boat"

Object Relational Madness (ORM)

Arguments against using an ORM layer - an ammunition stockpile

Brilliant article with collation of links on the subject, including one that compares ORM to the Vietnam war (perhaps understating the case against) :)






SQL and Modularity: Patterns, Anti-Patterns and the Kitchen Sink

There's a lot of importance placed on code re-use in the database development world. In traditional procedural programming languages, such as C or Fortran, the value of modular programming and its application to promoting code re-use is well known and understood. When SQL enters the picture, however, the situation becomes less clear, and there is less consensus on how best to apply the traditional concept of modularity.

This article will consider the concept of modularity, and how it may best be applied to SQL, from the perspective of 'patterns' and 'anti-patterns'. Here is a definition of these terms from Wikipedia, Anti-pattern:

An anti-pattern (or antipattern) is a pattern used in social or business operations or software engineering that may be commonly used but is ineffective and/or counterproductive in practice.[1][2]

The term was coined in 1995 by Andrew Koenig,[3] inspired by Gang of Four's book Design Patterns, which developed the concept of design patterns in the software field. The term was widely popularized three years later by the book AntiPatterns, which extended the use of the term beyond the field of software design and into general social interaction. According to the authors of the latter, there must be at least two key elements present to formally distinguish an actual anti-pattern from a simple bad habit, bad practice, or bad idea:

  • Some repeated pattern of action, process or structure that initially appears to be beneficial, but ultimately produces more bad consequences than beneficial results, and
  • An alternative solution exists that is clearly documented, proven in actual practice and repeatable

Procedural Modularity

Modularity starts from the idea that a complex design can generally be broken down into a set of less complex component modules that is easier to work with. In programming terms, a long main program would be broken down into smaller subroutines, with a much shorter main program that calls the subroutines.

From this starting point emerges the possibility of code re-use, whereby the decomposition into modules aims at identifying common logic that can be placed in generic modules and called in multiple places. A simple example of this would be an error-logging module in PL/SQL that would write any Oracle errors to a table along with call stack information, that could be called wherever such errors need to be trapped. This might be termed an error-logging pattern, and it's clear that this kind of code re-use can lead to simpler and more maintainable systems.

SQL Modularity: Design Patterns

Transactional APIs

The concept of transactions is important for modular design within a database application.

The Oracle manual, Oracle Database Concepts, defines a transaction thus:
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database).

When a transaction needs to be performed in more than one place, then the code can be placed in a PL/SQL module, sometimes called a 'transactional API'. This is really just the database-specific version of standard modularity, and is obviously a good design pattern to follow.

Data Access Layers

Transactional APIs are often used to form a Data Access Layer (DAL) for front end applications written in languages such as Java. Where the front end requires a record set from the database, the APIs may return a reference cursor, which is essentially a pointer to the data, and avoids the overhead of passing the whole data set at once. The data access layer pattern has a number of important advantages:

  • performance is enhanced through reduced network traffic between application server and database
  • SQL operates in an efficient set-based fashion for retrieving data in batches
  • the PL/SQL language, highly integrated with SQL, is specifically designed for database processing
  • storing the database processing code in database packages promotes modularity and code re-use

It is considered best practice to use data access layers even for clients, such as Oracle Forms, that have an embedded PL/SQL engine.

Of course, in order to achieve these benefits the data access layer has to be correctly written, avoiding the anti-pattern pitfalls discussed later. In particular, 'kitchen sink' style APIs that return far more data than required, to promote re-use, must be avoided; different client programs requiring different data should have separate APIs.

Views

Database views have been available in Oracle SQL from the earliest versions and can be used to avoid duplicating a complex SQL query that might be needed in multiple places.

This approach could be seens as a special case of transactional modularity, where a single query is the transaction, and might be regarded as a design pattern for re-use of SQL statements.

Views may form an alternative kind of data access layer, typically used by reporting tools such as Business Objects, and may also be used in conjumction with an API-based layer, which is a common approach in Oracle Forms applications.

Within-SQL Modularity

SQL is essentially a declarative, rather than procedural language for retrieving (and updating etc.) data from relational databases. The original idea was that the programmer specifies the tables and columns where the data are stored, as well as how the tables are related through key values, but does not specify algorithms for retrieving the data: The retrieval algorithms are performed by the SQL engine 'under the covers'. There might therefore seem to be little scope for modularity within a SQL select statement. However, this is not quite true for a couple of reasons: First, logical paths have to be specified between the tables, and the same path may need to be specified multiple times from different starting points; for example, the path to billing and shipping addresses on a sales order would typically involve the same sequence of steps from different id columns; second, as SQL has evolved, procedural capabilities have been added, such as analytic functions and recursion.

In Oracle in-line views were introduced in v7.2, and are in a sense a first step in modularising an SQL statement, followed in v9.2 by the 'WITH' clause for subquery factoring. Analytic functions were introduced in v8i.

Here is an example based on Oracle's HR demo schema. Suppose we want to a list of employees with their current and previous jobs (if any), the same for their manager, and a count of the number of subordinates they have. To get the previous jobs, we need to find the latest records in the job_history table for the employee and his manager separately. This can be done using subqueries, but that is inefficient and it is normally better to join to aggregation views that use the DENSE_RANK clause to allow the required previous records to be obtained in a single pass each. Similarly, the subordinate count could be done by a scalar subquery, but again performance would usually dictate the use of another aggregation view.

Here is a query using in-line views to achieve this:

SELECT emp.last_name || ', ' || emp.first_name name,
       job.job_title,
       job_p.job_title             job_title_prior,
       emp_m.last_name || ', ' || emp_m.first_name name_mgr,
       job_m.job_title             job_title_mgr,
       job_pm.job_title            job_title_mgr_prior,
       sub.n_sub
  FROM hr.employees                emp
  JOIN hr.jobs                     job
    ON job.job_id                  = emp.job_id
  LEFT JOIN (SELECT employee_id,
                    Max (job_id) KEEP (DENSE_RANK LAST ORDER BY end_date) job_id
               FROM hr.job_history
              GROUP BY employee_id
            )                      jhs
    ON jhs.employee_id             = emp.employee_id
  LEFT JOIN hr.jobs                job_p
    ON job_p.job_id                = jhs.job_id
  LEFT JOIN hr.employees           emp_m
    ON emp_m.employee_id           = emp.manager_id
  LEFT JOIN hr.jobs                job_m
    ON job_m.job_id                = emp_m.job_id
  LEFT JOIN (SELECT employee_id,
                    Max (job_id) KEEP (DENSE_RANK LAST ORDER BY end_date) job_id
               FROM hr.job_history
              GROUP BY employee_id
            )                      jhs_m
    ON jhs_m.employee_id           = emp.manager_id
  LEFT JOIN hr.jobs                job_pm
    ON job_pm.job_id               = jhs_m.job_id
  LEFT JOIN (SELECT manager_id,
                    Count(*)       n_sub
               FROM hr.employees
              GROUP BY manager_id
            )                      sub
    ON sub.manager_id              = emp.employee_id
 WHERE emp.department_id           = 30
 ORDER BY 1

Here is a query using subquery factors to achieve the same:

WITH jhs_f AS (
SELECT employee_id,
       Max (job_id) KEEP (DENSE_RANK LAST ORDER BY end_date) job_id
  FROM hr.job_history
 GROUP BY employee_id
), sub_f AS (
SELECT manager_id,
       Count(*)                    n_sub
  FROM hr.employees
 GROUP BY manager_id
)
SELECT emp.last_name || ', ' || emp.first_name name,
       job.job_title,
       job_p.job_title             job_title_prior,
       emp_m.last_name || ', ' || emp_m.first_name name_mgr,
       job_m.job_title             job_title_mgr,
       job_pm.job_title            job_title_mgr_prior,
       sub.n_sub
  FROM hr.employees                emp
  JOIN hr.jobs                     job
    ON job.job_id                  = emp.job_id
  LEFT JOIN jhs_f                  jhs
    ON jhs.employee_id             = emp.employee_id
  LEFT JOIN hr.jobs                job_p
    ON job_p.job_id                = jhs.job_id
  LEFT JOIN hr.employees           emp_m
    ON emp_m.employee_id           = emp.manager_id
  LEFT JOIN hr.jobs                job_m
    ON job_m.job_id                = emp_m.job_id
  LEFT JOIN jhs_f                  jhs_m
    ON jhs_m.employee_id           = emp.employee_id
  LEFT JOIN hr.jobs                job_pm
    ON job_pm.job_id               = jhs_m.job_id
  LEFT JOIN sub_f                  sub
    ON sub.manager_id              = emp.employee_id
 WHERE emp.department_id           = 30
 ORDER BY 1

The second query, although only a line shorter, could be said to be more modular in two ways:

  1. The more complex processing is placed at the beginning, prior to the main select, which now contains only simple joins. This might be said to parallel the procedural modularity practice of having a simple main program calling subroutines, and may help maintainability
  2. A single subquery factor replaces the two inline views for previous jobs, a more modular design, and one that may be more efficient for larger data sets since Oracle generally materialises subquery factors referenced multiple times

This approach might be regarded as a design pattern for modularity within individual SQL statements.

ANSI Join Syntax

The queries above are written using ANSI join syntax, introduced in v9. Oracle SQL originally used its own proprietary syntax, as shown below for the same query requirement:

WITH jhs_f AS (
SELECT employee_id,
       Max (job_id) KEEP (DENSE_RANK LAST ORDER BY end_date) job_id
  FROM hr.job_history
 GROUP BY employee_id
), sub_f AS (
SELECT manager_id,
       Count(*)                    n_sub
  FROM hr.employees
 GROUP BY manager_id
)
SELECT emp.last_name || ', ' || emp.first_name name,
       job.job_title,
       job_p.job_title             job_title_prior,
       emp_m.last_name || ', ' || emp_m.first_name name_mgr,
       job_m.job_title             job_title_mgr,
       job_pm.job_title            job_title_mgr_prior,
       sub.n_sub
  FROM hr.employees                emp,
       hr.jobs                     job,
       jhs_f                       jhs,
       hr.jobs                     job_p,
       hr.employees                emp_m,
       hr.jobs                     job_m,
       jhs_f                       jhs_m,
       hr.jobs                     job_pm,
       sub_f                       sub
 WHERE emp.department_id           = 30
   AND job.job_id                  = emp.job_id
   AND jhs.employee_id (+)         = emp.employee_id
   AND job_p.job_id (+)            = jhs.job_id
   AND emp_m.employee_id (+)       = emp.manager_id
   AND job_m.job_id (+)            = emp_m.job_id
   AND jhs_m.employee_id (+)       = emp.employee_id
   AND job_pm.job_id (+)           = jhs_m.job_id
   AND sub.manager_id (+)          = emp.employee_id
 ORDER BY 1

The tables are listed together, join clauses are in a single block not separated from constraints, and outer joins are specified using a (+) token against every column in the 'left' table. The outer join syntax leads to widespread bugs when developers miss the (+) from one of the columns, which silently converts the join to an inner join.

Some of the advantages of the newer syntax are:

  • Greater functionality is available, including full outer joining
  • Outer joining is much harder to get wrong
  • The syntax follows an ANSI standard
  • Locating the join conditions with the table being joined appears to be more modular and readable

ANSI join syntax might therefore be considered a good design pattern to follow.

SQL Modularity: Design Anti-patterns

Here is a thread from Tom Kyte's AskTom forum dealing with an SQL anti-pattern that is unfortunately common, and strongly opposed by Tom Kyte: Considering SQL as a Service. The idea behind the anti-pattern seems to be to avoid repeating even simple table joins in SQL by hiding them in a special type of data access layer designed to be called within individual SQL statements. This results in over-complex 'kitchen-sink' SQL within the layer itself and performance problems in the 'client' SQL. In addition, complex PL/SQL involving object types and arrays tends to be needed to glue it all together; the approach thus achieves the opposite of its intended purpose - simplification - in the manner of a classic anti-pattern.

We'll illustrate the anti-pattern by extending the HR example used above and working through an example. First we'll use the PL/SQL packaged procedure variant, then look at an older form based on views.

APIs as SQL Building Blocks Anti-pattern

Let's suppose that we start from the idea that we should centralise the SQL for employee information in a re-useable API. We could take the SQL above and add in department name, address and manager information to make it more general. We might think initially of making the API a function taking an employee id as input and returning a record. But what if we needed the information for a list of employees? It would be inefficient to call a function for every record in the list, which might lead us to think of making the function take a list as input and return a list of records. We might therefore define object and array types, and a function with the following signature:

FUNCTION Emp_Info_List (p_emp_id_list SYS.ODCINumberList) RETURN emp_info_list_type;

[All code and output referenced is attached to this article.]

Now let's consider a scenario in which we have to provide an API for a web front end following the design pattern of returning a reference cursor. The data required are the following details for all employees in a given department:

  • employee name
  • manager name
  • list of subordinates

The output for department 30 would be:

NAME                      NAME_MGR               NAME_SUB
------------------------- ---------------------- -------------------
Baida, Shelli             Raphaely, Den
Colmenares, Karen         Raphaely, Den
Himuro, Guy               Raphaely, Den
Khoo, Alexander           Raphaely, Den
Raphaely, Den             King, Steven           Baida, Shelli
Raphaely, Den             King, Steven           Colmenares, Karen
Raphaely, Den             King, Steven           Himuro, Guy
Raphaely, Den             King, Steven           Khoo, Alexander
Raphaely, Den             King, Steven           Tobias, Sigal
Tobias, Sigal             Raphaely, Den

10 rows selected.

Here is a possible procedure implementation:

PROCEDURE Get_Mgr_Subs_KS (p_dept_id PLS_INTEGER, x_mgr_sub_cur OUT SYS_REFCURSOR) IS
  l_emp_id_list SYS.ODCINumberList;
BEGIN

  SELECT employee_id
    BULK COLLECT INTO l_emp_id_list
    FROM hr.employees
   WHERE department_id = p_dept_id;

  OPEN x_mgr_sub_cur FOR
  SELECT t.name,
         t.name_mgr,
         e.last_name || ', ' || e.first_name
    FROM TABLE (KSink_Emp.Emp_Info_List (l_emp_id_list)) t
    LEFT JOIN hr.employees e
      ON e.manager_id = t.employee_id
   ORDER BY 1, 2, 3;

END Get_Mgr_Subs_KS;

The first step is to get the list of employees for the department, which we then pass into the API, wrapped in the TABLE key word, and join the employees table to get the subordinates. Three SQL select statements are executed. You might argue that I have over-complicated this by having the API take a list of employees rather than the department id, but remember that in this design anti-pattern the API can't be designed for one specific caller, and the list input is more general. It is intended to cater for all calls for employee information so in practice such compromises will happen frequently.

We can compare this to an alternative implementation in which we simply join the tables required:

PROCEDURE Get_Mgr_Subs_SQL (p_dept_id PLS_INTEGER, x_mgr_sub_cur OUT SYS_REFCURSOR) IS
BEGIN

  OPEN x_mgr_sub_cur FOR
  SELECT e.last_name || ', ' || e.first_name,
         m.last_name || ', ' || m.first_name,
         s.last_name || ', ' || s.first_name
    FROM hr.employees e
    LEFT JOIN hr.employees m
      ON m.employee_id = e.manager_id
    LEFT JOIN hr.employees s
      ON s.manager_id = e.employee_id
   WHERE e.department_id = p_dept_id
   ORDER BY 1, 2, 3;

END Get_Mgr_Subs_SQL;

This joins three tables in one statement while the earlier procedure effectively makes a join through PL/SQL using an array, which is arguably slightly more complicated. In any case the real problems become apparent when you compare the execution plans. I have written a test driver program that calls each of the APIs and loops over the returned cursor.

Taking the plan for the straight SQL implementation first:

-------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                   |      1 |        |     10 |00:00:00.01 |      14 |       |       |          |
|   1 |  SORT ORDER BY                         |                   |      1 |     35 |     10 |00:00:00.01 |      14 |  2048 |  2048 | 2048  (0)|
|   2 |   NESTED LOOPS OUTER                   |                   |      1 |     35 |     10 |00:00:00.01 |      14 |       |       |          |
|*  3 |    HASH JOIN OUTER                     |                   |      1 |      6 |      6 |00:00:00.01 |      10 |  1281K|  1281K|  544K (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES         |      1 |      6 |      6 |00:00:00.01 |       2 |       |       |          |
|*  5 |      INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX |      1 |      6 |      6 |00:00:00.01 |       1 |       |       |          |
|   6 |     VIEW                               | index$_join$_002  |      1 |    107 |    107 |00:00:00.01 |       8 |       |       |          |
|*  7 |      HASH JOIN                         |                   |      1 |        |    107 |00:00:00.01 |       8 |  1245K|  1245K| 1439K (0)|
|   8 |       INDEX FAST FULL SCAN             | EMP_NAME_IX       |      1 |    107 |    107 |00:00:00.01 |       4 |       |       |          |
|   9 |       INDEX FAST FULL SCAN             | EMP_EMP_ID_PK     |      1 |    107 |    107 |00:00:00.01 |       4 |       |       |          |
|  10 |    TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEES         |      6 |      6 |      5 |00:00:00.01 |       4 |       |       |          |
|* 11 |     INDEX RANGE SCAN                   | EMP_MANAGER_IX    |      6 |      6 |      5 |00:00:00.01 |       3 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("M"."EMPLOYEE_ID"="E"."MANAGER_ID")
   5 - access("E"."DEPARTMENT_ID"=:B1)
   7 - access(ROWID=ROWID)
  11 - access("S"."MANAGER_ID"="E"."EMPLOYEE_ID")

This is a relatively simple plan for the single SQL statement, with 14 buffers read.

For the anti-pattern version there are thee SQL select statements, but we'll ignore the plan for initial bulk collect SQL and consider the other two execution plans. First, the client API SQL:

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  |      1 |        |     10 |00:00:00.14 |    1195 |     41 |      1 |       |       |          |
|   1 |  SORT ORDER BY                      |                  |      1 |  48554 |     10 |00:00:00.14 |    1195 |     41 |      1 |  2048 |  2048 | 2048  (0)|
|*  2 |   HASH JOIN RIGHT OUTER             |                  |      1 |  48554 |     10 |00:00:00.14 |    1195 |     41 |      1 |  1368K|  1368K| 1322K (0)|
|   3 |    VIEW                             | index$_join$_002 |      1 |    107 |    106 |00:00:00.02 |       8 |     12 |      0 |       |       |          |
|*  4 |     HASH JOIN                       |                  |      1 |        |    106 |00:00:00.02 |       8 |     12 |      0 |  1519K|  1519K| 1575K (0)|
|   5 |      INDEX FAST FULL SCAN           | EMP_MANAGER_IX   |      1 |    107 |    106 |00:00:00.01 |       4 |      6 |      0 |       |       |          |
|   6 |      INDEX FAST FULL SCAN           | EMP_NAME_IX      |      1 |    107 |    107 |00:00:00.01 |       4 |      6 |      0 |       |       |          |
|   7 |    COLLECTION ITERATOR PICKLER FETCH| EMP_INFO_LIST    |      1 |   8168 |      6 |00:00:00.13 |    1187 |     29 |      1 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("E"."MANAGER_ID"=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))
   4 - access(ROWID=ROWID)

Note the extreme inaccuracy of the cardinality estimates at steps 1 and 2, which originate in the step 7 estimate of 8168, which is a database-level default for an array function call. This is exposing the general problem that joining to arrays prevents accurate cardinality estimates. A total of 1195 buffers were read.

Next, the plan for the inner API SQL:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |                           |      1 |        |      6 |00:00:00.01 |      94 |      1 |      1 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION                      |                           |      1 |        |      6 |00:00:00.01 |      94 |      1 |      1 |       |       |          |
|   2 |   LOAD AS SELECT                                |                           |      1 |        |      0 |00:00:00.01 |       6 |      0 |      1 |  1036K|  1036K|          |
|   3 |    SORT GROUP BY NOSORT                         |                           |      1 |      7 |      7 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|   4 |     TABLE ACCESS BY INDEX ROWID                 | JOB_HISTORY               |      1 |     10 |     10 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|   5 |      INDEX FULL SCAN                            | JHIST_EMP_ID_ST_DATE_PK   |      1 |     10 |     10 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|*  6 |   HASH JOIN OUTER                               |                           |      1 |      6 |      6 |00:00:00.01 |      85 |      1 |      0 |   735K|   735K|  506K (0)|
|*  7 |    HASH JOIN OUTER                              |                           |      1 |      6 |      6 |00:00:00.01 |      78 |      1 |      0 |   736K|   736K|  891K (0)|
|*  8 |     HASH JOIN OUTER                             |                           |      1 |      6 |      6 |00:00:00.01 |      75 |      1 |      0 |   737K|   737K|  550K (0)|
|*  9 |      HASH JOIN OUTER                            |                           |      1 |      6 |      6 |00:00:00.01 |      68 |      1 |      0 |   739K|   739K|  883K (0)|
|* 10 |       HASH JOIN OUTER                           |                           |      1 |      6 |      6 |00:00:00.01 |      62 |      0 |      0 |   740K|   740K|  903K (0)|
|* 11 |        HASH JOIN OUTER                          |                           |      1 |      6 |      6 |00:00:00.01 |      55 |      0 |      0 |   746K|   746K|  541K (0)|
|* 12 |         HASH JOIN OUTER                         |                           |      1 |      6 |      6 |00:00:00.01 |      48 |      0 |      0 |   754K|   754K|  534K (0)|
|* 13 |          HASH JOIN OUTER                        |                           |      1 |      6 |      6 |00:00:00.01 |      41 |      0 |      0 |   766K|   766K|  418K (0)|
|* 14 |           HASH JOIN OUTER                       |                           |      1 |      6 |      6 |00:00:00.01 |      33 |      0 |      0 |   773K|   773K|  414K (0)|
|  15 |            NESTED LOOPS OUTER                   |                           |      1 |      6 |      6 |00:00:00.01 |      26 |      0 |      0 |       |       |          |
|* 16 |             HASH JOIN OUTER                     |                           |      1 |      6 |      6 |00:00:00.01 |      23 |      0 |      0 |   833K|   833K|  414K (0)|
|* 17 |              HASH JOIN OUTER                    |                           |      1 |      6 |      6 |00:00:00.01 |      16 |      0 |      0 |   876K|   876K|  415K (0)|
|* 18 |               HASH JOIN                         |                           |      1 |      6 |      6 |00:00:00.01 |       9 |      0 |      0 |   905K|   905K| 1259K (0)|
|  19 |                MERGE JOIN                       |                           |      1 |    107 |    107 |00:00:00.01 |       9 |      0 |      0 |       |       |          |
|  20 |                 TABLE ACCESS BY INDEX ROWID     | JOBS                      |      1 |     19 |     19 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|  21 |                  INDEX FULL SCAN                | JOB_ID_PK                 |      1 |     19 |     19 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|* 22 |                 SORT JOIN                       |                           |     19 |    107 |    107 |00:00:00.01 |       7 |      0 |      0 | 18432 | 18432 |16384  (0)|
|  23 |                  TABLE ACCESS FULL              | EMPLOYEES                 |      1 |    107 |    107 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  24 |                COLLECTION ITERATOR PICKLER FETCH|                           |      1 |      6 |      6 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|  25 |               TABLE ACCESS FULL                 | DEPARTMENTS               |      1 |     27 |     27 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  26 |              TABLE ACCESS FULL                  | LOCATIONS                 |      1 |     23 |     23 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|* 27 |             INDEX UNIQUE SCAN                   | COUNTRY_C_ID_PK           |      6 |      1 |      6 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  28 |            TABLE ACCESS FULL                    | REGIONS                   |      1 |      4 |      4 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  29 |           VIEW                                  | index$_join$_024          |      1 |    107 |    107 |00:00:00.01 |       8 |      0 |      0 |       |       |          |
|* 30 |            HASH JOIN                            |                           |      1 |        |    107 |00:00:00.01 |       8 |      0 |      0 |  1245K|  1245K| 1550K (0)|
|  31 |             INDEX FAST FULL SCAN                | EMP_NAME_IX               |      1 |    107 |    107 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  32 |             INDEX FAST FULL SCAN                | EMP_EMP_ID_PK             |      1 |    107 |    107 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  33 |          TABLE ACCESS FULL                      | EMPLOYEES                 |      1 |    107 |    107 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  34 |         TABLE ACCESS FULL                       | JOBS                      |      1 |     19 |     19 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  35 |        VIEW                                     |                           |      1 |     18 |     19 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  36 |         HASH GROUP BY                           |                           |      1 |     18 |     19 |00:00:00.01 |       7 |      0 |      0 |  1558K|  1558K| 1185K (0)|
|  37 |          TABLE ACCESS FULL                      | EMPLOYEES                 |      1 |    107 |    107 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  38 |       VIEW                                      |                           |      1 |      7 |      7 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
|  39 |        TABLE ACCESS FULL                        | SYS_TEMP_0FD9D6604_5813E5 |      1 |      7 |      7 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
|  40 |      TABLE ACCESS FULL                          | JOBS                      |      1 |     19 |     19 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  41 |     VIEW                                        |                           |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  42 |      TABLE ACCESS FULL                          | SYS_TEMP_0FD9D6604_5813E5 |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  43 |    TABLE ACCESS FULL                            | JOBS                      |      1 |     19 |     19 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("JOB_P"."JOB_ID"="JHS"."JOB_ID")
   7 - access("JHS"."EMPLOYEE_ID"="EMP"."EMPLOYEE_ID")
   8 - access("JOB_MP"."JOB_ID"="JHS_M"."JOB_ID")
   9 - access("JHS_M"."EMPLOYEE_ID"="EMP"."EMPLOYEE_ID")
  10 - access("SUB"."MANAGER_ID"="EMP"."EMPLOYEE_ID")
  11 - access("JOB_M"."JOB_ID"="EMP_M"."JOB_ID")
  12 - access("EMP_M"."EMPLOYEE_ID"="EMP"."MANAGER_ID")
  13 - access("EMP_DM"."EMPLOYEE_ID"="DEP"."MANAGER_ID")
  14 - access("REG"."REGION_ID"="COU"."REGION_ID")
  16 - access("LOC"."LOCATION_ID"="DEP"."LOCATION_ID")
  17 - access("DEP"."DEPARTMENT_ID"="EMP"."DEPARTMENT_ID")
  18 - access("EMP"."EMPLOYEE_ID"=VALUE(KOKBF$))
  22 - access("JOB"."JOB_ID"="EMP"."JOB_ID")
       filter("JOB"."JOB_ID"="EMP"."JOB_ID")
  27 - access("COU"."COUNTRY_ID"="LOC"."COUNTRY_ID")
  30 - access(ROWID=ROWID)

This is pretty complex. We can't pretend to do a real performance analysis on such a small database (107 employees), but the potential for performance problems in real cases is clear. It's important to understand that any performance analysis on the client API has to take into account not just the client code, but the full complexity of the centralised SQL, so any apparent simplification from using it is to a great extent illusory.

Database Inter-Schema Data Access Layer Anti-pattern

There is a variant of the anti-pattern above in which data access layers are used to retrieve data across schema boundaries. This variant suffers from exactly the same problems as the first of course, and should equally be avoided.

Views as SQL Building Blocks Anti-Pattern

The same ideas as are behind the APIs as SQL Building Blocks Anti-Pattern can also be implemented through views, and in fact this variant form of the anti-pattern has been around longer I think. We can illustrate it on the same example, by creating a view instead of the central API cursor.

CREATE OR REPLACE VIEW emp_ks_v (
       employee_id,
       name,
       job_title,
       job_title_p,
       name_mgr,
       job_title_mgr,
       job_title_mgr_p,
       n_sub,
       department_id,
       department_name,
       name_d_mgr,
       street_address,
       country_name,
       region_name) AS
WITH jhs_f AS (
SELECT employee_id,
     Max (job_id) KEEP (DENSE_RANK LAST ORDER BY end_date) job_id
  FROM hr.job_history
 GROUP BY employee_id
), sub AS (
SELECT manager_id,
       Count(*)                  n_sub
  FROM hr.employees
 GROUP BY manager_id
)
SELECT emp.employee_id,
       emp.last_name || ', ' || emp.first_name,
       job.job_title,
       job_p.job_title,
       emp_m.last_name || ', ' || emp_m.first_name,
       job_m.job_title,
       job_mp.job_title,
       sub.n_sub,
       dep.department_id,
       dep.department_name,
       emp_dm.last_name || ', ' || emp_dm.first_name,
       loc.street_address,
       cou.country_name,
       reg.region_name
  FROM hr.employees              emp
  JOIN hr.jobs                   job
    ON job.job_id                = emp.job_id
  LEFT JOIN jhs_f                jhs
    ON jhs.employee_id           = emp.employee_id
  LEFT JOIN hr.jobs              job_p
    ON job_p.job_id              = jhs.job_id
  LEFT JOIN hr.employees         emp_m
    ON emp_m.employee_id         = emp.manager_id
  LEFT JOIN hr.jobs              job_m
    ON job_m.job_id              = emp_m.job_id
  LEFT JOIN jhs_f                jhs_m
    ON jhs_m.employee_id         = emp.employee_id
  LEFT JOIN hr.jobs              job_mp
    ON job_mp.job_id             = jhs_m.job_id
  LEFT JOIN sub
    ON sub.manager_id            = emp.employee_id
  LEFT JOIN hr.departments       dep
    ON dep.department_id         = emp.department_id
  LEFT JOIN hr.employees         emp_dm
    ON emp_dm.employee_id        = dep.manager_id
  LEFT JOIN hr.locations         loc
    ON loc.location_id           = dep.location_id
  LEFT JOIN hr.countries         cou
    ON cou.country_id            = loc.country_id
  LEFT JOIN hr.regions           reg
    ON reg.region_id             = cou.region_id

The view can then be called to get the employee details for example for a given department, thus:

SELECT t.name,
       t.name_mgr,
       CASE WHEN e.last_name IS NOT NULL THEN e.last_name || ', ' || e.first_name END name_sub
  FROM emp_ks_v t
  LEFT JOIN hr.employees e
    ON e.manager_id = t.employee_id
 WHERE t.department_id = 30
 ORDER BY 1, 2, 3

This is actually quite a lot better than the API-based approach as it's much simpler, avoiding the need for object arrays, and allowing use simply by joining. Let's look at the execution plan though (we'll just run the query rather than put it into a client API returning a reference cursor):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                           |      1 |        |     10 |00:00:00.01 |      42 |      1 |      1 |       |       |          |
|   1 |  SORT ORDER BY                               |                           |      1 |     35 |     10 |00:00:00.01 |      42 |      1 |      1 |  2048 |  2048 | 2048  (0)|
|   2 |   NESTED LOOPS OUTER                         |                           |      1 |     35 |     10 |00:00:00.01 |      42 |      1 |      1 |       |       |          |
|   3 |    VIEW                                      | EMP_KS_V                  |      1 |      6 |      6 |00:00:00.01 |      38 |      1 |      1 |       |       |          |
|   4 |     TEMP TABLE TRANSFORMATION                |                           |      1 |        |      6 |00:00:00.01 |      38 |      1 |      1 |       |       |          |
|   5 |      LOAD AS SELECT                          |                           |      1 |        |      0 |00:00:00.01 |       6 |      0 |      1 |  1036K|  1036K|          |
|   6 |       SORT GROUP BY NOSORT                   |                           |      1 |      7 |      7 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID           | JOB_HISTORY               |      1 |     10 |     10 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|   8 |         INDEX FULL SCAN                      | JHIST_EMP_ID_ST_DATE_PK   |      1 |     10 |     10 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|*  9 |      HASH JOIN OUTER                         |                           |      1 |      6 |      6 |00:00:00.01 |      29 |      1 |      0 |   883K|   883K|  517K (0)|
|* 10 |       HASH JOIN OUTER                        |                           |      1 |      6 |      6 |00:00:00.01 |      21 |      1 |      0 |   890K|   890K|  857K (0)|
|* 11 |        HASH JOIN OUTER                       |                           |      1 |      6 |      6 |00:00:00.01 |      14 |      1 |      0 |   895K|   895K|  886K (0)|
|* 12 |         HASH JOIN OUTER                      |                           |      1 |      6 |      6 |00:00:00.01 |      11 |      1 |      0 |   905K|   905K|  893K (0)|
|  13 |          NESTED LOOPS                        |                           |      1 |      6 |      6 |00:00:00.01 |       5 |      0 |      0 |       |       |          |
|  14 |           NESTED LOOPS OUTER                 |                           |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  15 |            TABLE ACCESS BY INDEX ROWID       | DEPARTMENTS               |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 16 |             INDEX UNIQUE SCAN                | DEPT_ID_PK                |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|* 17 |            INDEX UNIQUE SCAN                 | LOC_ID_PK                 |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|  18 |           TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES                 |      1 |      6 |      6 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 19 |            INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX         |      1 |      6 |      6 |00:00:00.01 |       1 |      0 |      0 |       |       |          |
|  20 |          VIEW                                |                           |      1 |      7 |      7 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
|  21 |           TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6606_5813E5 |      1 |      7 |      7 |00:00:00.01 |       6 |      1 |      0 |       |       |          |
|  22 |         VIEW                                 |                           |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  23 |          TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6606_5813E5 |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  24 |        VIEW                                  |                           |      1 |     18 |     19 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  25 |         HASH GROUP BY                        |                           |      1 |     18 |     19 |00:00:00.01 |       7 |      0 |      0 |  1558K|  1558K| 1211K (0)|
|  26 |          TABLE ACCESS FULL                   | EMPLOYEES                 |      1 |    107 |    107 |00:00:00.01 |       7 |      0 |      0 |       |       |          |
|  27 |       VIEW                                   | index$_join$_013          |      1 |    107 |    107 |00:00:00.01 |       8 |      0 |      0 |       |       |          |
|* 28 |        HASH JOIN                             |                           |      1 |        |    107 |00:00:00.01 |       8 |      0 |      0 |  1245K|  1245K| 1410K (0)|
|  29 |         INDEX FAST FULL SCAN                 | EMP_NAME_IX               |      1 |    107 |    107 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  30 |         INDEX FAST FULL SCAN                 | EMP_EMP_ID_PK             |      1 |    107 |    107 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  31 |    TABLE ACCESS BY INDEX ROWID BATCHED       | EMPLOYEES                 |      6 |      6 |      5 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|* 32 |     INDEX RANGE SCAN                         | EMP_MANAGER_IX            |      6 |      6 |      5 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - access("EMP_M"."EMPLOYEE_ID"="EMP"."MANAGER_ID")
  10 - access("SUB"."MANAGER_ID"="EMP"."EMPLOYEE_ID")
  11 - access("JHS_M"."EMPLOYEE_ID"="EMP"."EMPLOYEE_ID")
  12 - access("JHS"."EMPLOYEE_ID"="EMP"."EMPLOYEE_ID")
  16 - access("DEP"."DEPARTMENT_ID"=30)
  17 - access("LOC"."LOCATION_ID"="DEP"."LOCATION_ID")
  19 - access("EMP"."DEPARTMENT_ID"=30)
  28 - access(ROWID=ROWID)
  32 - access("E"."MANAGER_ID"="T"."EMPLOYEE_ID")

There is only one SQL statement, and the plan is better too, with better cardinality estimates owing to the absence of array processing, and with 42 buffers read. It's still a bad idea though because the client caller would be executing far more complex SQL than is required and as before performance analysis on the client requires the full complexity of the 'centralised' SQL to be included. Using complex views as SQL building blocks is generally considered to have poor performance characteristics.

SQL Modularity: Other Design Options

Splitting Up Long SQL Statements

Oracle's Cost Based Optimiser (CBO) has been greatly enhanced since it's introduction in v7, but remains imperfect. The combinatorial nature of the problem that it tries to solve suggests that there will always be larger queries where it makes a bad choice of plan. In some cases splitting a large query into smaller ones and using temporary tables to join them can give better performance. This may arise from new indexing options for the CBO, or by dynamic sampling capabilities on the temporary tables, or just from the CBO algorithms happening to work better on the divided queries.

It's important to understand though that any such splitting should be done purely on performance grounds: the splitting increases the code complexity and breaking a sequence of declarative joins into several subsequences is not comparable with standard modularisation of programs into subprograms.

Long SQL statements are not necessarily problematic, but obviously should only be as long as necessary, and avoiding the design anti-patterns mentioned helps to ensure this.

Simple Views

Simple views, without joins, are often used in areas such as access control; for example Oracle Applications multi-org features (upto release 11) generally involve transactional tables being referenced by such simple views. These do not cause the performance problems seen with the complex building-block views anti-pattern.

Data Access Layers for Back-end Programs

It is quite possible to use a similar approach for database programs as for front-end programs in terms of a data access layer. One could adopt a standard that stand-alone database PL/SQL programs should access data through packaged APIs rather than directly. This obviously does not have the performance or language advantages of the design pattern in relation to Java front-ends for example, but it may in some circumstances be a preferred method of code organisation. It is therefore not an anti-pattern of the kind we have considered - as long as it is for stand-alone programs only.

Conclusions

The main aim of this article has been to distinguish between good approaches to modularity in SQL (patterns) and bad ones (anti-patterns) based on personal experience of seeing both types applied.

  • The Data Access Layer design pattern is an excellent approach for client applications developed in Java, .net etc. to access a database
  • Using Data Access Layers for internal access within a database is a classic anti-pattern leading to overcomplication and performance problems
  • A good design pattern used in an inappropriate context can become an anti-pattern

SQL Modularity Code






SQL for Length-Controlled List Aggregation

Recently an OTN poster asked how to return values from a column in multiple records into a single output record and column in SQL, Multiple Rows Into One Column Field. In the usual version of this list aggregation problem, one record is required for each distinct combination of grouping columns, with the aggregation fields delimited within the output field, and from Oracle v11.2 there is a built-in function for it, ListAgg. However, in this case the poster wanted a maximum of three values in the output field, with overflow records as necessary. Tom Kyte solved the problem in that thread essentially by adding in a calculated row number to the grouping columns, and concatenating the aggregation fields directly in the code.

Tim Hall has compiled a list of the main techniques available for the standard problem for different versions of the database, up to v11.2, here: String Aggregation Techniques

I thought it would be an interesting and useful variation on the problem to base record overflow on concatenated length rather than on number of values. This would provide an alternative to the CLOB-based variations for cases where the length exceeds 4KB in v11.2 or earlier (v12.1 raises the string limit to 32KB), and cases where the records just need to be of limited length for display or other purposes. Here's an example on another forum of a requirement to handle long strings: Ordering by list of strings in Oracle SQL without LISTAGG.

On the OTN thread, I provided an SQL solution for this variation, using recursion and the new v12.1 MATCH_RECOGNIZE syntax for row pattern matching, and an alternative using the MODEL clause. In this article I provide modified versions with explanations and execution plans.

SQL Analytics and Recursion

The problem variation as I have defined it is harder than it may at first appear, and in fact, can't be solved by SQL grouping and analytics alone: Recursion is required. The reason is that when considering whether a source record needs to start a new grouping record, or can be included on the prior record, the answer depends on the lengths of the fields of an unknown number of prior records. Analytic functions can only sum over a known number of prior records, and, although 'known' includes values that can be computed via a prior subquery, this is not possible here.

The approach we take involves two logical steps. In the first step, the records are processed in sequence within the partitions, and aggregate strings are accumulated for each record. When the string would exceed the maximum length a new aggregate is started and a flag is set.

Now, from the first step we have the input source number of records, with the desired aggregate strings being on the last records before any new aggregate, marked by the flag. We then just need to filter out the intermediate records.

Test Example

We will use Oracle's standard HR schema for test data, and will aggregate employee names by department with the name list field having maximum length 80. There are 107 employees, and the desired output is:

      DEPT ENAME_LIST
---------- --------------------------------------------------------------------------------
        10 Whalen, Jennifer
        20 Fay, Pat; Hartstein, Michael
        30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den
        30 Tobias, Sigal
        40 Mavris, Susan
        50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony
        50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer
        50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy
        50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam
        50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James
        50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia
        50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel
        50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen
        50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana
        50 Weiss, Matthew
        60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli
        70 Baer, Hermann
        80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David
        80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise
        80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa
        80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack
        80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen
        80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick
        80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni
        90 De Haan, Lex; King, Steven; Kochhar, Neena
       100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael
       100 Urman, Jose Manuel
       110 Gietz, William; Higgins, Shelley
           Grant, Kimberely

29 rows selected.

Recursive Subquery Factor Solutions

Recursive subquery factors are available from Oracle v11.2 up, and can be used to implement the required recursion. In this solution the flag denoting an overspill line has to be set initially on that overspill line, rather than on the preceding line, and on the last line in the partition, which are the lines we want to display. We therefore need another step.

Setting print flag via analytic function

In v11.2, an additional subquery can be added that uses the analytic function Lead to set a flag on the required lines. This works by looking at the previously set flag on the next record (if the record exists), and setting the desired line print flag accordingly on the current record. The query is:

WITH emps_ordered AS (
SELECT department_id dept,
       CAST (last_name || ', ' || first_name AS VARCHAR2(4000)) ename,
       Row_Number () OVER (PARTITION BY department_id ORDER BY last_name, first_name) rn
  FROM hr.employees
), rsf (dept, ename_list, new_line, rn) AS (
SELECT dept, ename, 1, rn
  FROM emps_ordered
 WHERE rn = 1
 UNION ALL
SELECT r.dept,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN e.ename ELSE r.ename_list || '; ' || e.ename END,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN 1 ELSE 0 END,
       r.rn + 1
  FROM rsf r
  JOIN emps_ordered e
    ON e.dept = r.dept
   AND e.rn = r.rn + 1
), leads_v AS (
SELECT dept, ename_list, 
       new_line, Lead (new_line, 1, 1) OVER (PARTITION BY dept ORDER BY rn) line_print, rn
  FROM rsf
)
SELECT *
  FROM leads_v
 WHERE line_print = 1
 ORDER BY dept, ename_list

How it works

  • emps_ordered subquery: Formats the name field and gets a row number within department ordering by the name
  • rsf recursive subquery: Anchor branch selects first employee in each department; recursive branch joins the next employee based on the row number, and accumulates the name list, resetting and flagging when length dictates a new overspill line
  • leads_v subquery: Use Lead analytic function to set the print flag
  • Main query: Selects rows where the print flag = 1

The output from the leads_v subquery, before filtering, illustrates how it works:

DEPT ENAME_LIST                                                                       NEW_LINE LINE_PRINT  RN
---- -------------------------------------------------------------------------------- -------- ---------- ---
  10 Whalen, Jennifer                                                                        1          1   1
  20 Fay, Pat                                                                                1          0   1
  20 Fay, Pat; Hartstein, Michael                                                            0          1   2
  30 Baida, Shelli                                                                           1          0   1
  30 Baida, Shelli; Colmenares, Karen                                                        0          0   2
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy                                           0          0   3
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander                          0          0   4
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den           0          1   5
  30 Tobias, Sigal                                                                           1          1   6
  40 Mavris, Susan                                                                           1          1   1
  50 Atkinson, Mozhe                                                                         1          0   1
  50 Atkinson, Mozhe; Bell, Sarah                                                            0          0   2
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura                                             0          0   3
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis                               0          0   4
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony              0          1   5
  50 Chung, Kelly                                                                            1          0   6
  50 Chung, Kelly; Davies, Curtis                                                            0          0   7
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia                                          0          0   8
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer                         0          1   9
  50 Everett, Britney                                                                        1          0  10
  50 Everett, Britney; Feeney, Kevin                                                         0          0  11
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean                                           0          0  12
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam                              0          0  13
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy              0          1  14
  50 Gee, Ki                                                                                 1          0  15
  50 Gee, Ki; Geoni, Girard                                                                  0          0  16
  50 Gee, Ki; Geoni, Girard; Grant, Douglas                                                  0          0  17
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance                                    0          0  18
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam                   0          1  19
  50 Ladwig, Renske                                                                          1          0  20
  50 Ladwig, Renske; Landry, James                                                           0          0  21
  50 Ladwig, Renske; Landry, James; Mallin, Jason                                            0          0  22
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven                            0          0  23
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James             0          1  24
  50 Matos, Randall                                                                          1          0  25
  50 Matos, Randall; McCain, Samuel                                                          0          0  26
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene                                      0          0  27
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin                      0          0  28
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia        0          1  29
  50 OConnell, Donald                                                                        1          0  30
  50 OConnell, Donald; Olson, TJ                                                             0          0  31
  50 OConnell, Donald; Olson, TJ; Patel, Joshua                                              0          0  32
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall                            0          0  33
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel         0          1  34
  50 Rajs, Trenna                                                                            1          0  35
  50 Rajs, Trenna; Rogers, Michael                                                           0          0  36
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita                                        0          0  37
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John                             0          0  38
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen            0          1  39
  50 Sullivan, Martha                                                                        1          0  40
  50 Sullivan, Martha; Taylor, Winston                                                       0          0  41
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter                                        0          0  42
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta                       0          0  43
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana         0          1  44
  50 Weiss, Matthew                                                                          1          1  45
  60 Austin, David                                                                           1          0   1
  60 Austin, David; Ernst, Bruce                                                             0          0   2
  60 Austin, David; Ernst, Bruce; Hunold, Alexander                                          0          0   3
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana                          0          0   4
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli        0          1   5
  70 Baer, Hermann                                                                           1          1   1
  80 Abel, Ellen                                                                             1          0   1
  80 Abel, Ellen; Ande, Sundar                                                               0          0   2
  80 Abel, Ellen; Ande, Sundar; Banda, Amit                                                  0          0   3
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth                                0          0   4
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David              0          1   5
  80 Bloom, Harrison                                                                         1          0   6
  80 Bloom, Harrison; Cambrault, Gerald                                                      0          0   7
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette                                  0          0   8
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise                   0          1   9
  80 Errazuriz, Alberto                                                                      1          0  10
  80 Errazuriz, Alberto; Fox, Tayler                                                         0          0  11
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle                                       0          0  12
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter                          0          0  13
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa          0          1  14
  80 Johnson, Charles                                                                        1          0  15
  80 Johnson, Charles; King, Janette                                                         0          0  16
  80 Johnson, Charles; King, Janette; Kumar, Sundita                                         0          0  17
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David                             0          0  18
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack           0          1  19
  80 Marvins, Mattea                                                                         1          0  20
  80 Marvins, Mattea; McEwen, Allan                                                          0          0  21
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher                                      0          0  22
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa                          0          0  23
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen         0          1  24
  80 Russell, John                                                                           1          0  25
  80 Russell, John; Sewall, Sarath                                                           0          0  26
  80 Russell, John; Sewall, Sarath; Smith, Lindsey                                           0          0  27
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William                           0          0  28
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick           0          1  29
  80 Taylor, Jonathon                                                                        1          0  30
  80 Taylor, Jonathon; Tucker, Peter                                                         0          0  31
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver                                        0          0  32
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara                        0          0  33
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni        0          1  34
  90 De Haan, Lex                                                                            1          0   1
  90 De Haan, Lex; King, Steven                                                              0          0   2
  90 De Haan, Lex; King, Steven; Kochhar, Neena                                              0          1   3
 100 Chen, John                                                                              1          0   1
 100 Chen, John; Faviet, Daniel                                                              0          0   2
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy                                            0          0   3
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis                                0          0   4
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael               0          1   5
 100 Urman, Jose Manuel                                                                      1          1   6
 110 Gietz, William                                                                          1          0   1
 110 Gietz, William; Higgins, Shelley                                                        0          1   2
     Grant, Kimberely                                                                        1          1   1

107 rows selected.

Execution Plan (with filtering)

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |           |      1 |        |     29 |00:00:00.01 |     322 |       |       |          |
|   1 |  SORT ORDER BY                               |           |      1 |    117 |     29 |00:00:00.01 |     322 |  9216 |  9216 | 8192  (0)|
|*  2 |   VIEW                                       |           |      1 |    117 |     29 |00:00:00.01 |     322 |       |       |          |
|   3 |    WINDOW SORT                               |           |      1 |    117 |    107 |00:00:00.01 |     322 | 13312 | 13312 |12288  (0)|
|   4 |     VIEW                                     |           |      1 |    117 |    107 |00:00:00.01 |     322 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|           |      1 |        |    107 |00:00:00.01 |     322 |  2048 |  2048 | 2048  (0)|
|*  6 |       VIEW                                   |           |      1 |    107 |     12 |00:00:00.01 |       7 |       |       |          |
|*  7 |        WINDOW SORT PUSHED RANK               |           |      1 |    107 |     12 |00:00:00.01 |       7 |  9216 |  9216 | 8192  (0)|
|   8 |         TABLE ACCESS FULL                    | EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
|*  9 |       HASH JOIN                              |           |     45 |     10 |     95 |00:00:00.01 |     315 |  1160K|  1160K| 1168K (0)|
|  10 |        RECURSIVE WITH PUMP                   |           |     45 |        |    107 |00:00:00.01 |       0 |       |       |          |
|  11 |        VIEW                                  |           |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  12 |         WINDOW SORT                          |           |     45 |    107 |   4815 |00:00:00.01 |     315 | 18432 | 18432 |16384  (0)|
|  13 |          TABLE ACCESS FULL                   | EMPLOYEES |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LINE_PRINT"=1)
   6 - filter("RN"=1)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPARTMENT_ID" ORDER BY "LAST_NAME","FIRST_NAME")<=1)
   9 - access("E"."DEPT"="R"."DEPT" AND "E"."RN"="R"."RN"+1)

Note that the employees table is accessed 46 times, indicating an obvious performance issue.

Using MATCH_RECOGNIZE for filtering

In Oracle v12.1 row pattern matching was introduced, with the new keyword MATCH_RECOGNIZE, and this provides an alternative where the filtering does not require an additional subquery. The query is:

WITH emp_ordered AS (
SELECT department_id dept,
       Row_Number() OVER (PARTITION BY department_id ORDER BY last_name || ', ' || first_name) rn, 
       last_name || ', ' || first_name ename
  FROM hr.employees e
), emp_rsf (dept, rn, ename_list, new_line) AS (
SELECT dept, rn, ename, 1
  FROM emp_ordered
 WHERE rn = 1
UNION ALL
SELECT r.dept, 
       e.rn,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN e.ename
            ELSE r.ename_list || '; ' || e.ename END,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN 1
            ELSE 0 END
  FROM emp_rsf r
  JOIN emp_ordered e
    ON e.dept = r.dept
   AND e.rn = r.rn + 1
)
SELECT dept, ename_list, cls, mtc
  FROM emp_rsf
 MATCH_RECOGNIZE (
 PARTITION BY dept
 ORDER BY rn
 MEASURES ename_list AS ename_list,
          new_line AS new_line,
          Classifier() AS cls,
          Match_Number() AS mtc
 PATTERN ( strt sm* )
 DEFINE
   sm AS sm.new_line = 0
 )
 ORDER BY 1

How it works

  • emps_ordered and rsf subqueries as before
  • MATCH_RECOGNIZE clause: defines matching row sets that end in the lines to print
  • MEASURES section: includes two built-in functions for illustration purposes only: Classifier() = grouping, and Match_Number() = match number of the record
  • DEFINE section: defines a grouping, sm, based on the previously set flag, new_line, that applies if the flag = 0
  • PATTERN section: ( strt sm* ) includes an undefined grouping strt, and means to match an ordered set of records beginning with a record that does not fall into any defined grouping and continuing with zero or more records (but as many as possible) that are in the sm grouping

The output, with the extra built-in fields, helps to show how it works:

DEPT ENAME_LIST                                                                       CLS   MTC
---- -------------------------------------------------------------------------------- ---- ----
  10 Whalen, Jennifer                                                                 STRT    1
  20 Fay, Pat; Hartstein, Michael                                                     SM      1
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den    SM      1
  30 Tobias, Sigal                                                                    STRT    2
  40 Mavris, Susan                                                                    STRT    1
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony       SM      1
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer                  SM      2
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy       SM      3
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam            SM      4
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James      SM      5
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia SM      6
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel  SM      7
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen     SM      8
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana  SM      9
  50 Weiss, Matthew                                                                   STRT   10
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli SM      1
  70 Baer, Hermann                                                                    STRT    1
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David       SM      1
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise            SM      2
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa   SM      3
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack    SM      4
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen  SM      5
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick    SM      6
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni SM      7
  90 De Haan, Lex; King, Steven; Kochhar, Neena                                       SM      1
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael        SM      1
 100 Urman, Jose Manuel                                                               STRT    2
 110 Gietz, William; Higgins, Shelley                                                 SM      1
     Grant, Kimberely                                                                 STRT    1

29 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |           |      1 |        |     29 |00:00:00.02 |     322 |       |       |          |
|   1 |  VIEW                                           |           |      1 |    117 |     29 |00:00:00.02 |     322 |       |       |          |
|   2 |   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|           |      1 |    117 |     29 |00:00:00.02 |     322 | 13312 | 13312 |12288  (0)|
|   3 |    VIEW                                         |           |      1 |    117 |    107 |00:00:00.01 |     322 |       |       |          |
|   4 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST    |           |      1 |        |    107 |00:00:00.01 |     322 |  2048 |  2048 | 2048  (0)|
|*  5 |      VIEW                                       |           |      1 |    107 |     12 |00:00:00.01 |       7 |       |       |          |
|*  6 |       WINDOW SORT PUSHED RANK                   |           |      1 |    107 |     12 |00:00:00.01 |       7 | 11264 | 11264 |10240  (0)|
|   7 |        TABLE ACCESS FULL                        | EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
|*  8 |      HASH JOIN                                  |           |     45 |     10 |     95 |00:00:00.02 |     315 |  1301K|  1301K| 1464K (0)|
|   9 |       VIEW                                      |           |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  10 |        WINDOW SORT                              |           |     45 |    107 |   4815 |00:00:00.01 |     315 | 20480 | 20480 |18432  (0)|
|  11 |         TABLE ACCESS FULL                       | EMPLOYEES |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  12 |       RECURSIVE WITH PUMP                       |           |     45 |        |    107 |00:00:00.01 |       0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("RN"=1)
   6 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPARTMENT_ID" ORDER BY "LAST_NAME"||', '||"FIRST_NAME")<=1)
   8 - access("E"."DEPT"="R"."DEPT" AND "E"."RN"="R"."RN"+1)

As in the earlier query, the employees table is accessed 46 times, indicating an obvious performance issue.

MODEL Solution

Oracle introduced the MODEL clause to its SQL syntax in v10. It has a reputation for often leading to SQL that is difficult to understand and sometimes inefficient, but it is well suited to this problem.

The query is:

WITH mod AS (
SELECT *
  FROM hr.employees
MODEL
  PARTITION BY (department_id)
  DIMENSION BY (Row_Number () OVER (PARTITION BY department_id ORDER BY last_name, first_name) rn)
  MEASURES (
    last_name || ', ' || first_name ename,
    CAST (NULL AS VARCHAR2(4000)) ename_list,
    0 line_print
  )
  RULES (
    ename_list[ANY] = CASE WHEN ename_list[CV()-1] IS NULL OR Length (ename_list[CV()-1] ||  '; ' || ename[CV()]) > :rec_len THEN ename[CV()]
                           ELSE ename_list[CV()-1] ||  '; ' || ename[CV()]
                      END,
    line_print[ANY] = CASE WHEN ename[CV()+1] IS NULL OR Length (ename_list[CV()] ||  '; ' || ename[CV()+1]) > :rec_len THEN 1 END
  )
)
SELECT department_id dept, ename_list
  FROM mod
 ORDER BY department_id, ename_list

How it works

  • mod subquery: the aggregation and line pirint flags are calculated in a single subquery using the MODEL clause
  • RULES section: first rule accumulates the aggregated lines, resetting when overspill occurs, with similar logic to that in the recursive subquery solution; the rule relies on the calculation occurring in the default order, by ascending dimension value; second rule relies on all the aggregates being calculated first by the first rule, then looks one row ahead within the partition to set the print flag
  • Main query: Selects rows where the print flag = 1

Execution Plan

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |     29 |00:00:00.01 |       7 |       |       |          |
|   1 |  SORT ORDER BY        |           |      1 |    107 |     29 |00:00:00.01 |       7 |  9216 |  9216 | 8192  (0)|
|*  2 |   VIEW                |           |      1 |    107 |     29 |00:00:00.01 |       7 |       |       |          |
|   3 |    SQL MODEL ORDERED  |           |      1 |    107 |    107 |00:00:00.01 |       7 |   962K|   905K| 1165K (0)|
|   4 |     WINDOW SORT       |           |      1 |    107 |    107 |00:00:00.01 |       7 | 18432 | 18432 |16384  (0)|
|   5 |      TABLE ACCESS FULL| EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LINE_PRINT"=1)

Note that the employees table is accessed only once, suggesting that this may be a mmore efficient query than the recursive subquery solutions.

Conclusions

  • We have presented three SQL solutions for length-controlled list aggregation
  • Two are based around the v11.2 feature recursive subquery factoring, with one also using the v12.1 feature, match_recognise
  • The third solution uses the v10.1 model clause feature, and this appears to be the simplest and fastest of the three, although no volume testing has been performed