Unix Snippets for Handling Duplicate Keys in Delimited Input Files

Recently I had a problem where input data files for an interface contained duplicate keys that caused the interface to fail. Sometimes the complete lines were duplicated, but in other cases only the key fields were duplicated with differing non-key attribute values. I wanted, firstly, to report on the problem lines, and, secondly, to clean up the files as far as possible. For lines that are fully duplicated, it's easy enough to deduplicate, and the following command does that, producing a sorted file with unique lines:

sort -u input.dat > output.dat

The unix uniq comand leaves the lines in a file in the original order, but only deduplicates adjacent sets of lines.

It's more difficult when keys are duplicated but non-key fields differ and we do not know which line to retain. In this case, we wanted to just delete those lines and report them to the source system for correction.

I wrote some snippets of unix code to report on duplicate-key lines and clean up the files, using unix utilities such as sort, awk, grep. This article lists the code snippets with outputs for a test data file.

Test Scenarios
The first test data file has two leading key fields, followed by four attribute fields, with pipe delimiters. There is one pair of complete duplicates, a set of three lines having the same key but varying attributes, and two unique lines. Because we will be using grep for pattern matching, an obvious potential bug would be to match strings that are not in the first two positions, so I have included one of the duplicate key pairs as fields 4 and 5 to trap that if it occurs.

test_dups_in_1_2.dat with the 2 leading fields key

k91|k92|a1|k41|k42|a
k11|k12|a1|k41|k42|b
k41|k42|a4x|k41|k42|c
k21|k22|a2|k41|k42|d
k41|k42|a4|k41|k42|c
k41|k42|a4|k41|k42|c
k11|k12|a1|k41|k42|b

Here are the functions applied to the initial test set:

  • List the distinct keys that have duplicates
  • List all lines for duplicate keys
  • Strip all lines with duplicate keys from the file, returning unsorted lines
  • Strip the inconsistent duplicates from the file, returning unique sorted lines

The solutions rely on sorting the lines, taking advantage of the fact that the key fields are leading. If the key fields are not leading a little more work is required, and I repeated the fourth function for this case, after swapping fields 2 and 3, so that then key fields are 1 and 3.

  • Strip the inconsistent duplicates from the file, returning unique sorted lines - key fields not leading

List the distinct keys that have duplicates
Unix Code

sort $INFILE1_2 | awk -F"|" ' $1$2 == last_key && !same_key {print $1"|"$2; same_key=1} $1$2 != last_key {same_key=0; last_key=$1$2}'

Output

k11|k12
k41|k42

List all lines for duplicate keys
Unix Code

sort $INFILE1_2 | awk -F"|" ' 
$1$2 == last_key && !same_key {print last_line; same_key=1} 
$1$2 == last_key {print $0; same_key=1} 
$1$2 != last_key {same_key=0; last_key=$1$2; last_line=$0}'

Output

k11|k12|a1|k41|k42|b
k11|k12|a1|k41|k42|b
k41|k42|a4|k41|k42|c
k41|k42|a4|k41|k42|c
k41|k42|a4x|k41|k42|c

I think this should work for any number of key fields as long as they are all at the start, if you replace $1$2 by the relevant field positions. If the key fields are not all at the start then you have to pre/post-process as I show later.

Strip all lines with duplicate keys from the file, returning unsorted lines
This code creates a copy of the input file with the lines for the duplicate keys stripped out, using the previous awk code to generate a list that we then loop over doing a "grep -v" on the copied file.

Unix Code

dup_list=`sort $INFILE1_2 | awk -F"|" '
$1$2 == last_key && !same_key {print $1"|"$2; same_key=1}
$1$2 != last_key {same_key=0; last_key=$1$2}'`
cp $INFILE1_2 $OUTFILE1_2; for i in $dup_list; do grep -v ^$i $OUTFILE1_2 > $WRKFILE1; mv $WRKFILE1 $OUTFILE1_2; done
cat $OUTFILE1_2

Output

k91|k92|a1|k41|k42|a
k21|k22|a2|k41|k42|d

Strip the inconsistent duplicates from the file, returning unique sorted lines
Here is a variation on the above, where we keep one line for duplicates where all the attributes are also duplicated, but drop the duplicates where the attributes vary and so we cannot clean them. We do this by apply the "-u" flag to the initial sort, and using that output also as the basis for the stripping part.

Unix Code

dup_list=`sort -u $INFILE1_2 | awk -F"|" '
$1$2 == last_key && !same_key {print $1"|"$2; same_key=1}
$1$2 != last_key {same_key=0; last_key=$1$2}'`
sort -u $INFILE1_2 > $OUTFILE1_2; for i in $dup_list; do grep -v ^$i $OUTFILE1_2 > $WRKFILE1; mv $WRKFILE1 $OUTFILE1_2; done
cat $OUTFILE1_2

Output

k11|k12|a1|k41|k42|b
k21|k22|a2|k41|k42|d
k91|k92|a1|k41|k42|a

Test Scenario with Non-leading Key Fields
A simple awk command creates a a copy of the original test data file, but with fields 2 and 3 swapped.

Unix Code

awk -F"|" '{print $1"|"$3"|"$2"|"$4"|"$5"|"$6}' $INFILE1_2 > $INFILE1_3

test_dups_in_1_3.dat with fields 1 and 3 key

k91|a1|k92|k41|k42|a
k11|a1|k12|k41|k42|b
k41|a4x|k42|k41|k42|c
k21|a2|k22|k41|k42|d
k41|a4|k42|k41|k42|c
k41|a4|k42|k41|k42|c
k11|a1|k12|k41|k42|b

Strip the inconsistent duplicates from the file, returning unique sorted lines - key fields not leading
This variation caters for non-leading key fields by first swapping the fields so that the key fields now lead, applying similar processing to the previous case, then swapping the fields back.

Unix Code

awk -F"|" ' {print $1"|"$3"|"$2"|"$4"|"$5"|"$6}' $INFILE1_3 | sort -u > $WRKFILE1
dup_list=`awk -F"|" ' $1$2 == last_key && !same_key {print $1"|"$2; same_key=1} $1$2 != last_key {same_key=0; last_key=$1$2}' $WRKFILE1`
for i in $dup_list; do grep -v ^$i $WRKFILE1 > $WRKFILE2; mv $WRKFILE2 $WRKFILE1; done
awk -F"|" '  {print $1"|"$3"|"$2"|"$4"|"$5"|"$6}' $WRKFILE1 > $OUTFILE1_3

Output

k11|a1|k12|k41|k42|b
k21|a2|k22|k41|k42|d
k91|a1|k92|k41|k42|a

Test script and output file attached.

Unix Snippets for Duplicate Keys






A Generic Unix Script for Uploading eBusiness Concurrent Programs

I have posted a couple of articles recently on XML Publisher report development within Oracle eBusiness applications (A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher and Design Patterns for Database Reports with XML Publisher and Email Bursting). These reports are of one of several types of batch program, or concurrent program that can be defined within Oracle eBusiness.

Oracle eBusiness uses a number of metadata tables to store information on the programs, and in release 11.5 Oracle introduced a Unix utility called FNDLOAD to download the metadata to formatted text files to allow them to be uploaded via the same utility into downstream environments. At that time batch reports were generally developed in the Oracle Reports tool and typically there might only be two text files (known as LDT files after their extension), for the program and for associated value sets, and maybe one for request groups (which control access to the reports). The executable report file, the RDF, would just be copied to the target environment server directory. I wrote a set of wrapper scripts for the utility to streamline its use and to deal with a number of issues, including handling of audit fields, with a structure consisting of a separate pair of scripts for download and upload of each type of LDT file. I published these on Scribd in July 2009.

In working more recently with Oracle's successor reporting tool, XML Publisher, I found that the number of objects involved in installation has increased substantially. As well as the LDT files there are also now XML configuration files and RTF (usually) templates, uploaded via a Java utility, XDOLoader. Installation also involves at least one PL/SQL package. For example the email version of my model reports (see the second link above) had 11 configuration files. For this reason, I decided to create a single script that would copy, upload and install all objects required for a concurrent program of any type, and I describe the script in this article.

Here is my original Scribd article, describing the issues mentioned, and with the original individual upload and download scripts.

Loading...

The new script is here XX_Install_XMLCP_ksh, and here is the MD120 from the model article that uses it.

Loading...

Directory Structure

The script operates on an input TAR file containing all the necessary files. The TAR file is called prog.tar after the program short name prog, and contains a directory also called prog with all the installation files. The installer uses relative paths and assumes that the TAR file, with the installer, is placed in a directory below the custom top directory, say $XX_TOP/rel. All loader files are copied to $XX_TOP/import and SQL files to $XX_TOP/install/sql.

After installation, a new directory will remain with all the installation files for reference, $XX_TOP/rel/prog.

Program Structure

The internal call structure of the Unix script is shown below.
Install CP - CSD

The script operates on an input TAR file containing all the necessary files.

After extracting the TAR file, the script has local subroutines that can be divided into four categories, as above:

  1. Preliminaries - parameter processing, file moving and validation
  2. FNDLOAD uploads - uploading the LDT files
  3. XDOLoader uploads - uploading the data and any bursting XML files, and all layout templates present
  4. SQL installations - installing any SQL script present, and the package spec and body files

The following table gives a few notes on the main program and preliminary subroutines.
Preliminaries Summary
Prelims Table
The following table gives a few notes on the upload and SQL subroutines.
Upload and SQL Summary
Install Uploads Table
The upload subroutines all have SQL queries for verification, and here is a sample log file from the script: XX_ERPXMLCP_EM_log

The remainder of the article lists the queries with diagrams and examples of output.

Upload Subroutines

upload_ag

Validation Query

SELECT app_g.application_short_name "App G", fag.group_name "Group",
      fag.description "Description",
      app_t.application_short_name "App T", ftb.table_name "Table",
      fcl.column_name "Column"
  FROM fnd_audit_groups fag
  JOIN fnd_application app_g
    ON app_g.application_id = fag.application_id
  JOIN fnd_audit_tables fat
    ON fat.audit_group_app_id = fag.application_id
   AND fat.audit_group_id = fag.audit_group_id
  JOIN fnd_application app_t
    ON app_t.application_id = fat.table_app_id
  JOIN fnd_tables ftb
    ON ftb.application_id = fat.table_app_id
   AND ftb.table_id = fat.table_id
  JOIN fnd_audit_columns fac
    ON fac.table_app_id = fat.table_app_id
   AND fac.table_id = fat.table_id
  JOIN fnd_columns fcl
    ON fcl.application_id = fac.table_app_id
   AND fcl.table_id = fac.table_id
   AND fcl.column_id = fac.column_id
 WHERE fag.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
   AND fac.schema_id            = 900
ORDER BY app_g.application_short_name, fag.group_name, 
         app_t.application_short_name, ftb.table_name,
         fcl.column_name;

QSD

Install CP - AG - 1

Example Output
No example available here, but the headings are:
"App G", "Group", "Description", "App T", "Table", "Column"

upload_ms

Validation Query

SELECT mes.message_name "Name", mes.message_text "Text"
  FROM fnd_new_messages mes
 WHERE mes.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
 ORDER BY 1;

QSD

Install CP - Message

Example Output

Name                     Text
--------------------- ---------------------------------------------
XX_ERPXMLCP_EM_NONXML Non-XML Concurrent Program &PROGAPP

upload_vs

Validation Query

SELECT fvs.flex_value_set_name "Value Set", Count(fvl.flex_value_set_id) "Values"
  FROM fnd_flex_value_sets fvs, fnd_flex_values fvl
 WHERE fvs.last_update_date     = To_Date ('$sysdate', 'YYYY/MM/DD')
   AND fvl.flex_value_set_id(+) = fvs.flex_value_set_id
 GROUP BY fvs.flex_value_set_name;

QSD

Install CP - VS

Example Output

Value Set             Values
--------------------- ------
XX_PROGS                   0
XX_APPNAME_ID              0

upload_cp

Validation Query

SELECT prg.user_concurrent_program_name || ': ' || prg.concurrent_program_name "Program", fcu.column_seq_num || ': ' || fcu.end_user_column_name "Parameter"
  FROM fnd_concurrent_programs_vl               prg
  LEFT JOIN fnd_descr_flex_column_usages      fcu
    ON fcu.descriptive_flexfield_name         = '\$SRS\$.' || prg.concurrent_program_name
   AND fcu.descriptive_flex_context_code      = 'Global Data Elements'
 WHERE prg.concurrent_program_name              = '$cp_name'
 ORDER BY 1, 2;

QSD

Install CP - CP

Example Output

Program                                    Parameter
------------------------------------------ ------------------------
XX Example XML CP (Email): XX_ERPXMLCP_EM  100: Cc Email
                                           10: Application
                                           20: From Program
                                           30: To Program
                                           40: From Date
                                           50: To Date
                                           60: From Parameter Count
                                           70: To Parameter Count
                                           80: Override Email
                                           90: From Email

upload_rga

Validation Query

SELECT rgp.request_group_name "Request Group",
       app.application_short_name "App"
  FROM fnd_concurrent_programs          cpr
  JOIN fnd_request_group_units          rgu
    ON rgu.unit_application_id          = cpr.application_id
   AND rgu.request_unit_id              = cpr.concurrent_program_id
  JOIN fnd_request_groups               rgp
    ON rgp.application_id               = rgu.application_id
   AND rgp.request_group_id             = rgu.request_group_id
  JOIN fnd_application                  app
    ON app.application_id               = rgp.application_id
 WHERE cpr.concurrent_program_name      = '$cp_name'
 ORDER BY 1;

QSD

Install CP - RGA

Example Output

Request Group                  App
------------------------------ ----------
System Administrator Reports   FND

upload_dd

Validation Query

SELECT xdd.data_source_code "Code", xtm.default_language "Lang", xtm.default_territory "Terr"
  FROM xdo_ds_definitions_b xdd
  LEFT JOIN xdo_templates_b xtm
    ON xtm.application_short_name       = xdd.application_short_name
   AND xtm.data_source_code             = xdd.data_source_code
 WHERE xdd.data_source_code             = '$cp_name'
 ORDER BY 1, 2, 3;

QSD

Install CP - DD

Example Output

Code                 Lang Terr
-------------------- ---- ----
XX_ERPXMLCP_EM       en   US

upload_all_temps

Validation Query

SELECT xdd.data_source_code "Code", 
        xlb_d.file_name "Data Template",
        xlb_b.file_name "Bursting File",
        xtm.template_code "Template",
        xlb.language "Lang", 
        xlb.territory "Terr",
        xlb.file_name || 
               CASE
               WHEN xlb.language = xtm.default_language AND
                    xlb.territory = xtm.default_territory
               THEN '*' END "File"
  FROM xdo_ds_definitions_b             xdd
  LEFT JOIN xdo_lobs                    xlb_d
    ON xlb_d.application_short_name     = xdd.application_short_name
   AND xlb_d.lob_code                   = xdd.data_source_code
   AND xlb_d.lob_type                   = 'DATA_TEMPLATE'
  LEFT JOIN xdo_lobs                    xlb_b
    ON xlb_b.application_short_name     = xdd.application_short_name
   AND xlb_b.lob_code                   = xdd.data_source_code
   AND xlb_b.lob_type                   = 'BURSTING_FILE'
  LEFT JOIN xdo_templates_b             xtm
    ON xtm.application_short_name       = xdd.application_short_name
   AND xtm.data_source_code             = xdd.data_source_code
  LEFT JOIN xdo_lobs                    xlb
    ON xlb.application_short_name       = xtm.application_short_name
   AND xlb.lob_code                     = xtm.template_code
   AND xlb.lob_type                     LIKE 'TEMPLATE%'
 WHERE xdd.data_source_code             = '$cp_name'
   AND xdd.application_short_name       = '$app'
 ORDER BY 1, 2, 3, 4;

QSD

Install CP - Template

Example Output

Code            Data Template        Bursting File          Template             Lang Terr File
--------------- -------------------- ---------------------- -------------------- ---- ---- -------------------------
XX_ERPXMLCP_EM  XX_ERPXMLCP_EM.xml   XX_ERPXMLCP_EM_BUR.xml XX_ERPXMLCP_EM       en   US   XX_ERPXMLCP_EM.xsl*
                                                                                           XX_ERPXMLCP_EM.rtf*
                                                            XX_ERPXMLCP_EM_XML   en   US   XX_ERPXMLCP_EM_XML.xsl*
                                                                                           XX_ERPXMLCP_EM_XML.rtf*






Design Patterns for Database Reports with XML Publisher and Email Bursting

In a recent article, A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher, I presented a report that I developed within Oracle eBusiness 12.1, using Oracle XML Publisher. The report was for displaying audit trail data for a particular table, and I proposed that it could be used as a design pattern for audit trail reporting on any eBusiness table. The article focussed on the rather complex SQL, with associated layout structures, needed to render the audit data into a readable format.

In this article, I present a pair of XML Publisher reports intended to serve as design patterns for more general reporting. The reports were again developed within the framework of Oracle's eBusiness embedded version of XML Publisher, of which there is also a stand-alone version,
Oracle Business Intelligence Publisher, which describes the product (rather ungrammatically) thus:

Oracle BI Publisher is the reporting solution to author, manage, and deliver all your reports and documents easier and faster than traditional reporting tools.

Reports Overview

The reports were developed specifically to serve as models for other reports, from which program constructs could be copied and pasted. For this reason I considered taking for the data sources universally available Oracle metadata such as all_tables or all_objects etc., but found problems with that idea, including:

  • They are not generally ordinary tables, and tend to produce very complex execution plans
  • The possible group structures I found were not quite general enough for my purposes

On the other hand, within Oracle eBusiness I had a number of queries available against the (FND application) metadata tables for concurrent programs that seemed to fit the purpose pretty well, and so decided to use these. As a compromise I created views on the tables with generic column names, so that the reports could be ported to other data sources relatively easily.

Concurrent Programs Data Model
In Oracle eBusiness applications concurrent (i.e. batch) programs have a logical metadata record for the program itself, and (potentially) a set of records for each of the following associated entities:

  • Parameter
  • Request group
  • XML layout template

All of these can have zero records, and for the third entity, only programs of specific type can have any records defined, a structure providing a reasonable degree of generality.

Email Bursting
Business application reports are often used to generate documents for sending to customers, suppliers etc., and increasingly companies prefer to email these out to save costs compared with mailing printed documents. In the past report developers had to code the emailing functionality themselves, usually by means of a co-ordinating program that would loop over the master records and call the reporting tool individually for each one, creating an attachment file that would then be emailed, perhaps by the Unix mailx program. As well as the development effort involved, this approach had the drawback that it scales badly as the number of emails rises owing to the individual calls to the reporting tool (such as Oracle Reports). A printed report will frequently create 10,000 records in little more time than for 1,000 records, whereas for the hand-coded emailing process it would likely take 10 times longer.

The bursting functionality in Oracle XML Publisher solves these problems by allowing the emailing processing to be specified by an XML configuration file, and by creating the files to be emailed in batch just as with printed reports. The data files are created initially in XML format from the data model, and are then merged with layout templates that are emailed by a second program.

The design pattern reports will consist of an email bursting version with a printed version, which would generally be required as a fallback for records with missing or invalid email addresses. We'll address the obvious security issue with automated emailing programs as part of our design pattern below.

Report Outputs

Examples of complete report outputs for the printed report in .pdf format are included in the attached zip file. Here three pages are given, two showing the listing for the example report programs themselves, and a third showing a non-XML program.

Example Report - XX Example XML CP
This is page 8 of the printed report run with the only the first parameter set, the application.

  • Only the parameters that were actually set are listed after the title
  • The templates region appears because it is an XML report
  • There is only one template for this printed report, with XSL conditionality to include or exclude regions or columns

XX_ERPXMLCP

Example Report - XX Example XML CP (Email)
This is page 9 of the printed report run with the only the first parameter set, the application.

  • Notice that the bursting file column appears because this report has the file attached, using XSL conditionality
  • This email version has two layout templates that are used conditionally for each record depending on whether it's of XML type or not
  • This record-level choice of template is implemented in the bursting XML file, and is not available in this way for the printed version

XX_ERPXMLCP_EM

Advanced Pricing Report - QP: Java Pricing Engine Warm Up
This is page 44 of the printed report run for the Advanced Pricing application.

  • It shows a non-XML program, and the templates region consequently does not appear
  • Notice that only the summary line appears for the parameters as there are no parameters defined for the program.

QPXJCCWU-2

XML Data Templates and XML Generators

XML Publisher reports require at least one layout template in one of several formats, and also require an executable program to generate the XML data file to merge with the template(s). Oracle supplies a Java executable to generate the data file from an XML data template containing the SQL queries, group structure etc. that the programmer attaches to the concurrent program. This is usually the best approach since it mimimizes the amount of programming required.

It is also possible to use Oracle Reports to generate XML. This can be a reasonable approach for converting legacy reports to XML Publisher to avail of some of the additional features, such as Excel output and email bursting. However, if the data model needs changing it is probably best to extract the SQL and create an XML data template from it.

It's a really bad idea to write a PL/SQL program to generate the XML data file, being a serious case of 'reinventing the wheel' - use Oracle's Java executable with XML data template instead!

Data Model: Groups, Queries, Parameters, Summaries

As described above, the example report has one master data group and three detail groups. As discussed in my last article Query Query Query, the minimum number of queries required for a group structure hierarchy is the number of groups minus the number of parent groups, which equals three here. The group structure with detail queries linked by bind variables could be depicted as below.

Group Structure Diagram
XML Publisher Model - Groups

A possible query structure for the three queries might then look like the following, where I have included summaries for two of the detail groups, and allowed for report constraints via lexical parameters, to be discussed later.

Query Structure Diagram
XML Publisher Model - Queries

Query Links and Bind Variables
It is possible to link detail queries to their master by XML query links, but according to the manual Oracle® XML Publisher Administration and Developer's Guide, Release 12:

XML Publisher tests have shown that using bind variables is more efficient than using the link tag.

Linking a detail query to its master by bind variables involves simply referencing the master variable link columns within the detail query preceded by a colon.

Constraints and Parameters
Reports often have constraints depending on input parameters, which may be mandatory or optional. Input parameters are defined in a parameters section within the data template.

Mandatory parameters can be referenced directly as bind variables within the queries (i.e. by preceding the parameter name with a colon).

Optional parameters should normally be referenced indirectly using additional lexical parameters. This allows the exact query required to be executed rather than a composite query with Nvls to represent all possible query combinations. The latter all-purpose queries tend to be more complex and to perform poorly. Lexical parameters are declared and processed in the associated database package, and referenced by preceding them with an ampersand (&).

Note that, confusingly, these lexical parameters are not the same as the lexical tags applicable only within eBusiness that refer to eBusiness flexfields. Like the corresponding user-exits in eBusiness Oracle Reports the lexical tags allow flexfields to be included without their structure being known to the report developer. This is necessary for the standard eBusiness reports but developers of custom reports normally know the structures, and so can avoid these tags altogether (at least for new reports).

Summaries
There are various ways of computing summaries in XML Publisher reports:

  • Within the SQL
  • within the XML group elements
  • within the layout template

SQL Summaries
Often the best way to do the summaries is in the SQL itself. This makes testing simpler because the queries can be run and debugged in SQL Developer or Toad, and also facilitates production support where developers can often run queries in a read-only schema, but can't change the production code.

In the simple case of summarising detail groups that are defined against a main query, the summaries can be done using analytic functions partitioning by the master key for first level details, and as desired for any subsequent levels. In my example reports, Example Line One is summarised in this way; a subquery factor was used but this is optional here.

Groups defined against additional queries cannot be summarised quite so easily in the SQL. I have summarised the Example Line Two group by adding a subquery factor in the main query purely to do the summaries: Because the line detail is not present in the query we can't do this by analytic functions, so a grouping summary in a separate subquery is necessary.

XML Group Summaries
Where possible SQL summaries via analytic functions appears best, but in other cases we may wish to consider the non-SQL alternatives. One of these is to define the summaries within the appropriate group as elements in the XML template setting the value attribute to the element in the lower level group to be summarised, and setting the function attribute as desired within a limited set of functions. I have not implemented this method in my examples.

XSL Layout Summaries
A further non-SQL alternative for summaries is to define them within the layout template using the XSL language, and I have implemented the summaries of Example Line Three using this method.

Database Package and Report Triggers

Each XML Publisher has a default package specified in the XML template, that handles parameter processing and implements any triggers required.

Package Spec
The spec declares both input parameters and lexical parameters, as well as the procedures that implement the report triggers (usually three).

Triggers

Before Report

  • This trigger is used to process the lexical parameters, constructing the appropriate where condition depending on which optional parameters have values
  • The example packages show how ranges of character, date and number are processed. I have found it more convenient to pass dates as string parameters.

After Report

  • This trigger can be used to write column headings to the log file for fields that are logged in the group filter

Group Filter

  • This trigger can be used to filter out records depending on some condition, by returning FALSE.
  • I use it in the examples to log selected fields for each record returned.
  • This logging is particularly important for email bursting reports as it enables checking that emails to be sent externally are appropriate before sending in a second step.

Printed Example Report Details

Report Parameters
The report parameters were designed to illustrate the implementation of character, date and number ranges, with an equality join parameter that fits well with the report content, all optional:

  • Application - the eBusiness application id
  • Program name From and To parameters - character range
  • Creation date From and To parameters - date range
  • Number of parameters From and To parameters - number range

Where there is an associated email version it may be worth having an additional Yes/No parameter determining whether to include records that have an email address defined.

Layout Template
There is a single layout template, of .rtf format.

Layout Template Structure
XML Publisher Model - Layout

Layout Template File

Loading...

Notes on Layout

  • Body and Margins
    • The page numbers are in the bottom margin (or footer)
    • The title, parameters and header fields are above the body tag, and repeat on each page of a given header record
  • Conditional Blocks and Columns
    • XSL if-blocks are used to conditionally print blocks such as input parameters, or detail tables depending on data values
    • XSL if-blocks with column qualifiers are used to conditionally print table columns: If there is no bursting file, the entire bursting file column will not appear
  • Sections and Page Breaks
    • The XSL for-each field for the header group uses the section qualifier, which, in conjunction with the page break field, specifies a new page after each header record except the last
    • By default, the above section qualifier would specify that page numbers reset with each header record, but this is overriden here by the initial-page-number field in the footer
  • XSL Summary
    • As discussed in the data model section, the line three summary is implemented in XSL - by the XSL count field

Email Bursting Example Report

Email Address Parameters
In addition to the printed report constraint parameters, three email address parameters have been added.

  • Override email address - setting a value here overrides the record level email address and is useful for testing
  • From and CC email addresses are parameters, which can be defaulted, for flexibility
  • The email address data source is hard-coded in this demo report: normally it would be something like a supplier email address at master record level

Two Step Bursting Process
The first step is to run the concurrent program for the report, which results in an XML file containing the report data to be created on the server. In the second step, a standard concurrent program 'XML Publisher Report Bursting Program' is run, passing the request id of the earlier custom report, in order to merge the XML data file with the layout template(s) and send the individual reports by email.

It's sometimes recommended to trigger the running of the second program within the custom report itself, but it's probably better not to do this, in order to allow validation of the reports and email addresses before sending them out.

Email Report Logging
The example report logs the email address and other fields in the concurrent program log file, including whether an override email address was specified, where the user can check them before executing the second step that sends out the emails.

Layout Template
There are two layout templates, of .rtf format, which are used conditionally depending on the type of record. The structure of the one used for XML type programs (shown below) has conditional header fields and the third lines block that are absent from the other (not shown).

Layout Template Structure
XML Publisher Model - Layout EM

Layout Template File

Loading...

Notes on Layout

  • The master group is excluded from the template, although its fields are present in the header. The XML Publisher bursting program performs the looping over this group

Email Bursting XML File

<xapi:requestset xmlns:xapi="http://xmlns.oracle.com/oxp/xapi" type="bursting">
<xapi:request select="/XX_ERPXMLCP_EM/LIST_G_HDR/G_HDR">
<xapi:delivery>
<xapi:email server="127.0.0.1" port="25" from="${FROM_EMAIL}" reply-to ="">
<xapi:message id="123" to="${EMAIL_ADDRESS}" cc="${CC_EMAIL}" 
attachment="true" subject="${SUB}">Dear Sir/Madam

Pleae find attached your concurrent program details. 

[Alternatively, leave blank and use output_type=html in document to have attachment go to body instead]</xapi:message>
</xapi:email>
</xapi:delivery>
<xapi:document output="CP_${PROG_NAME}.pdf" output-type="pdf" delivery="123">
<xapi:template type="rtf" location="XDO://CUSTOM.XX_ERPXMLCP_EM.en.US/?getSource=true" filter="//G_HDR[OUTPUT_TYPE!='XML']"></xapi:template>
<xapi:template type="rtf" location="XDO://CUSTOM.XX_ERPXMLCP_EM_XML.en.US/?getSource=true" filter="//G_HDR[OUTPUT_TYPE='XML']"></xapi:template>
</xapi:document>
</xapi:request>
</xapi:requestset>

Notes on Bursting File

  • The email server is specified in the bursting XML configuration file, along with the email addresses to be used, which can be variables from the selected records or hard-coded
  • The layout template is specified, and more than one can be included, as here, with filter conditions depending on selected data fields
  • Here the output is specified to be sent as a .pdf attachment; changing to output type html results in the the output appearing as body text
  • It can be useful to store some or all of the subject (or body) text on the database; the table fnd_messages is used to store the subject here, as records from the fnd tables can be installed automatically via fndload

Code to Download

The zip file Brendan_Model_XML_Reports contains a root folder and three subfolders, as follows:

  • Root - MD120 installation document for the email version. It references a generic Unix script that installs all objects, see A Generic Unix Script for Uploading Oracle eBusiness Concurrent Programs
  • Output - examples of log files for both printed and email versions and pdf outputs for printed version
  • XX_ERPXMLCP - complete code and metadata for the printed version
  • XX_ERPXMLCP_EM - complete code and metadata for the email version (except requires some printed version objects to be installed first)






Query Query Query

In my last post, A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher, I described a database report module developed in Oracle's XML Publisher tool. Of the report structure I wrote:

It has a master entity with two independent detail entities, and therefore requires a minimum of two queries.

But why does such a structure require two queries? And can we determine the minimum number of queries for reports in general? To start with, let's define a report in this context as being a hierarchy of record groups, where:

  • a record group is a set of records having the same columns with (possibly) differing values
  • each group is linked to a single record in its (single) parent group by values in the parent record, except the top level (or root) group

For example, in the earlier post the root group is a set of bank accounts, with the two detail (or child) groups being the set of owners of the bank account and the set of audit records for the bank account parent record. Corresponding to this group structure, each bank account record is the root of the data hierarchies, comprising two sets of records below the bank account record, one for the owners and one for the audit records linked to the root record by the bank account id.

A (relational) query always returns a flat record set, and it's this fact that determines the minimum number of queries required for a given group structure. A master-detail group structure can be flattened in the query by simply copying master fields on to the child record sets. The set cardinality is then the cardinality of the child set. The report designer uses their chosen reporting tool to specify display of the queried data in either flat, or in master-detail format.

In fact this approach works for any number of child levels, with the query cardinality being the number of bottom level descendants (using null records for potential parents that are in fact childless). It's clear though that the approach will not work for any second child at the same level because there would be two cardinalities and no meaningful single record for both child groups could be constructed within a flat query.

This reasoning leads to the conclusion that the minimum number of queries required in general is equal to the number of groups minus the number of parent groups.

Query Query Query - example

In the earlier post I also stated:

This minimum number of queries is usually the best choice...

There are two main reasons for this:

  • each child query fires for every record returned by its parent, with associated performance impact
  • maintenance tends to be more difficult with extra queries; this is much worse when the individual groups, which should almost always be implemented by a maximum of one query each, are split, and then need to be joined back together procedurally

On thinking about this, it occurred to me that if the group structure were defined in a metadata table we might be able to return minimum query structures using an SQL query. Just one, obviously :) . To save effort we could use Oracle's handy HR demo schema with the employee hierarchy representing groups.

The remainder of this article describes the query I came up with. As it's about hierarchies, recursion is the technique to use, and this is one of those cases where Oracle's old tree-walk syntax is too limited, so I am using the Oracle 11.2 recursive subquery factoring feature.

The query isn't going to be of practical value for report group structures since these are always quite small in size, but I expect there are different applications where this kind of Primogeniture Recursion would be useful.

Query Groups Query - Primogeniture Recursion

Query Structure Diagram
Query Query Query

SQL

WITH rsf (last_name, employee_id, lev, part_id, manager_id) AS (
SELECT last_name, employee_id, 0, employee_id, To_Number(NULL)
  FROM employees
 WHERE manager_id IS NULL
UNION ALL
SELECT e.last_name, e.employee_id, r.lev + 1, 
       CASE WHEN Row_Number() OVER (PARTITION BY r.employee_id ORDER BY e.last_name) = 1 THEN r.part_id ELSE e.employee_id END,
       e.manager_id
  FROM rsf r
  JOIN employees e
    ON e.manager_id = r.employee_id
)
SELECT part_id, LPad ('.', lev) || last_name last_name, employee_id, 
       Count(DISTINCT part_id) OVER () "#Partitions",
       Count(DISTINCT manager_id) OVER () "+ #Parents",
       Count(*) OVER () "= #Records"
  FROM rsf
 ORDER BY part_id, lev, last_name

Query Output

   PART_ID LAST_NAME            EMPLOYEE_ID #Partitions + #Parents = #Records
---------- -------------------- ----------- ----------- ---------- ----------
       100 King                         100          89         18        107
           .Cambrault                   148          89         18        107
            .Bates                      172          89         18        107
       101 .Kochhar                     101          89         18        107
            .Baer                       204          89         18        107
       102 .De Haan                     102          89         18        107
            .Hunold                     103          89         18        107
             .Austin                    105          89         18        107
       104   .Ernst                     104          89         18        107
       106   .Pataballa                 106          89         18        107
       107   .Lorentz                   107          89         18        107
       108  .Greenberg                  108          89         18        107
             .Chen                      110          89         18        107
       109   .Faviet                    109          89         18        107
       111   .Sciarra                   111          89         18        107
       112   .Urman                     112          89         18        107
       113   .Popp                      113          89         18        107
       114 .Raphaely                    114          89         18        107
            .Baida                      116          89         18        107
       115  .Khoo                       115          89         18        107
       117  .Tobias                     117          89         18        107
       118  .Himuro                     118          89         18        107
       119  .Colmenares                 119          89         18        107
       120 .Weiss                       120          89         18        107
            .Fleaur                     181          89         18        107
       121 .Fripp                       121          89         18        107
            .Atkinson                   130          89         18        107
       122 .Kaufling                    122          89         18        107
            .Chung                      188          89         18        107
       123 .Vollman                     123          89         18        107
            .Bell                       192          89         18        107
       124 .Mourgos                     124          89         18        107
            .Davies                     142          89         18        107
       125  .Nayer                      125          89         18        107
       126  .Mikkilineni                126          89         18        107
       127  .Landry                     127          89         18        107
       128  .Markle                     128          89         18        107
       129  .Bissot                     129          89         18        107
       131  .Marlow                     131          89         18        107
       132  .Olson                      132          89         18        107
       133  .Mallin                     133          89         18        107
       134  .Rogers                     134          89         18        107
       135  .Gee                        135          89         18        107
       136  .Philtanker                 136          89         18        107
       137  .Ladwig                     137          89         18        107
       138  .Stiles                     138          89         18        107
       139  .Seo                        139          89         18        107
       140  .Patel                      140          89         18        107
       141  .Rajs                       141          89         18        107
       143  .Matos                      143          89         18        107
       144  .Vargas                     144          89         18        107
       145 .Russell                     145          89         18        107
            .Bernstein                  151          89         18        107
       146 .Partners                    146          89         18        107
            .Doran                      160          89         18        107
       147 .Errazuriz                   147          89         18        107
            .Ande                       166          89         18        107
       149 .Zlotkey                     149          89         18        107
            .Abel                       174          89         18        107
       150  .Tucker                     150          89         18        107
       152  .Hall                       152          89         18        107
       153  .Olsen                      153          89         18        107
       154  .Cambrault                  154          89         18        107
       155  .Tuvault                    155          89         18        107
       156  .King                       156          89         18        107
       157  .Sully                      157          89         18        107
       158  .McEwen                     158          89         18        107
       159  .Smith                      159          89         18        107
       161  .Sewall                     161          89         18        107
       162  .Vishney                    162          89         18        107
       163  .Greene                     163          89         18        107
       164  .Marvins                    164          89         18        107
       165  .Lee                        165          89         18        107
       167  .Banda                      167          89         18        107
       168  .Ozer                       168          89         18        107
       169  .Bloom                      169          89         18        107
       170  .Fox                        170          89         18        107
       171  .Smith                      171          89         18        107
       173  .Kumar                      173          89         18        107
       175  .Hutton                     175          89         18        107
       176  .Taylor                     176          89         18        107
       177  .Livingston                 177          89         18        107
       178  .Grant                      178          89         18        107
       179  .Johnson                    179          89         18        107
       180  .Taylor                     180          89         18        107
       182  .Sullivan                   182          89         18        107
       183  .Geoni                      183          89         18        107
       184  .Sarchand                   184          89         18        107
       185  .Bull                       185          89         18        107
       186  .Dellinger                  186          89         18        107
       187  .Cabrio                     187          89         18        107
       189  .Dilly                      189          89         18        107
       190  .Gates                      190          89         18        107
       191  .Perkins                    191          89         18        107
       193  .Everett                    193          89         18        107
       194  .McCain                     194          89         18        107
       195  .Jones                      195          89         18        107
       196  .Walsh                      196          89         18        107
       197  .Feeney                     197          89         18        107
       198  .OConnell                   198          89         18        107
       199  .Grant                      199          89         18        107
       200  .Whalen                     200          89         18        107
       201 .Hartstein                   201          89         18        107
            .Fay                        202          89         18        107
       203  .Mavris                     203          89         18        107
       205  .Higgins                    205          89         18        107
             .Gietz                     206          89         18        107

107 rows selected.

 






A Design Pattern for Oracle eBusiness Audit Trail Reports with XML Publisher

Oracle eBusiness applications allow audit history records to be automatically maintained on database tables, as explained in the release 12 System Administrator's guide, Reporting On AuditTrail Data.

Oracle E-Business Suite provides an auditing mechanism based on Oracle database triggers. AuditTrail stores change information in a "shadow table" of the audited table. This mechanism saves audit data in an uncompressed but "sparse" format, and you enable auditing for particular tables and groups of tables ("audit groups").

Oracle provides an Audit Query Navigator page where it is possible to search for changes by primary key values. For reporting purposes, the manual says:

You should write audit reports as needed. Audit Trail provides the views of your shadow tables to make audit reporting easier; you can write your reports to use these views.

In fact the views are of little practical use, and it is quite hard to develop reports that are user-friendly, efficient and not over-complex, owing, amongst other things, to the "sparse" data format. However, once you have developed one you can use that as a design pattern and starting point for audit reporting on any of the eBusiness tables.

In this article I provide such a report for auditing external bank account changes on Oracle eBusiness 12.1. The report displays the current record, with lookup information, followed by a list of the changes within an input date range. Only records that have changes within that range are included, and for each change only the fields that were changed are listed. The lookup information includes a list of detail records, making the report overall pretty general in structure: It has a master entity with two independent detail entities, and therefore requires a minimum of two queries. This minimum number of queries is usually the best choice and is what I have implemented (it's fine to have an extra query for global data, but I don't have any here). The main query makes extensive use of analytic functions, case expressions and subquery factors to achieve the necessary data transformations as simply and efficiently as possible.

The report is implemented in XML (or BI) Publisher, which is the main batch reporting tool for Oracle eBusiness.

I start by showing sample output from the report, followed by the RTF template. The queries are then documented, starting with query structure diagrams with annotations explaining the logic. A link is included to a zip file with all the code and templates needed to install the report. Oracle provides extensive documentation on the setup of Auditing and developing in XML Publisher, so I will not cover this.

Report Layout
Example Output in Excel Format

  • There are three regions
    • Bank Account Current Record - the master record
    • Owners - first detail block, listing the owners of the bank account
    • Bank Account Changes - the second detail block, listing the audit history. Note that unchanged fields are omitted
  • Note that some audit fields are displayed directly, such as account number, while for others, such as branch number, the display value is on a referenced table

XML Publisher RTF Tempate

  • Note that each audit field has its own row in the table, but the if-block excludes it if both old and new values are null

Audit Query
Query Structure Diagram
Audit Query QSD
Subquery Tabulation

SQL

WITH audit_range AS (
SELECT DISTINCT ext_bank_account_id
  FROM iby_ext_bank_accounts_a aup
 WHERE 1=1
&lp_beg_dat
&lp_end_dat
), audit_union AS (
SELECT ext_bank_account_id acc_id,
       CASE WHEN Substr (audit_true_nulls, 2, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_name END acc_name,
       CASE WHEN Substr (audit_true_nulls, 3, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE bank_account_num END acc_num,
       CASE WHEN Substr (audit_true_nulls, 8, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE iban END iban,
       CASE WHEN Substr (audit_true_nulls, 7, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE currency_code END curr,
       CASE WHEN Substr (audit_true_nulls, 6, 1) = 'Y' OR audit_transaction_type = 'I' THEN '*NULL*' ELSE country_code END coun,
       CASE WHEN Substr (audit_true_nulls, 4, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE bank_id END bank_id,
       CASE WHEN Substr (audit_true_nulls, 5, 1) = 'Y' OR audit_transaction_type = 'I' THEN 0 ELSE branch_id END branch_id,
       audit_sequence_id seq_id,
       audit_timestamp,
       audit_user_name a_user,
       CASE WHEN audit_transaction_type = 'I' THEN 'INSERT' ELSE 'UPDATE' END a_type
  FROM iby_ext_bank_accounts_a aup
 WHERE aup.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
&lp_beg_dat
 UNION
SELECT bac.ext_bank_account_id,
       bac.bank_account_name,
       bac.bank_account_num,
       bac.iban,
       bac.currency_code,
       bac.country_code,
       bac.bank_id,
       bac.branch_id,
       NULL,
       bac.last_update_date,
       usr.user_name,
       NULL
  FROM iby_ext_bank_accounts            bac
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.last_updated_by
 WHERE bac.ext_bank_account_id IN (SELECT ext_bank_account_id FROM audit_range)
), audit_pairs AS (
SELECT acc_id,
       acc_name,
       bank_id,
       First_Value (bank_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) bank_id_n,
       branch_id,
       First_Value (branch_id IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) branch_id_n,
       First_Value (acc_name IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_name_n,
       acc_num,
       First_Value (acc_num IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) acc_num_n,
       iban,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) iban_n,
       curr,
       First_Value (curr IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) curr_n,
       coun,
       First_Value (iban IGNORE NULLS) OVER 
        (PARTITION BY acc_id ORDER BY audit_timestamp, seq_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) coun_n,
       seq_id,
       audit_timestamp,
       a_user,
       a_type
  FROM audit_union
)
SELECT aup.acc_id,
       par_bnk.party_name bank_name,
       par_brn.party_name bra_name,
       orp.bank_or_branch_number bra_num,
       bac.bank_account_name acc_name,
       bac.bank_account_num acc_num,
       bac.iban,
       bac.country_code coun,
       bac.currency_code curr,
       To_Char (bac.creation_date, 'DD-MON-YYYY HH24:MI:SS') creation_date,
       usr.user_name created_by,
       To_Char (aup.audit_timestamp, 'DD-MON-YYYY HH24:MI:SS') a_time,
       aup.a_user,
       aup.a_type,
/*
attr: 1. NULL -> no change; 2. 0/'*NULL*' -> change from null; 3. 'other'-> change from not null
        old: 1 and 2 both return NULL; 3 returns old not null value
        new: only return value for 2 and 3, meaning some change
*/
       CASE WHEN aup.bank_id != 0 THEN par_bnk_o.party_name END bank_name_o,
       CASE WHEN aup.bank_id IS NOT NULL THEN CASE WHEN aup.bank_id_n != 0 THEN par_bnk_o.party_name END END bank_name_n,
       CASE WHEN aup.branch_id != 0 THEN par_brn_o.party_name END bra_name_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN par_brn_n.party_name END END bra_name_n,
       CASE WHEN aup.branch_id != 0 THEN orp_o.bank_or_branch_number END bra_num_o,
       CASE WHEN aup.branch_id IS NOT NULL THEN CASE WHEN aup.branch_id_n != 0 THEN orp_n.bank_or_branch_number END END bra_num_n,
       CASE WHEN aup.acc_name != '*NULL*' THEN aup.acc_name END acc_name_o,
       CASE WHEN aup.acc_name IS NOT NULL THEN CASE WHEN aup.acc_name_n != '*NULL*' THEN aup.acc_name_n END END acc_name_n,
       CASE WHEN aup.acc_num != '*NULL*' THEN aup.acc_num END acc_num_o,
       CASE WHEN aup.acc_num IS NOT NULL THEN CASE WHEN aup.acc_num_n != '*NULL*' THEN aup.acc_num_n END END acc_num_n,
       CASE WHEN aup.iban != '*NULL*' THEN aup.iban END iban_o,
       CASE WHEN aup.iban IS NOT NULL THEN CASE WHEN aup.iban_n != '*NULL*' THEN aup.iban_n END END iban_n,
       CASE WHEN aup.curr != '*NULL*' THEN aup.curr END curr_o,
       CASE WHEN aup.curr IS NOT NULL THEN CASE WHEN aup.curr_n != '*NULL*' THEN aup.curr_n END END curr_n,
       CASE WHEN aup.coun != '*NULL*' THEN aup.coun END coun_o,
       CASE WHEN aup.coun IS NOT NULL THEN CASE WHEN aup.coun_n != '*NULL*' THEN aup.coun_n END END coun_n
  FROM audit_pairs                      aup
  JOIN iby_ext_bank_accounts            bac
    ON bac.ext_bank_account_id          = aup.acc_id
  LEFT JOIN hz_parties                  par_bnk
    ON par_bnk.party_id                 = bac.bank_id
  LEFT JOIN hz_parties                  par_bnk_o
    ON par_bnk_o.party_id               = aup.bank_id
  LEFT JOIN hz_parties                  par_bnk_n
    ON par_bnk_n.party_id               = aup.bank_id_n
  LEFT JOIN hz_parties                  par_brn
    ON par_brn.party_id                 = bac.branch_id
  LEFT JOIN hz_organization_profiles    orp
    ON orp.party_id                     = par_brn.party_id
   AND SYSDATE BETWEEN Trunc (orp.effective_start_date) AND Nvl (Trunc (orp.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_o
    ON par_brn_o.party_id               = aup.branch_id
  LEFT JOIN hz_organization_profiles    orp_o
    ON orp_o.party_id                   = par_brn_o.party_id
   AND SYSDATE BETWEEN Trunc (orp_o.effective_start_date) AND Nvl (Trunc (orp_o.effective_end_date), SYSDATE+1)
  LEFT JOIN hz_parties                  par_brn_n
    ON par_brn_n.party_id               = aup.branch_id_n
  LEFT JOIN hz_organization_profiles    orp_n
    ON orp_n.party_id                   = par_brn_n.party_id
   AND SYSDATE BETWEEN Trunc (orp_n.effective_start_date) AND Nvl (Trunc (orp_n.effective_end_date), SYSDATE+1)
  JOIN fnd_user                         usr
    ON usr.user_id                      = bac.created_by
 WHERE aup.seq_id                       IS NOT NULL
&lp_beg_dat
&lp_end_dat
 ORDER BY aup.acc_id, aup.audit_timestamp DESC, aup.seq_id DESC

Owners Query
Query Structure Diagram
Owners Query
Subquery Tabulation

SQL

SELECT par.party_name owner,
       sup.vendor_name,
       sup.segment1
  FROM iby_account_owners               own
  JOIN hz_parties                       par
    ON par.party_id                     = own.account_owner_party_id
  LEFT JOIN ap_suppliers                sup
    ON sup.party_id                     = par.party_id
 WHERE own.ext_bank_account_id          = :ACC_ID

Code for Bank Account Auditing XML Publisher Report

XX_IBYBNKAUDIT

See also A Generic Unix Script for Uploading Oracle eBusiness Concurrent Programs






NoCOUG SQL Challenge 2014 Illustrated

An SQL challenge was posted recently on the blog of the Northern California Oracle user group, SQL Mini-challenge. A query was given against Oracle's demo HR schema, with description:

It lists the locations containing a department that either contains an employee named Steven King or an employee who holds the title of President or an employee who has previously held the title of President.

The challenge was to rewrite the query avoiding the relatively expensive existence subqueries, and minimising the number of consistent gets reported on the small demo data set. Oracle's Cost-Based Optimiser will itself transform queries within its parsing phase, but at a relatively low level; for example, an 'OR' condition might be changed to a union if the CBO thinks that will aid performance. The solutions to the challenge present a nice illustration of how more extensive query transfomations can improve performance and modularity characteristics.

In this article I will list four equivalent queries for the problem, three based on solutions provided on the blog, using Ansi syntax here for consistency. For each query I give the output from DBMS_XPlan, and include a query structure diagram following my own diagramming notation. The example query provided in the challenge is not in fact the most literal translation of the requirement into SQL, and I think it will be interesting to start with my idea of what that would be.

Update 28 October 2014: I noticed that in the 'literal' query I had omitted the location condition on the third subquery. I have fixed this and the execution plan is much worse. The results reported here were from version 11.2; running on v12.1 gives some extremely interesting differences, and I have added the v12.1 results at the end for the 'literal' query. They show a much improved plan, with departments 'factorised' out.

Query 1: Literal
This my attempt at the most literal translation of the stated requirement into SQL. The three conditions are all separately expressed as existence subqueries.

QSD Literal

NCOUG-Literal

Query Literal
Note that in an earlier version of this article, I had omitted the final 'd.location_id = l.location_id' condition.

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
    WHERE d.location_id = l.location_id
      AND e.first_name = 'Steven' AND e.last_name = 'King'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND j.job_title = 'President'
  ) OR EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN job_history h
       ON h.employee_id = e.employee_id
     JOIN jobs j2
       ON j2.job_id = h.job_id
    WHERE d.location_id = l.location_id
      AND j2.job_title   = 'President'
  )

XPlan Literal

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |      1 |        |      1 |00:00:00.01 |     426 |       |       |          |
|*  1 |  FILTER                          |                   |      1 |        |      1 |00:00:00.01 |     426 |       |       |          |
|   2 |   VIEW                           | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       7 |       |       |          |
|*  3 |    HASH JOIN                     |                   |      1 |        |     23 |00:00:00.01 |       7 |  1023K|  1023K| 1150K (0)|
|   4 |     INDEX FAST FULL SCAN         | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       4 |       |       |          |
|   6 |   NESTED LOOPS                   |                   |     23 |        |      1 |00:00:00.01 |      92 |       |       |          |
|   7 |    NESTED LOOPS                  |                   |     23 |      1 |     23 |00:00:00.01 |      69 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID  | EMPLOYEES         |     23 |      1 |     23 |00:00:00.01 |      46 |       |       |          |
|*  9 |      INDEX RANGE SCAN            | EMP_NAME_IX       |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 10 |     INDEX UNIQUE SCAN            | DEPT_ID_PK        |     23 |      1 |     23 |00:00:00.01 |      23 |       |       |          |
|* 11 |    TABLE ACCESS BY INDEX ROWID   | DEPARTMENTS       |     23 |      1 |      1 |00:00:00.01 |      23 |       |       |          |
|  12 |   NESTED LOOPS                   |                   |     22 |        |      0 |00:00:00.01 |     173 |       |       |          |
|  13 |    NESTED LOOPS                  |                   |     22 |      1 |     88 |00:00:00.01 |     166 |       |       |          |
|  14 |     NESTED LOOPS                 |                   |     22 |      2 |      6 |00:00:00.01 |     160 |       |       |          |
|* 15 |      TABLE ACCESS FULL           | JOBS              |     22 |      1 |     22 |00:00:00.01 |     132 |       |       |          |
|  16 |      TABLE ACCESS BY INDEX ROWID | DEPARTMENTS       |     22 |      2 |      6 |00:00:00.01 |      28 |       |       |          |
|* 17 |       INDEX RANGE SCAN           | DEPT_LOCATION_IX  |     22 |      2 |      6 |00:00:00.01 |      22 |       |       |          |
|* 18 |     INDEX RANGE SCAN             | EMP_DEPARTMENT_IX |      6 |     10 |     88 |00:00:00.01 |       6 |       |       |          |
|* 19 |    TABLE ACCESS BY INDEX ROWID   | EMPLOYEES         |     88 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  20 |   NESTED LOOPS                   |                   |     22 |        |      0 |00:00:00.01 |     154 |       |       |          |
|  21 |    NESTED LOOPS                  |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|  22 |     NESTED LOOPS                 |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|  23 |      NESTED LOOPS                |                   |     22 |      1 |      0 |00:00:00.01 |     154 |       |       |          |
|* 24 |       TABLE ACCESS FULL          | JOBS              |     22 |      1 |     22 |00:00:00.01 |     132 |       |       |          |
|  25 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |     22 |      1 |      0 |00:00:00.01 |      22 |       |       |          |
|* 26 |        INDEX RANGE SCAN          | JHIST_JOB_IX      |     22 |      1 |      0 |00:00:00.01 |      22 |       |       |          |
|  27 |      TABLE ACCESS BY INDEX ROWID | EMPLOYEES         |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 28 |       INDEX UNIQUE SCAN          | EMP_EMP_ID_PK     |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 29 |     INDEX UNIQUE SCAN            | DEPT_ID_PK        |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|* 30 |    TABLE ACCESS BY INDEX ROWID   | DEPARTMENTS       |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(( IS NOT NULL OR  IS NOT NULL OR  IS NOT NULL))
   3 - access(ROWID=ROWID)
   9 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  10 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  11 - filter("D"."LOCATION_ID"=:B1)
  15 - filter("J"."JOB_TITLE"='President')
  17 - access("D"."LOCATION_ID"=:B1)
  18 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  19 - filter("J"."JOB_ID"="E"."JOB_ID")
  24 - filter("J2"."JOB_TITLE"='President')
  26 - access("J2"."JOB_ID"="H"."JOB_ID")
  28 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
  29 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  30 - filter("D"."LOCATION_ID"=:B1)

Query 2: NoCOUG Example
This is the example in the original challenge article, translated into Ansi syntax. It nests the job history existence subquery within an outer existence subquery, and references the departments and employees tables only once.

QSD NoCOUG Example

NCOUG-Example

Query NoCOUG Example

SELECT l.location_id, l.city
  FROM locations l
 WHERE EXISTS
  (SELECT *
     FROM departments d
     JOIN employees e
       ON e.department_id = d.department_id
     JOIN jobs j
       ON j.job_id = e.job_id
    WHERE d.location_id = l.location_id
      AND ( 
            (e.first_name = 'Steven' AND e.last_name = 'King')
         OR j.job_title = 'President'
         OR EXISTS
            (SELECT *
               FROM job_history h
               JOIN jobs j2
                 ON j2.job_id = h.job_id
              WHERE h.employee_id = e.employee_id
                AND j2.job_title   = 'President'
            ) 
          )
  )

XPlan NoCOUG Example

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |      1 |        |      1 |00:00:00.01 |     152 |       |       |          |
|*  1 |  HASH JOIN SEMI                  |                   |      1 |      7 |      1 |00:00:00.01 |     152 |  1156K|  1156K| 1120K (0)|
|   2 |   VIEW                           | index$_join$_001  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|*  3 |    HASH JOIN                     |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1443K (0)|
|   4 |     INDEX FAST FULL SCAN         | LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   5 |     INDEX FAST FULL SCAN         | LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|   6 |   VIEW                           | VW_SQ_1           |      1 |     11 |      1 |00:00:00.01 |     146 |       |       |          |
|*  7 |    FILTER                        |                   |      1 |        |      1 |00:00:00.01 |     146 |       |       |          |
|*  8 |     HASH JOIN                    |                   |      1 |    106 |    106 |00:00:00.01 |      15 |   876K|   876K|  895K (0)|
|   9 |      MERGE JOIN                  |                   |      1 |    107 |    107 |00:00:00.01 |       8 |       |       |          |
|  10 |       TABLE ACCESS BY INDEX ROWID| JOBS              |      1 |     19 |     19 |00:00:00.01 |       2 |       |       |          |
|  11 |        INDEX FULL SCAN           | JOB_ID_PK         |      1 |     19 |     19 |00:00:00.01 |       1 |       |       |          |
|* 12 |       SORT JOIN                  |                   |     19 |    107 |    107 |00:00:00.01 |       6 | 15360 | 15360 |14336  (0)|
|  13 |        TABLE ACCESS FULL         | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|  14 |      TABLE ACCESS FULL           | DEPARTMENTS       |      1 |     27 |     27 |00:00:00.01 |       7 |       |       |          |
|  15 |     NESTED LOOPS                 |                   |    105 |        |      0 |00:00:00.01 |     131 |       |       |          |
|  16 |      NESTED LOOPS                |                   |    105 |      1 |     10 |00:00:00.01 |     121 |       |       |          |
|  17 |       TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |    105 |      1 |     10 |00:00:00.01 |     112 |       |       |          |
|* 18 |        INDEX RANGE SCAN          | JHIST_EMPLOYEE_IX |    105 |      1 |     10 |00:00:00.01 |     105 |       |       |          |
|* 19 |       INDEX UNIQUE SCAN          | JOB_ID_PK         |     10 |      1 |     10 |00:00:00.01 |       9 |       |       |          |
|* 20 |      TABLE ACCESS BY INDEX ROWID | JOBS              |     10 |      1 |      0 |00:00:00.01 |      10 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_1"="L"."LOCATION_ID")
   3 - access(ROWID=ROWID)
   7 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_TITLE"='President' OR  IS NOT NULL))
   8 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  12 - access("J"."JOB_ID"="E"."JOB_ID")
       filter("J"."JOB_ID"="E"."JOB_ID")
  18 - access("H"."EMPLOYEE_ID"=:B1)
  19 - access("J2"."JOB_ID"="H"."JOB_ID")
  20 - filter("J2"."JOB_TITLE"='President')

Query 3: Subquery Factor Union
This converts the 'OR' conditions into a union of three driving subqueries that return the matching department ids from a subquery factor (which could equally be an inline view), and then joins departments and locations.

QSD Subquery Factor Union

NCOUG-SQF

Query Subquery Factor Union

WITH driving_union AS (
SELECT e.department_id
  FROM employees e
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN employees e
    ON e.job_id        = j.job_id
 WHERE j.job_title     = 'President'
 UNION
SELECT e.department_id
  FROM jobs j
  JOIN job_history h
    ON j.job_id        = h.job_id
  JOIN employees e
    ON e.employee_id    = h.employee_id
 WHERE j.job_title     = 'President'
)
SELECT DISTINCT l.location_id, l.city
  FROM driving_union u
  JOIN departments d
    ON d.department_id 	= u.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id

XPlan Subquery Factor Union

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |      1 |        |      1 |00:00:00.01 |      29 |       |       |          |
|   1 |  HASH UNIQUE                         |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1156K|  1156K|  464K (0)|
|*  2 |   HASH JOIN                          |                  |      1 |      8 |      1 |00:00:00.01 |      29 |  1517K|  1517K|  366K (0)|
|*  3 |    HASH JOIN                         |                  |      1 |      8 |      1 |00:00:00.01 |      23 |  1517K|  1517K|  382K (0)|
|   4 |     VIEW                             |                  |      1 |      8 |      1 |00:00:00.01 |      17 |       |       |          |
|   5 |      SORT UNIQUE                     |                  |      1 |      8 |      1 |00:00:00.01 |      17 |  2048 |  2048 | 2048  (0)|
|   6 |       UNION-ALL                      |                  |      1 |        |      2 |00:00:00.01 |      17 |       |       |          |
|   7 |        TABLE ACCESS BY INDEX ROWID   | EMPLOYEES        |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  8 |         INDEX RANGE SCAN             | EMP_NAME_IX      |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|   9 |        NESTED LOOPS                  |                  |      1 |        |      1 |00:00:00.01 |       8 |       |       |          |
|  10 |         NESTED LOOPS                 |                  |      1 |      6 |      1 |00:00:00.01 |       7 |       |       |          |
|* 11 |          TABLE ACCESS FULL           | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|* 12 |          INDEX RANGE SCAN            | EMP_JOB_IX       |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  13 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      1 |      6 |      1 |00:00:00.01 |       1 |       |       |          |
|  14 |        NESTED LOOPS                  |                  |      1 |        |      0 |00:00:00.01 |       7 |       |       |          |
|  15 |         NESTED LOOPS                 |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|  16 |          NESTED LOOPS                |                  |      1 |      1 |      0 |00:00:00.01 |       7 |       |       |          |
|* 17 |           TABLE ACCESS FULL          | JOBS             |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|  18 |           TABLE ACCESS BY INDEX ROWID| JOB_HISTORY      |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 19 |            INDEX RANGE SCAN          | JHIST_JOB_IX     |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
|* 20 |          INDEX UNIQUE SCAN           | EMP_EMP_ID_PK    |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  21 |         TABLE ACCESS BY INDEX ROWID  | EMPLOYEES        |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
|  22 |     VIEW                             | index$_join$_011 |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 23 |      HASH JOIN                       |                  |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  24 |       INDEX FAST FULL SCAN           | DEPT_ID_PK       |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  25 |       INDEX FAST FULL SCAN           | DEPT_LOCATION_IX |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  26 |    VIEW                              | index$_join$_013 |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 27 |     HASH JOIN                        |                  |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1426K (0)|
|  28 |      INDEX FAST FULL SCAN            | LOC_CITY_IX      |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  29 |      INDEX FAST FULL SCAN            | LOC_ID_PK        |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
   3 - access("D"."DEPARTMENT_ID"="U"."DEPARTMENT_ID")
   8 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  11 - filter("J"."JOB_TITLE"='President')
  12 - access("E"."JOB_ID"="J"."JOB_ID")
  17 - filter("J"."JOB_TITLE"='President')
  19 - access("J"."JOB_ID"="H"."JOB_ID")
  20 - access("E"."EMPLOYEE_ID"="H"."EMPLOYEE_ID")
  23 - access(ROWID=ROWID)
  27 - access(ROWID=ROWID)

Query 4: Outer Joins
This avoids existence subqueries using the idea that an outer join with a constraint that the joined record is not null, with a distinct qualifier to eliminate duplicates, can serve as a logical equivalent.

QSD Outer Joins

NCOUG-OJ

Query Outer Joins

SELECT DISTINCT l.location_id, l.city
  FROM employees e
  LEFT JOIN jobs j
    ON j.job_id        	= e.job_id
   AND j.job_title      = 'President'  
  LEFT JOIN job_history h
    ON h.employee_id    = e.employee_id
  LEFT JOIN jobs j2
    ON j2.job_id        = h.job_id
   AND j2.job_title     = 'President' 
  JOIN departments d
    ON d.department_id 	= e.department_id
  JOIN locations l
    ON l.location_id 	= d.location_id
 WHERE (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_id IS NOT NULL
    OR j2.job_id IS NOT NULL

XPlan Outer Joins

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|   1 |  HASH UNIQUE                  |                   |      1 |    106 |      1 |00:00:00.01 |      36 |  1156K|  1156K|  464K (0)|
|*  2 |   FILTER                      |                   |      1 |        |      1 |00:00:00.01 |      36 |       |       |          |
|*  3 |    HASH JOIN RIGHT OUTER      |                   |      1 |    106 |    109 |00:00:00.01 |      36 |  1269K|  1269K|  369K (0)|
|*  4 |     TABLE ACCESS FULL         | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|*  5 |     HASH JOIN RIGHT OUTER     |                   |      1 |    106 |    109 |00:00:00.01 |      30 |  1134K|  1134K|  751K (0)|
|   6 |      VIEW                     | index$_join$_004  |      1 |     10 |     10 |00:00:00.01 |       6 |       |       |          |
|*  7 |       HASH JOIN               |                   |      1 |        |     10 |00:00:00.01 |       6 |  1096K|  1096K| 1331K (0)|
|   8 |        INDEX FAST FULL SCAN   | JHIST_EMPLOYEE_IX |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|   9 |        INDEX FAST FULL SCAN   | JHIST_JOB_IX      |      1 |     10 |     10 |00:00:00.01 |       3 |       |       |          |
|* 10 |      HASH JOIN OUTER          |                   |      1 |    106 |    106 |00:00:00.01 |      24 |   858K|   858K| 1270K (0)|
|* 11 |       HASH JOIN               |                   |      1 |    106 |    106 |00:00:00.01 |      18 |  1063K|  1063K| 1252K (0)|
|* 12 |        HASH JOIN              |                   |      1 |     27 |     27 |00:00:00.01 |      12 |  1156K|  1156K| 1133K (0)|
|  13 |         VIEW                  | index$_join$_010  |      1 |     23 |     23 |00:00:00.01 |       6 |       |       |          |
|* 14 |          HASH JOIN            |                   |      1 |        |     23 |00:00:00.01 |       6 |  1023K|  1023K| 1450K (0)|
|  15 |           INDEX FAST FULL SCAN| LOC_CITY_IX       |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  16 |           INDEX FAST FULL SCAN| LOC_ID_PK         |      1 |     23 |     23 |00:00:00.01 |       3 |       |       |          |
|  17 |         VIEW                  | index$_join$_008  |      1 |     27 |     27 |00:00:00.01 |       6 |       |       |          |
|* 18 |          HASH JOIN            |                   |      1 |        |     27 |00:00:00.01 |       6 |  1096K|  1096K| 1547K (0)|
|  19 |           INDEX FAST FULL SCAN| DEPT_ID_PK        |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  20 |           INDEX FAST FULL SCAN| DEPT_LOCATION_IX  |      1 |     27 |     27 |00:00:00.01 |       3 |       |       |          |
|  21 |        TABLE ACCESS FULL      | EMPLOYEES         |      1 |    107 |    107 |00:00:00.01 |       6 |       |       |          |
|* 22 |       TABLE ACCESS FULL       | JOBS              |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((("E"."FIRST_NAME"='Steven' AND "E"."LAST_NAME"='King') OR "J"."JOB_ID" IS NOT NULL OR "J2"."JOB_ID" IS NOT NULL))
   3 - access("J2"."JOB_ID"="H"."JOB_ID")
   4 - filter("J2"."JOB_TITLE"='President')
   5 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
   7 - access(ROWID=ROWID)
  10 - access("J"."JOB_ID"="E"."JOB_ID")
  11 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")
  12 - access("L"."LOCATION_ID"="D"."LOCATION_ID")
  14 - access(ROWID=ROWID)
  18 - access(ROWID=ROWID)
  22 - filter("J"."JOB_TITLE"='President')

Query Summary Table v11.2

Here is a summary of some statistics on the queries, run on an Oracle 11.2 XE instance. Query lines depends on formatting of course.

Query Buffers Table Instances XPlan Steps Query Lines
Literal 426 10 30 30
NoCOUG Example 152 6 20 23
Subquery Factor Union 29 6 29 25
Outer Joins 36 6 22 17

Oracle 12c

While the original version of this article, posted 25 August 2014, was based on Oracle 11.2, I later ran my script on Oracle 12.1, and noted that the 'literal' query now had a much-changed execution plan. In particular, the departments table had been 'factorised' out, appearing only once, and giving a much reduced buffer count of 31. It seems that this comes from an improvement in the transformation phase of query execution. The other queries did not show so much difference in plans, see the summary table below.

Execution Plan for Literal v12.1

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                    |      1 |        |      1 |00:00:00.05 |      31 |      1 |       |       |          |
|*  1 |  HASH JOIN SEMI                              |                    |      1 |      7 |      1 |00:00:00.05 |      31 |      1 |  1645K|  1645K| 1432K (0)|
|   2 |   VIEW                                       | index$_join$_001   |      1 |     23 |     23 |00:00:00.01 |       8 |      0 |       |       |          |
|*  3 |    HASH JOIN                                 |                    |      1 |        |     23 |00:00:00.01 |       8 |      0 |  1368K|  1368K| 1566K (0)|
|   4 |     INDEX FAST FULL SCAN                     | LOC_CITY_IX        |      1 |     23 |     23 |00:00:00.01 |       4 |      0 |       |       |          |
|   5 |     INDEX FAST FULL SCAN                     | LOC_ID_PK          |      1 |     23 |     23 |00:00:00.01 |       4 |      0 |       |       |          |
|   6 |   VIEW                                       | VW_SQ_1            |      1 |      8 |      1 |00:00:00.05 |      23 |      1 |       |       |          |
|   7 |    MERGE JOIN SEMI                           |                    |      1 |      8 |      1 |00:00:00.05 |      23 |      1 |       |       |          |
|   8 |     TABLE ACCESS BY INDEX ROWID              | DEPARTMENTS        |      1 |     27 |     10 |00:00:00.01 |       4 |      0 |       |       |          |
|   9 |      INDEX FULL SCAN                         | DEPT_ID_PK         |      1 |     27 |     10 |00:00:00.01 |       2 |      0 |       |       |          |
|* 10 |     SORT UNIQUE                              |                    |     10 |      8 |      1 |00:00:00.05 |      19 |      1 |  2048 |  2048 | 2048  (0)|
|  11 |      VIEW                                    | VW_JF_SET$1236063A |      1 |      8 |      2 |00:00:00.05 |      19 |      1 |       |       |          |
|  12 |       UNION-ALL                              |                    |      1 |        |      2 |00:00:00.05 |      19 |      1 |       |       |          |
|  13 |        NESTED LOOPS                          |                    |      1 |        |      1 |00:00:00.05 |       9 |      1 |       |       |          |
|  14 |         NESTED LOOPS                         |                    |      1 |      6 |      1 |00:00:00.05 |       8 |      1 |       |       |          |
|* 15 |          TABLE ACCESS FULL                   | JOBS               |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |       |       |          |
|* 16 |          INDEX RANGE SCAN                    | EMP_JOB_IX         |      1 |      6 |      1 |00:00:00.05 |       1 |      1 |       |       |          |
|  17 |         TABLE ACCESS BY INDEX ROWID          | EMPLOYEES          |      1 |      6 |      1 |00:00:00.01 |       1 |      0 |       |       |          |
|  18 |        TABLE ACCESS BY INDEX ROWID BATCHED   | EMPLOYEES          |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|* 19 |         INDEX RANGE SCAN                     | EMP_NAME_IX        |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |       |       |          |
|  20 |        NESTED LOOPS                          |                    |      1 |        |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|  21 |         NESTED LOOPS                         |                    |      1 |      1 |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|  22 |          NESTED LOOPS                        |                    |      1 |      1 |      0 |00:00:00.01 |       8 |      0 |       |       |          |
|* 23 |           TABLE ACCESS FULL                  | JOBS               |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |       |       |          |
|  24 |           TABLE ACCESS BY INDEX ROWID BATCHED| JOB_HISTORY        |      1 |      1 |      0 |00:00:00.01 |       1 |      0 |       |       |          |
|* 25 |            INDEX RANGE SCAN                  | JHIST_JOB_IX       |      1 |      1 |      0 |00:00:00.01 |       1 |      0 |       |       |          |
|* 26 |          INDEX UNIQUE SCAN                   | EMP_EMP_ID_PK      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  27 |         TABLE ACCESS BY INDEX ROWID          | EMPLOYEES          |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("VW_COL_1"="L"."LOCATION_ID")
   3 - access(ROWID=ROWID)
  10 - access("ITEM_1"="D"."DEPARTMENT_ID")
       filter("ITEM_1"="D"."DEPARTMENT_ID")
  15 - filter("J"."JOB_TITLE"='President')
  16 - access("J"."JOB_ID"="E"."JOB_ID")
  19 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  23 - filter("J2"."JOB_TITLE"='President')
  25 - access("J2"."JOB_ID"="H"."JOB_ID")
  26 - access("H"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")

Note
-----
   - this is an adaptive plan

Query Summary Table v12.1

Query Buffers Table Instances XPlan Steps Query Lines
Literal 31 10 27 31
NoCOUG Example 156 6 19 23
Subquery Factor Union 29 6 28 25
Outer Joins 45 6 22 17

Here is the v12.1 output:

NCOUG-3-121






Low-Hanging Fruit: The Classic Performance Tuning Anti-pattern

The July/August 2014 edition of Oracle Magazine contains an interesting PL/SQL performance tuning article by Steve Feuerstein, The Joy of Low-Hanging Fruit.

Low-Hanging Fruit

Low-Hanging Cherries

The article is a case study in which SF finds a "cursor FOR loop that contained two nonquery DML statements" and which he calls a classic anti-pattern. As he says, "the inserts and updates are changing the tables on a row-by-row basis, which maximizes the number of context switches". He also identifies an inefficiency in the code that is more problem-specific: An update process occurs within the loop that only needs to be performed at the end of the loop. (Although this is a generic kind of performance 'bug' I would resist the temptation to call it an anti-pattern, preferring to reserve the term for more deliberate strategies). The article goes on to replace the row-by-row DML using bulk processing, and to move the in-loop update outside the loop. SF claims a two to three-fold performance improvement in testing.

The performance improvement sounds good, but at the same time it occurs to me that the article itself could be seen as an illustration of the classic performance-tuning anti-pattern. This is when the tuner looks for things to optimise before he or she knows what is actually taking up the bulk of the time (and the article's title summarises it nicely - so I will use it to name the anti-pattern :) ). The main thrust of the article, as indicated in the subtitle, is the performance benefits of BULK COLLECT, but there is actually no indication that this is where the performance gains were made - it's possible, likely even, that all significant gains came from the other, more problem-specific, improvement.

The first paragraph of the final section of the article reads:

"It might be lots of fun to completely reorganize one's program in hopes of improving performance, but we can't just assume that the resulting code actually does run quickly."

This is exactly right, and is why the first step in any performance tuning process should be to identify the hot-spots. Tuning should then focus on those, ignoring areas where theoretical improvements will make no practical difference. If one might be permitted a second sylvan metaphor, it's important to perceive wood as well as trees.

William Blake illustration of Dante's Divine Comedy

The Wood of the Self-Murderers, by William Blake

There are many ways to identify the hot-spots, and I would recommend Oracle's PL/SQL profiling features, which I wrote about here, Notes on Profiling Oracle PL/SQL.

Notes on the Metaphors

Low-Hanging Fruit

For such a ubiquitous expression, this seems to be of surprisingly recent origin, according to: What Is the World’s Actual Lowest Hanging Fruit?.

The metaphorical usage doesn’t appear to have a very long history, though. According to Liberman, the Oxford English Dictionary’s first partial reference is from a 1968 Guardian article: “His rare images are picked aptly, easily, like low-hanging fruit.”

Not Seeing the Wood for the Trees

This expression is apparently much older, being traced back to (at least) 1546 here, Can't see the Economist for the Trees, where it is attributed to "The Proverbs of John Heywood" and the lines:

Plentie is no deyntie. ye see not your owne ease.
I see, ye can not see the wood for trees.

The article is ostensibly about an incorrect usage of the expression by The Economist, but is really more interesting than that, and includes equivalents in other languages. For example, Americans say:

Can't see the forest for the trees

while the French say:

L'arbre qui cache la forêt

.
The Wood of the Self-Murderers

This intriguing painting hangs at the Tate Gallery in London, and you can read more about it here, The Wood of the Self-Murderers: The Harpies and the Suicides

The work was completed between 1824 and 1827 and illustrates a passage from the Inferno canticle of the Divine Comedy by Dante Alighieri (1265–1321)

The passage concerned is:

Here the repellent harpies make their nests,
Who drove the Trojans from the Strophades
With dire announcements of the coming woe.
They have broad wings, a human neck and face,
Clawed feet and swollen, feathered bellies; they caw
Their lamentations in the eerie trees






Brendan's 2-Page Oracle Programming Standards

I recently had some discussions on PL/SQL coding standards centred around some documents I’d come across on the internet, as well as some internal company standards documents.

Steve Feuerstein has a useful page PL/SQL Standards linking to his own document and to two others:

Steven's Naming Conventions and Coding Standards, by Steve Feuerstein (12 page pdf).
PLSQL Standards Developed for the PLSQL Starter Framework-1.pdf, by Bill Coulam (34 page pdf).
Trivadis PL/SQL & SQL Coding Guidelines, by Roger Troller (50 page pdf).

Another interesting document is Naming and Coding Standards for SQL and PL/SQL by William Robertson (html), which emphasizes common sense in application of standards.

These are all valuable documents, but it was thought that many developers would not read and remain familiar with long documents of this kind, and a 2-page summary standards might be useful. Here is my attempt, as a Word document on Scribd:

Brendan's 2-Page Oracle Programming Standards by Brendan Furey







SQL for Continuum and Contiguity Grouping

A question was asked some time ago on Oracle's SQL&PL/SQL forum (Solution design and performance - SQL or PL/SQL?), the gist of which the poster specified thus:

I have a set of records which keep track of a status at a given date for a range of points on a given road. Basically, I need the current bits to "shine through".

The thread, despite the open-minded title, gives a nice illustration of the odd preference that people often have for complicated PL/SQL solutions to problems amenable to simpler SQL. I provided an SQL solution in that thread for this problem, and in this article I have taken a simpler, more general form of the problem defined there, and provide SQL solutions with diagrams illustrating how they work.

The class of problem can be characterised as follows:

  • Records have range fields, and we want to aggregate separately at each point within the 1-dimensional range domains (or 'continuum'), independently by grouping key
  • Within the groups we want to aggregate only the first or last records, ordering by some non-key fields

The first point above can be seen as equivalent to adding in to the main grouping key an implicit field specifying the domain leg, i.e. the region between the break points defined by the record ranges, and splitting the records by leg. We might term this form of aggregation continuum aggregation, or perhaps vertical aggregation if we view the range as distance, the internal ordering as by time, and visualise it graphically as in my diagrams below. Once this vertical aggregation is established, it is natural to think of adding a second aggregation, that might be termed contiguity aggregation, or with the same visualisation, horizontal aggregation:

  • Contiguous legs having the same values for the aggregated attributes will be grouped together

Here is an extract on grouping from Oracle® Database SQL Language Reference:

Oracle applies the aggregate functions to each group of rows and returns a single result row for each group.
...
The aggregate functions MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV, when followed by the KEEP keyword, can be used in conjunction with the FIRST or LAST function to operate on a set of values from a set of rows that rank as the FIRST or LAST with respect to a given sorting specification. Refer to FIRST for more information.

A simple use-case for the KEEP form of grouping is to list contacts with their most recent telephone number when multiple numbers are stored in a separate table. A few years ago, I realised that in most examples that I see developers write their own code instead of using the built-in constructs, and I wrote an article comparing the performance and complexity of the various alternatives for this requirement (as well as alternatives for SQL pivoting), SQL Pivot and Prune Queries - Keeping an Eye on Performance.

Update, 20 November 2013: Added a Postgres version of the SQL solution.

Test Problem

Data Model

We take for our example problem a very simple model consisting of roads and road events where the events occur over a stretch of road at a given time, and event type forms the horizontal grouping attribute.

Road - ERD2

Test Data

Input data - Roads

ROAD_ID ROAD_DESC       START_POINT  END_POINT
------- --------------- ----------- ----------
      1 The Strand                0        200
      2 Piccadilly                0        100

Input data - road_events

ROAD_ID     REV_ID E_TYPE START_POINT  END_POINT E_DATE
------- ---------- ------ ----------- ---------- ---------
      1          1 OPEN             0         10 01-JAN-07
                 2 OPEN            20         50 01-JAN-08
                 3 CLOSED         130        160 01-FEB-08
                 4 CLOSED          55         85 05-JUN-08
                 5 OTHER           45        115 01-JAN-09
                 6 OPEN            60        100 12-FEB-11
                 7 CLOSED         115        145 12-FEB-12
      2          8 CLOSED          10         30 01-JAN-10
                 9 OPEN            40         50 01-JAN-11
                10 OPEN            50         70 01-JAN-12

10 rows selected.

The following diagrams display the data and solutions for our test problem.

Road - Road 1

Road - Road 2

SQL Solution for 'Point' Problem

To start with, here is a standard SQL solution for the 'zero-dimensional' problem, where the most recent record is required for each road:

Zero-Dimensional Solution

ROAD_DESC          R_START      R_END    E_START      E_END E_DATE    E_TYPE
--------------- ---------- ---------- ---------- ---------- --------- ------
Piccadilly               0        100         50         70 01-JAN-12 OPEN
The Strand               0        200        115        145 12-FEB-12 CLOSED

  1  SELECT r.road_desc, r.start_point r_start, r.end_point r_end,
  2         Max (e.start_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_start,
  3         Max (e.end_point) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_end,
  4         Max (e.event_date) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_date,
  5         Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date) e_type
  6    FROM road_events e
  7    JOIN roads r
  8      ON r.id = e.road_id
  9   GROUP BY r.road_desc, r.start_point, r.end_point
 10*  ORDER BY 1, 2, 3

The SQL is more complicated for the continuum problem, and we provide two versions, that differ in the final stage, of horizontal grouping by contiguous event type. The first uses a differencing method popular in Oracle 11g and earlier versions for this common type of grouping problem; the second uses a new feature from Oracle 12c, row pattern matching. [I have also used another technique for this type of grouping, in an article on contiguity and other range-based grouping problems in June 2011, Forming Range-Based Break Groups With Advanced SQL.]

SQL Solution for 'Continuum' Problem, Contiguity Grouping by Differences

SQL (Contiguity by Differences)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, l.leg_start, l.leg_end
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM latest_events
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE   GROUP_NO
------- --------------- ---------- ---------- ------ ----------
      1 The Strand               0         10 OPEN            0
                                10         20 (none)          1
                                20         45 OPEN            1
                                45         60 OTHER           3
                                60        100 OPEN            4
                               100        115 OTHER           5
                               115        160 CLOSED          9
                               160        200 (none)         11
      2 Piccadilly               0         10 (none)          0
                                10         30 CLOSED          1
                                30         40 (none)          1
                                40         70 OPEN            3
                                70        100 (none)          3

13 rows selected.

Query Structure Diagram (Contiguity by Differences)Road, V1.5 - QSD-DiffExecution Plan (Contiguity by Differences)

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                  |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   HASH GROUP BY                 |               |      1 |      2 |     13 |00:00:00.01 |      25 |   900K|   900K| 1343K (0)|
|   3 |    MERGE JOIN                   |               |      1 |     24 |     19 |00:00:00.01 |      25 |       |       |          |
|   4 |     TABLE ACCESS BY INDEX ROWID | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   5 |      INDEX FULL SCAN            | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|*  6 |     SORT JOIN                   |               |      2 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   7 |      VIEW                       |               |      1 |     24 |     19 |00:00:00.01 |      23 |       |       |          |
|   8 |       WINDOW SORT               |               |      1 |     24 |     19 |00:00:00.01 |      23 |  2048 |  2048 | 2048  (0)|
|   9 |        WINDOW NOSORT            |               |      1 |     24 |     19 |00:00:00.01 |      23 | 73728 | 73728 |          |
|  10 |         SORT GROUP BY           |               |      1 |     24 |     19 |00:00:00.01 |      23 |  4096 |  4096 | 4096  (0)|
|* 11 |          HASH JOIN OUTER        |               |      1 |     24 |     25 |00:00:00.01 |      23 |  1696K|  1696K|  540K (0)|
|* 12 |           VIEW                  |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  13 |            WINDOW SORT          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  14 |             VIEW                |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  15 |              SORT UNIQUE        |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  16 |               UNION-ALL         |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  17 |                INDEX FULL SCAN  | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |                INDEX FULL SCAN  | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  19 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |                TABLE ACCESS FULL| ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  21 |           TABLE ACCESS FULL     | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  12 - filter("L"."LEG_END" IS NOT NULL)

SQL Solution for 'Continuum' Problem, Contiguity Grouping by 12c Row Pattern Matching

SQL (Contiguity by Pattern Matching)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), latest_events AS ( 
        SELECT l.road_id, r.road_desc, l.leg_start, l.leg_end,
               Max (e.id) KEEP (DENSE_RANK LAST ORDER BY e.event_date) event_id,
               Nvl (Max (e.event_type) KEEP (DENSE_RANK LAST ORDER BY e.event_date), '(none)') event_type
          FROM legs l
          JOIN roads r
            ON r.id = l.road_id
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start
	   AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
         GROUP BY l.road_id, r.road_desc, l.leg_start, l.leg_end
)
SELECT m.road_id, m.road_desc, m.sec_start, m.sec_end, m.event_type e_type
  FROM latest_events
 MATCH_RECOGNIZE (
   PARTITION BY road_id, road_desc
   ORDER BY leg_start, leg_end
   MEASURES FIRST (leg_start) sec_start,
            LAST (leg_end) sec_end,
            LAST (event_type) event_type
   PATTERN (strt sm*)
   DEFINE sm AS PREV(sm.event_type) = sm.event_type
 ) m
ORDER BY 1, 2, 3
/

ROAD_ID ROAD_DESC        SEC_START    SEC_END E_TYPE
------- --------------- ---------- ---------- ------
      1 The Strand               0         10 OPEN
                                10         20 (none)
                                20         45 OPEN
                                45         60 OTHER
                                60        100 OPEN
                               100        115 OTHER
                               115        160 CLOSED
                               160        200 (none)
      2 Piccadilly               0         10 (none)
                                10         30 CLOSED
                                30         40 (none)
                                40         70 OPEN
                                70        100 (none)

13 rows selected.

Query Structure Diagram (Contiguity by Pattern Matching)Road, V1.5 - QSD-MRExecution Plan (Contiguity by Pattern Matching)

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                        | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |               |      1 |        |     13 |00:00:00.01 |      25 |       |       |          |
|   1 |  SORT ORDER BY                                   |               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                           |               |      1 |      2 |     13 |00:00:00.01 |      25 |       |       |          |
|   3 |    MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|               |      1 |      2 |     13 |00:00:00.01 |      25 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW                                         |               |      1 |      2 |     19 |00:00:00.01 |      25 |       |       |          |
|   5 |      SORT GROUP BY                               |               |      1 |      2 |     19 |00:00:00.01 |      25 |  4096 |  4096 | 4096  (0)|
|*  6 |       HASH JOIN OUTER                            |               |      1 |     24 |     25 |00:00:00.01 |      25 |   987K|   987K|  525K (0)|
|   7 |        MERGE JOIN                                |               |      1 |     24 |     19 |00:00:00.01 |      18 |       |       |          |
|   8 |         TABLE ACCESS BY INDEX ROWID              | ROADS         |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |          INDEX FULL SCAN                         | ROAD_PK       |      1 |      2 |      2 |00:00:00.01 |       1 |       |       |          |
|* 10 |         SORT JOIN                                |               |      2 |     24 |     19 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|* 11 |          VIEW                                    |               |      1 |     24 |     19 |00:00:00.01 |      16 |       |       |          |
|  12 |           WINDOW SORT                            |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  13 |            VIEW                                  |               |      1 |     24 |     21 |00:00:00.01 |      16 |       |       |          |
|  14 |             SORT UNIQUE                          |               |      1 |     24 |     21 |00:00:00.01 |      16 |  2048 |  2048 | 2048  (0)|
|  15 |              UNION-ALL                           |               |      1 |        |     24 |00:00:00.01 |      16 |       |       |          |
|  16 |               INDEX FULL SCAN                    | ROAD_EVENT_N1 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  17 |               INDEX FULL SCAN                    | ROAD_EVENT_N3 |      1 |     10 |     10 |00:00:00.01 |       1 |       |       |          |
|  18 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  19 |               TABLE ACCESS FULL                  | ROADS         |      1 |      2 |      2 |00:00:00.01 |       7 |       |       |          |
|  20 |        TABLE ACCESS FULL                         | ROAD_EVENTS   |      1 |     10 |     10 |00:00:00.01 |       7 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("E"."ROAD_ID"="L"."ROAD_ID")
       filter(("E"."START_POINT"<="L"."LEG_START" AND "E"."END_POINT">="L"."LEG_END"))
  10 - access("R"."ID"="L"."ROAD_ID")
       filter("R"."ID"="L"."ROAD_ID")
  11 - filter("L"."LEG_END" IS NOT NULL)

Note
-----
   - this is an adaptive plan

SQL Solution for 'Continuum' Problem, Contiguity Grouping - Postgres

The Oracle v11 (and earlier versions) solution, with contiguity by differences, can be converted to work in Postgres, as shown below.

SQL (Continuum by Row_Number, Contiguity by Differences - Postgres)

WITH breaks AS  (
        SELECT road_id, start_point bp FROM road_events
         UNION
        SELECT road_id, end_point FROM road_events
         UNION
        SELECT id, start_point FROM roads
         UNION
        SELECT id, end_point FROM roads
), legs AS (
        SELECT road_id, bp leg_start, Lead (bp) OVER (PARTITION BY road_id ORDER BY bp) leg_end
          FROM breaks
), ranked_events AS ( 
        SELECT l.road_id, l.leg_start, l.leg_end,
               e.id event_id, Coalesce (e.event_type, '(none)') event_type,
               Row_Number() OVER (PARTITION BY l.road_id, l.leg_start ORDER BY e.event_date DESC) rnk
          FROM legs l
          LEFT JOIN road_events e
            ON e.road_id = l.road_id
           AND e.start_point <= l.leg_start            AND e.end_point >= l.leg_end
         WHERE l.leg_end IS NOT NULL
), latest_events_group AS ( 
        SELECT road_id,
               leg_start,
               leg_end,
               event_id,
               event_type,
               Dense_Rank () OVER (PARTITION BY road_id ORDER BY leg_start, leg_end) -
               Dense_Rank () OVER (PARTITION BY road_id, event_type ORDER BY leg_start, leg_end) group_no
          FROM ranked_events
         WHERE rnk = 1
)
SELECT l.road_id, r.road_desc,
       Min (l.leg_start)        sec_start,
       Max (l.leg_end)          sec_end,
       l.event_type             e_type,
       l.group_no
  FROM latest_events_group l
  JOIN roads r
    ON r.id = l.road_id
 GROUP BY l.road_id,
        r.road_desc, 
        l.event_type,
        l.group_no
ORDER BY 1, 2, 3;

Continuum/contiguity Solution with Row_Number...
 road_id | road_desc  | sec_start | sec_end | e_type | group_no 
---------+------------+-----------+---------+--------+----------
       1 | The Strand |         0 |      10 | OPEN   |        0
       1 | The Strand |        10 |      20 | (none) |        1
       1 | The Strand |        20 |      45 | OPEN   |        1
       1 | The Strand |        45 |      60 | OTHER  |        3
       1 | The Strand |        60 |     100 | OPEN   |        4
       1 | The Strand |       100 |     115 | OTHER  |        5
       1 | The Strand |       115 |     160 | CLOSED |        9
       1 | The Strand |       160 |     200 | (none) |       11
       2 | Piccadilly |         0 |      10 | (none) |        0
       2 | Piccadilly |        10 |      30 | CLOSED |        1
       2 | Piccadilly |        30 |      40 | (none) |        1
       2 | Piccadilly |        40 |      70 | OPEN   |        3
       2 | Piccadilly |        70 |     100 | (none) |        3
(13 rows)

SELECT Version();

Postgres version...
                           version                           
-------------------------------------------------------------
 PostgreSQL 9.3.1, compiled by Visual C++ build 1600, 64-bit
(1 row)

Notes on Conversion of SQL to Postgres

  • Postgres does not have an exact equivalent of Oracle's KEEP/FIRST grouping functionality, but it can be emulated via the analytic function Row_Number within a subquery
  • Coalesce, which is also available in Oracle, replaces Nvl, which does not exist in Postgres
  • Oracle's execution plans were obtained using DBMS_XPlan after running the queries, while the Postgres version was obtained by running the query prefaced by 'EXPLAIN ANALYZE '
  • There is no Postgres equivalent of Oracle v12's row pattern matching

Query Structure Diagram (Continuum by Row_Number, Contiguity by Differences - Postgres) Road, V1.5 - QSD-PGExecution Plan (Continuum by Row_Number, Contiguity by Differences - Postgres)

Prefacing the query with 'EXPLAIN ANALYZE ' gives:

Explaining Continuum/contiguity Solution with Row_Number...
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=619.45..619.47 rows=8 width=270) (actual time=0.235..0.235 rows=13 loops=1)
   Sort Key: l.road_id, r.road_desc, (min(l.leg_start))
   Sort Method: quicksort  Memory: 26kB
   CTE breaks
     ->  HashAggregate  (cost=79.50..95.30 rows=1580 width=8) (actual time=0.013..0.020 rows=21 loops=1)
           ->  Append  (cost=0.00..71.60 rows=1580 width=8) (actual time=0.002..0.006 rows=24 loops=1)
                 ->  Seq Scan on road_events  (cost=0.00..14.80 rows=480 width=8) (actual time=0.001..0.003 rows=10 loops=1)
                 ->  Seq Scan on road_events road_events_1  (cost=0.00..14.80 rows=480 width=8) (actual time=0.000..0.002 rows=10 loops=1)
                 ->  Seq Scan on roads  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.001 rows=2 loops=1)
                 ->  Seq Scan on roads roads_1  (cost=0.00..13.10 rows=310 width=8) (actual time=0.000..0.000 rows=2 loops=1)
   CTE legs
     ->  WindowAgg  (cost=115.54..147.14 rows=1580 width=8) (actual time=0.030..0.041 rows=21 loops=1)
           ->  Sort  (cost=115.54..119.49 rows=1580 width=8) (actual time=0.028..0.029 rows=21 loops=1)
                 Sort Key: breaks.road_id, breaks.bp
                 Sort Method: quicksort  Memory: 25kB
                 ->  CTE Scan on breaks  (cost=0.00..31.60 rows=1580 width=8) (actual time=0.014..0.023 rows=21 loops=1)
   CTE ranked_events
     ->  WindowAgg  (cost=290.71..326.08 rows=1572 width=138) (actual time=0.089..0.104 rows=25 loops=1)
           ->  Sort  (cost=290.71..294.64 rows=1572 width=138) (actual time=0.088..0.089 rows=25 loops=1)
                 Sort Key: l_1.road_id, l_1.leg_start, e.event_date
                 Sort Method: quicksort  Memory: 26kB
                 ->  Hash Left Join  (cost=20.80..207.25 rows=1572 width=138) (actual time=0.044..0.079 rows=25 loops=1)
                       Hash Cond: (l_1.road_id = e.road_id)
                       Join Filter: ((e.start_point <= l_1.leg_start) AND (e.end_point >= l_1.leg_end))
                       Rows Removed by Join Filter: 89
                       ->  CTE Scan on legs l_1  (cost=0.00..31.60 rows=1572 width=12) (actual time=0.031..0.048 rows=19 loops=1)
                             Filter: (leg_end IS NOT NULL)
                             Rows Removed by Filter: 2
                       ->  Hash  (cost=14.80..14.80 rows=480 width=138) (actual time=0.005..0.005 rows=10 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 1kB
                             ->  Seq Scan on road_events e  (cost=0.00..14.80 rows=480 width=138) (actual time=0.001..0.002 rows=10 loops=1)
   CTE latest_events_group
     ->  WindowAgg  (cost=35.79..36.01 rows=8 width=48) (actual time=0.165..0.181 rows=19 loops=1)
           ->  Sort  (cost=35.79..35.81 rows=8 width=48) (actual time=0.164..0.165 rows=19 loops=1)
                 Sort Key: ranked_events.road_id, ranked_events.event_type, ranked_events.leg_start, ranked_events.leg_end
                 Sort Method: quicksort  Memory: 26kB
                 ->  WindowAgg  (cost=35.49..35.67 rows=8 width=48) (actual time=0.124..0.138 rows=19 loops=1)
                       ->  Sort  (cost=35.49..35.51 rows=8 width=48) (actual time=0.123..0.124 rows=19 loops=1)
                             Sort Key: ranked_events.road_id, ranked_events.leg_start, ranked_events.leg_end
                             Sort Method: quicksort  Memory: 26kB
                             ->  CTE Scan on ranked_events  (cost=0.00..35.37 rows=8 width=48) (actual time=0.090..0.117 rows=19 loops=1)
                                   Filter: (rnk = 1)
                                   Rows Removed by Filter: 6
   ->  HashAggregate  (cost=14.72..14.80 rows=8 width=270) (actual time=0.218..0.221 rows=13 loops=1)
         ->  Hash Join  (cost=0.26..14.60 rows=8 width=270) (actual time=0.203..0.207 rows=19 loops=1)
               Hash Cond: (r.id = l.road_id)
               ->  Seq Scan on roads r  (cost=0.00..13.10 rows=310 width=222) (actual time=0.002..0.002 rows=2 loops=1)
               ->  Hash  (cost=0.16..0.16 rows=8 width=52) (actual time=0.192..0.192 rows=19 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 2kB
                     ->  CTE Scan on latest_events_group l  (cost=0.00..0.16 rows=8 width=52) (actual time=0.167..0.190 rows=19 loops=1)
 Total runtime: 0.432 ms
(51 rows)

Code

Here is the code: Road-SQL






Messages from Outer Space

 

I began my career in IT in 1984, before the internet and even before widespread email availability. In those days one learnt about technology and other subjects largely through books and papers, training courses, and one's immediate colleagues. Since then the rise of the internet has transformed the learning process, vastly expanding the material available.

In its earlier years users could be roughly divided into a large group of consumers and a much smaller group of producers; later, and increasingly in recent years, the emergence of what is sometimes called 'Web 2' has enabled many consumers to become producers as well. As Wikipedia (itself a prime example of the phenomenon), Web 2.0, puts it:

"A Web 2.0 site may allow users to interact and collaborate with each other in a social media dialogue as creators of user-generated content in a virtual community..."

For IT developers and other technical people this is hugely important because it expands the pool of available expertise in one's field from a few colleagues and a relatively small group of publishers to potentially everyone working in the field. This means that good ideas and best practices quickly become available to all. Where in the past bad practices or antipatterns could become entrenched in a company through the influence of a small number of people sharing a bad idea, today we can check what the rest of the world thinks. It's a bit like the old scifi motif in which an advanced civilisation comes to share its ideas with Planet Earth :).

et

However, great though this is, it's necessary to make the effort to keep up with best practice in one's own field, and not all developers do that. In my main field, Oracle database development, the resources I find most useful are:

As well as keeping up with the general Oracle community, developers should ideally run their own blog to maximise learning.

The remainder of this article consists of useful links concerning best (or worst) practices that I deem significant in database development, largely in areas where people commonly get it wrong. This article will be an ongoing work in progress.

Database Design

Database Design and the Leaning Tower of Pisa

This article by Rodger Lepinsky notes that database table designs are usually done very quickly and tend not to be re-worked as the application is developed to avoid impacting developers. The problem is that this apparently justifiable reluctance to re-work often results in much additional code to work around inevitable deficiencies in the initial data model, and ultimately a much more complex system: It's short-sighted in other words.

Here's a nice cartoon illustrating a common database design antipattern taken to its logical conclusion :)

The EAV Data Model

Design Consistency

Rodger refers to database designs becoming frozen too early in the article mentioned in the previous section. I believe this is a design antipattern that occurs very widely in general. Often a new technology is applied for a new project, one that the developers may not know very well at first; similarly, custom frameworks are often developed, in Unix or Perl, or whatever, and the initial production versions are rarely perfect. However, once something works, the tendency is to freeze it, with all its limitations, for fear of unforeseen impacts, or even purely from the notion that consistency is more important than quality.

I think that this excessive emphasis on 'consistency' is one of the main reasons that so many over-complex, poor quality systems persist indefinitely. It might be better to focus on 'consistently' applying best practice as currently understood.

Data Access Layers

Considering SQL as a Service

The idea of 'modularising' SQL through data access layers is one that comes up frequently on Oracle forums, including the Ask Tom thread above, and is well known to be an antipattern. I deal with it in a wider context here:

SQL and Modularity: Patterns, Anti-Patterns and the Kitchen Sink

Here is another relevant AskTom thread:

Multi-Level Views

"I prefer a SINGLE LEVEL of views. Yes, there will be some repetition in their definition but I don't care about that. You can document that. You can maintain that"

ETL vs ELT

ETL - Using the wrong tool for the job

Companies often use a mix of ETL and ELT with Informatica and Oracle, among other tools. Here is Tom Kyte's pithy summation:

"elt = extract, load and then transform (forget any tools, if you have hundreds of gb's or tb's of data - you'll be doing this down to the wire, not with pretty pictures and push buttons)

etl = extract, transform and then load - without using the database to transform

elt = going light speed

etl = going by boat"

Object Relational Madness (ORM)

Arguments against using an ORM layer - an ammunition stockpile

Brilliant article with collation of links on the subject, including one that compares ORM to the Vietnam war (perhaps understating the case against) :)