SQL for Length-Controlled List Aggregation

Recently an OTN poster asked how to return values from a column in multiple records into a single output record and column in SQL, Multiple Rows Into One Column Field. In the usual version of this list aggregation problem, one record is required for each distinct combination of grouping columns, with the aggregation fields delimited within the output field, and from Oracle v11.2 there is a built-in function for it, ListAgg. However, in this case the poster wanted a maximum of three values in the output field, with overflow records as necessary. Tom Kyte solved the problem in that thread essentially by adding in a calculated row number to the grouping columns, and concatenating the aggregation fields directly in the code.

Tim Hall has compiled a list of the main techniques available for the standard problem for different versions of the database, up to v11.2, here: String Aggregation Techniques

I thought it would be an interesting and useful variation on the problem to base record overflow on concatenated length rather than on number of values. This would provide an alternative to the CLOB-based variations for cases where the length exceeds 4KB in v11.2 or earlier (v12.1 raises the string limit to 32KB), and cases where the records just need to be of limited length for display or other purposes. Here’s an example on another forum of a requirement to handle long strings: Ordering by list of strings in Oracle SQL without LISTAGG.

On the OTN thread, I provided an SQL solution for this variation, using recursion and the new v12.1 MATCH_RECOGNIZE syntax for row pattern matching, and an alternative using the MODEL clause. In this article I provide modified versions with explanations and execution plans.

SQL Analytics and Recursion

The problem variation as I have defined it is harder than it may at first appear, and in fact, can’t be solved by SQL grouping and analytics alone: Recursion is required. The reason is that when considering whether a source record needs to start a new grouping record, or can be included on the prior record, the answer depends on the lengths of the fields of an unknown number of prior records. Analytic functions can only sum over a known number of prior records, and, although ‘known’ includes values that can be computed via a prior subquery, this is not possible here.

The approach we take involves two logical steps. In the first step, the records are processed in sequence within the partitions, and aggregate strings are accumulated for each record. When the string would exceed the maximum length a new aggregate is started and a flag is set.

Now, from the first step we have the input source number of records, with the desired aggregate strings being on the last records before any new aggregate, marked by the flag. We then just need to filter out the intermediate records.

Test Example

We will use Oracle’s standard HR schema for test data, and will aggregate employee names by department with the name list field having maximum length 80. There are 107 employees, and the desired output is:

      DEPT ENAME_LIST
---------- --------------------------------------------------------------------------------
        10 Whalen, Jennifer
        20 Fay, Pat; Hartstein, Michael
        30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den
        30 Tobias, Sigal
        40 Mavris, Susan
        50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony
        50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer
        50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy
        50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam
        50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James
        50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia
        50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel
        50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen
        50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana
        50 Weiss, Matthew
        60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli
        70 Baer, Hermann
        80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David
        80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise
        80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa
        80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack
        80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen
        80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick
        80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni
        90 De Haan, Lex; King, Steven; Kochhar, Neena
       100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael
       100 Urman, Jose Manuel
       110 Gietz, William; Higgins, Shelley
           Grant, Kimberely

29 rows selected.

Recursive Subquery Factor Solutions

Recursive subquery factors are available from Oracle v11.2 up, and can be used to implement the required recursion. In this solution the flag denoting an overspill line has to be set initially on that overspill line, rather than on the preceding line, and on the last line in the partition, which are the lines we want to display. We therefore need another step.

Setting print flag via analytic function

In v11.2, an additional subquery can be added that uses the analytic function Lead to set a flag on the required lines. This works by looking at the previously set flag on the next record (if the record exists), and setting the desired line print flag accordingly on the current record. The query is:

WITH emps_ordered AS (
SELECT department_id dept,
       CAST (last_name || ', ' || first_name AS VARCHAR2(4000)) ename,
       Row_Number () OVER (PARTITION BY department_id ORDER BY last_name, first_name) rn
  FROM hr.employees
), rsf (dept, ename_list, new_line, rn) AS (
SELECT dept, ename, 1, rn
  FROM emps_ordered
 WHERE rn = 1
 UNION ALL
SELECT r.dept,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN e.ename ELSE r.ename_list || '; ' || e.ename END,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN 1 ELSE 0 END,
       r.rn + 1
  FROM rsf r
  JOIN emps_ordered e
    ON e.dept = r.dept
   AND e.rn = r.rn + 1
), leads_v AS (
SELECT dept, ename_list, 
       new_line, Lead (new_line, 1, 1) OVER (PARTITION BY dept ORDER BY rn) line_print, rn
  FROM rsf
)
SELECT *
  FROM leads_v
 WHERE line_print = 1
 ORDER BY dept, ename_list

How it works

  • emps_ordered subquery: Formats the name field and gets a row number within department ordering by the name
  • rsf recursive subquery: Anchor branch selects first employee in each department; recursive branch joins the next employee based on the row number, and accumulates the name list, resetting and flagging when length dictates a new overspill line
  • leads_v subquery: Use Lead analytic function to set the print flag
  • Main query: Selects rows where the print flag = 1

The output from the leads_v subquery, before filtering, illustrates how it works:

DEPT ENAME_LIST                                                                       NEW_LINE LINE_PRINT  RN
---- -------------------------------------------------------------------------------- -------- ---------- ---
  10 Whalen, Jennifer                                                                        1          1   1
  20 Fay, Pat                                                                                1          0   1
  20 Fay, Pat; Hartstein, Michael                                                            0          1   2
  30 Baida, Shelli                                                                           1          0   1
  30 Baida, Shelli; Colmenares, Karen                                                        0          0   2
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy                                           0          0   3
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander                          0          0   4
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den           0          1   5
  30 Tobias, Sigal                                                                           1          1   6
  40 Mavris, Susan                                                                           1          1   1
  50 Atkinson, Mozhe                                                                         1          0   1
  50 Atkinson, Mozhe; Bell, Sarah                                                            0          0   2
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura                                             0          0   3
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis                               0          0   4
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony              0          1   5
  50 Chung, Kelly                                                                            1          0   6
  50 Chung, Kelly; Davies, Curtis                                                            0          0   7
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia                                          0          0   8
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer                         0          1   9
  50 Everett, Britney                                                                        1          0  10
  50 Everett, Britney; Feeney, Kevin                                                         0          0  11
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean                                           0          0  12
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam                              0          0  13
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy              0          1  14
  50 Gee, Ki                                                                                 1          0  15
  50 Gee, Ki; Geoni, Girard                                                                  0          0  16
  50 Gee, Ki; Geoni, Girard; Grant, Douglas                                                  0          0  17
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance                                    0          0  18
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam                   0          1  19
  50 Ladwig, Renske                                                                          1          0  20
  50 Ladwig, Renske; Landry, James                                                           0          0  21
  50 Ladwig, Renske; Landry, James; Mallin, Jason                                            0          0  22
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven                            0          0  23
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James             0          1  24
  50 Matos, Randall                                                                          1          0  25
  50 Matos, Randall; McCain, Samuel                                                          0          0  26
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene                                      0          0  27
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin                      0          0  28
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia        0          1  29
  50 OConnell, Donald                                                                        1          0  30
  50 OConnell, Donald; Olson, TJ                                                             0          0  31
  50 OConnell, Donald; Olson, TJ; Patel, Joshua                                              0          0  32
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall                            0          0  33
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel         0          1  34
  50 Rajs, Trenna                                                                            1          0  35
  50 Rajs, Trenna; Rogers, Michael                                                           0          0  36
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita                                        0          0  37
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John                             0          0  38
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen            0          1  39
  50 Sullivan, Martha                                                                        1          0  40
  50 Sullivan, Martha; Taylor, Winston                                                       0          0  41
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter                                        0          0  42
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta                       0          0  43
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana         0          1  44
  50 Weiss, Matthew                                                                          1          1  45
  60 Austin, David                                                                           1          0   1
  60 Austin, David; Ernst, Bruce                                                             0          0   2
  60 Austin, David; Ernst, Bruce; Hunold, Alexander                                          0          0   3
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana                          0          0   4
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli        0          1   5
  70 Baer, Hermann                                                                           1          1   1
  80 Abel, Ellen                                                                             1          0   1
  80 Abel, Ellen; Ande, Sundar                                                               0          0   2
  80 Abel, Ellen; Ande, Sundar; Banda, Amit                                                  0          0   3
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth                                0          0   4
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David              0          1   5
  80 Bloom, Harrison                                                                         1          0   6
  80 Bloom, Harrison; Cambrault, Gerald                                                      0          0   7
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette                                  0          0   8
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise                   0          1   9
  80 Errazuriz, Alberto                                                                      1          0  10
  80 Errazuriz, Alberto; Fox, Tayler                                                         0          0  11
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle                                       0          0  12
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter                          0          0  13
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa          0          1  14
  80 Johnson, Charles                                                                        1          0  15
  80 Johnson, Charles; King, Janette                                                         0          0  16
  80 Johnson, Charles; King, Janette; Kumar, Sundita                                         0          0  17
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David                             0          0  18
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack           0          1  19
  80 Marvins, Mattea                                                                         1          0  20
  80 Marvins, Mattea; McEwen, Allan                                                          0          0  21
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher                                      0          0  22
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa                          0          0  23
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen         0          1  24
  80 Russell, John                                                                           1          0  25
  80 Russell, John; Sewall, Sarath                                                           0          0  26
  80 Russell, John; Sewall, Sarath; Smith, Lindsey                                           0          0  27
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William                           0          0  28
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick           0          1  29
  80 Taylor, Jonathon                                                                        1          0  30
  80 Taylor, Jonathon; Tucker, Peter                                                         0          0  31
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver                                        0          0  32
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara                        0          0  33
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni        0          1  34
  90 De Haan, Lex                                                                            1          0   1
  90 De Haan, Lex; King, Steven                                                              0          0   2
  90 De Haan, Lex; King, Steven; Kochhar, Neena                                              0          1   3
 100 Chen, John                                                                              1          0   1
 100 Chen, John; Faviet, Daniel                                                              0          0   2
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy                                            0          0   3
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis                                0          0   4
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael               0          1   5
 100 Urman, Jose Manuel                                                                      1          1   6
 110 Gietz, William                                                                          1          0   1
 110 Gietz, William; Higgins, Shelley                                                        0          1   2
     Grant, Kimberely                                                                        1          1   1

107 rows selected.

Execution Plan (with filtering)

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |           |      1 |        |     29 |00:00:00.01 |     322 |       |       |          |
|   1 |  SORT ORDER BY                               |           |      1 |    117 |     29 |00:00:00.01 |     322 |  9216 |  9216 | 8192  (0)|
|*  2 |   VIEW                                       |           |      1 |    117 |     29 |00:00:00.01 |     322 |       |       |          |
|   3 |    WINDOW SORT                               |           |      1 |    117 |    107 |00:00:00.01 |     322 | 13312 | 13312 |12288  (0)|
|   4 |     VIEW                                     |           |      1 |    117 |    107 |00:00:00.01 |     322 |       |       |          |
|   5 |      UNION ALL (RECURSIVE WITH) BREADTH FIRST|           |      1 |        |    107 |00:00:00.01 |     322 |  2048 |  2048 | 2048  (0)|
|*  6 |       VIEW                                   |           |      1 |    107 |     12 |00:00:00.01 |       7 |       |       |          |
|*  7 |        WINDOW SORT PUSHED RANK               |           |      1 |    107 |     12 |00:00:00.01 |       7 |  9216 |  9216 | 8192  (0)|
|   8 |         TABLE ACCESS FULL                    | EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
|*  9 |       HASH JOIN                              |           |     45 |     10 |     95 |00:00:00.01 |     315 |  1160K|  1160K| 1168K (0)|
|  10 |        RECURSIVE WITH PUMP                   |           |     45 |        |    107 |00:00:00.01 |       0 |       |       |          |
|  11 |        VIEW                                  |           |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  12 |         WINDOW SORT                          |           |     45 |    107 |   4815 |00:00:00.01 |     315 | 18432 | 18432 |16384  (0)|
|  13 |          TABLE ACCESS FULL                   | EMPLOYEES |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LINE_PRINT"=1)
   6 - filter("RN"=1)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPARTMENT_ID" ORDER BY "LAST_NAME","FIRST_NAME")<=1)
   9 - access("E"."DEPT"="R"."DEPT" AND "E"."RN"="R"."RN"+1)

Note that the employees table is accessed 46 times, indicating an obvious performance issue.

Using MATCH_RECOGNIZE for filtering

In Oracle v12.1 row pattern matching was introduced, with the new keyword MATCH_RECOGNIZE, and this provides an alternative where the filtering does not require an additional subquery. The query is:

WITH emp_ordered AS (
SELECT department_id dept,
       Row_Number() OVER (PARTITION BY department_id ORDER BY last_name || ', ' || first_name) rn, 
       last_name || ', ' || first_name ename
  FROM hr.employees e
), emp_rsf (dept, rn, ename_list, new_line) AS (
SELECT dept, rn, ename, 1
  FROM emp_ordered
 WHERE rn = 1
UNION ALL
SELECT r.dept, 
       e.rn,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN e.ename
            ELSE r.ename_list || '; ' || e.ename END,
       CASE WHEN Length (r.ename_list || '; ' || e.ename) > :rec_len THEN 1
            ELSE 0 END
  FROM emp_rsf r
  JOIN emp_ordered e
    ON e.dept = r.dept
   AND e.rn = r.rn + 1
)
SELECT dept, ename_list, cls, mtc
  FROM emp_rsf
 MATCH_RECOGNIZE (
 PARTITION BY dept
 ORDER BY rn
 MEASURES ename_list AS ename_list,
          new_line AS new_line,
          Classifier() AS cls,
          Match_Number() AS mtc
 PATTERN ( strt sm* )
 DEFINE
   sm AS sm.new_line = 0
 )
 ORDER BY 1

How it works

  • emps_ordered and rsf subqueries as before
  • MATCH_RECOGNIZE clause: defines matching row sets that end in the lines to print
  • MEASURES section: includes two built-in functions for illustration purposes only: Classifier() = grouping, and Match_Number() = match number of the record
  • DEFINE section: defines a grouping, sm, based on the previously set flag, new_line, that applies if the flag = 0
  • PATTERN section: ( strt sm* ) includes an undefined grouping strt, and means to match an ordered set of records beginning with a record that does not fall into any defined grouping and continuing with zero or more records (but as many as possible) that are in the sm grouping

The output, with the extra built-in fields, helps to show how it works:

DEPT ENAME_LIST                                                                       CLS   MTC
---- -------------------------------------------------------------------------------- ---- ----
  10 Whalen, Jennifer                                                                 STRT    1
  20 Fay, Pat; Hartstein, Michael                                                     SM      1
  30 Baida, Shelli; Colmenares, Karen; Himuro, Guy; Khoo, Alexander; Raphaely, Den    SM      1
  30 Tobias, Sigal                                                                    STRT    2
  40 Mavris, Susan                                                                    STRT    1
  50 Atkinson, Mozhe; Bell, Sarah; Bissot, Laura; Bull, Alexis; Cabrio, Anthony       SM      1
  50 Chung, Kelly; Davies, Curtis; Dellinger, Julia; Dilly, Jennifer                  SM      2
  50 Everett, Britney; Feeney, Kevin; Fleaur, Jean; Fripp, Adam; Gates, Timothy       SM      3
  50 Gee, Ki; Geoni, Girard; Grant, Douglas; Jones, Vance; Kaufling, Payam            SM      4
  50 Ladwig, Renske; Landry, James; Mallin, Jason; Markle, Steven; Marlow, James      SM      5
  50 Matos, Randall; McCain, Samuel; Mikkilineni, Irene; Mourgos, Kevin; Nayer, Julia SM      6
  50 OConnell, Donald; Olson, TJ; Patel, Joshua; Perkins, Randall; Philtanker, Hazel  SM      7
  50 Rajs, Trenna; Rogers, Michael; Sarchand, Nandita; Seo, John; Stiles, Stephen     SM      8
  50 Sullivan, Martha; Taylor, Winston; Vargas, Peter; Vollman, Shanta; Walsh, Alana  SM      9
  50 Weiss, Matthew                                                                   STRT   10
  60 Austin, David; Ernst, Bruce; Hunold, Alexander; Lorentz, Diana; Pataballa, Valli SM      1
  70 Baer, Hermann                                                                    STRT    1
  80 Abel, Ellen; Ande, Sundar; Banda, Amit; Bates, Elizabeth; Bernstein, David       SM      1
  80 Bloom, Harrison; Cambrault, Gerald; Cambrault, Nanette; Doran, Louise            SM      2
  80 Errazuriz, Alberto; Fox, Tayler; Greene, Danielle; Hall, Peter; Hutton, Alyssa   SM      3
  80 Johnson, Charles; King, Janette; Kumar, Sundita; Lee, David; Livingston, Jack    SM      4
  80 Marvins, Mattea; McEwen, Allan; Olsen, Christopher; Ozer, Lisa; Partners, Karen  SM      5
  80 Russell, John; Sewall, Sarath; Smith, Lindsey; Smith, William; Sully, Patrick    SM      6
  80 Taylor, Jonathon; Tucker, Peter; Tuvault, Oliver; Vishney, Clara; Zlotkey, Eleni SM      7
  90 De Haan, Lex; King, Steven; Kochhar, Neena                                       SM      1
 100 Chen, John; Faviet, Daniel; Greenberg, Nancy; Popp, Luis; Sciarra, Ismael        SM      1
 100 Urman, Jose Manuel                                                               STRT    2
 110 Gietz, William; Higgins, Shelley                                                 SM      1
     Grant, Kimberely                                                                 STRT    1

29 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |           |      1 |        |     29 |00:00:00.02 |     322 |       |       |          |
|   1 |  VIEW                                           |           |      1 |    117 |     29 |00:00:00.02 |     322 |       |       |          |
|   2 |   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|           |      1 |    117 |     29 |00:00:00.02 |     322 | 13312 | 13312 |12288  (0)|
|   3 |    VIEW                                         |           |      1 |    117 |    107 |00:00:00.01 |     322 |       |       |          |
|   4 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST    |           |      1 |        |    107 |00:00:00.01 |     322 |  2048 |  2048 | 2048  (0)|
|*  5 |      VIEW                                       |           |      1 |    107 |     12 |00:00:00.01 |       7 |       |       |          |
|*  6 |       WINDOW SORT PUSHED RANK                   |           |      1 |    107 |     12 |00:00:00.01 |       7 | 11264 | 11264 |10240  (0)|
|   7 |        TABLE ACCESS FULL                        | EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
|*  8 |      HASH JOIN                                  |           |     45 |     10 |     95 |00:00:00.02 |     315 |  1301K|  1301K| 1464K (0)|
|   9 |       VIEW                                      |           |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  10 |        WINDOW SORT                              |           |     45 |    107 |   4815 |00:00:00.01 |     315 | 20480 | 20480 |18432  (0)|
|  11 |         TABLE ACCESS FULL                       | EMPLOYEES |     45 |    107 |   4815 |00:00:00.01 |     315 |       |       |          |
|  12 |       RECURSIVE WITH PUMP                       |           |     45 |        |    107 |00:00:00.01 |       0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("RN"=1)
   6 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPARTMENT_ID" ORDER BY "LAST_NAME"||', '||"FIRST_NAME")<=1)
   8 - access("E"."DEPT"="R"."DEPT" AND "E"."RN"="R"."RN"+1)

As in the earlier query, the employees table is accessed 46 times, indicating an obvious performance issue.

MODEL Solution

Oracle introduced the MODEL clause to its SQL syntax in v10. It has a reputation for often leading to SQL that is difficult to understand and sometimes inefficient, but it is well suited to this problem.

The query is:

WITH mod AS (
SELECT *
  FROM hr.employees
MODEL
  PARTITION BY (department_id)
  DIMENSION BY (Row_Number () OVER (PARTITION BY department_id ORDER BY last_name, first_name) rn)
  MEASURES (
    last_name || ', ' || first_name ename,
    CAST (NULL AS VARCHAR2(4000)) ename_list,
    0 line_print
  )
  RULES (
    ename_list[ANY] = CASE WHEN ename_list[CV()-1] IS NULL OR Length (ename_list[CV()-1] ||  '; ' || ename[CV()]) > :rec_len THEN ename[CV()]
                           ELSE ename_list[CV()-1] ||  '; ' || ename[CV()]
                      END,
    line_print[ANY] = CASE WHEN ename[CV()+1] IS NULL OR Length (ename_list[CV()] ||  '; ' || ename[CV()+1]) > :rec_len THEN 1 END
  )
)
SELECT department_id dept, ename_list
  FROM mod
 ORDER BY department_id, ename_list

How it works

  • mod subquery: the aggregation and line pirint flags are calculated in a single subquery using the MODEL clause
  • RULES section: first rule accumulates the aggregated lines, resetting when overspill occurs, with similar logic to that in the recursive subquery solution; the rule relies on the calculation occurring in the default order, by ascending dimension value; second rule relies on all the aggregates being calculated first by the first rule, then looks one row ahead within the partition to set the print flag
  • Main query: Selects rows where the print flag = 1

Execution Plan

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |     29 |00:00:00.01 |       7 |       |       |          |
|   1 |  SORT ORDER BY        |           |      1 |    107 |     29 |00:00:00.01 |       7 |  9216 |  9216 | 8192  (0)|
|*  2 |   VIEW                |           |      1 |    107 |     29 |00:00:00.01 |       7 |       |       |          |
|   3 |    SQL MODEL ORDERED  |           |      1 |    107 |    107 |00:00:00.01 |       7 |   962K|   905K| 1165K (0)|
|   4 |     WINDOW SORT       |           |      1 |    107 |    107 |00:00:00.01 |       7 | 18432 | 18432 |16384  (0)|
|   5 |      TABLE ACCESS FULL| EMPLOYEES |      1 |    107 |    107 |00:00:00.01 |       7 |       |       |          |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LINE_PRINT"=1)

Note that the employees table is accessed only once, suggesting that this may be a mmore efficient query than the recursive subquery solutions.

Conclusions

  • We have presented three SQL solutions for length-controlled list aggregation
  • Two are based around the v11.2 feature recursive subquery factoring, with one also using the v12.1 feature, match_recognise
  • The third solution uses the v10.1 model clause feature, and this appears to be the simplest and fastest of the three, although no volume testing has been performed






Master-Detail Transaction Reconciliation in SQL (MDTM3)

This is the final article in a sequence of three on the subject of master-detail transaction matching. In the first article, Master-Detail Transaction Matching in SQL (MDTM1), I described the problem and divided it into two subproblems, the first being to identify all pairs of matching transactions and the second being to reconcile the pairs so that one transaction matches against at most one other transaction. The underlying motivation here comes from the problem of reconciling intra-company credit and debit transactions where fields may need to match directly, or may need to match after some mapping function is applied, including inversion (contra-matching). We have taken a simple prototype problem defined on Oracle’s system tables where only matching conditions are specified. It should be straightforward to extend the techniques demonstrated to more general matching conditions (I have done so myself on the real business problem that prompted this analysis).

The first article developed a series of queries to solve the first subproblem using the idea of pre-aggregation of the detail records as a key performance-enhancing feature. This resulted in a best query that used a temporary table and achieved a time variation that was quadratic in the number of master transactions (we kept the numbers of details per master fixed).

The second article, Holographic Set Matching in SQL (MDTM2), took the aggregation a step further, using list aggregation to bypass direct detail set matching altogether, and this enabled linear time variation.

In this third article, we take the last, linear-time query and extend it to solve the second subproblem, providing a solution to the overall problem in a single query that shows the same linear-time variation property in our test results. The sample problem will be the same as in the previous article.

Output Specification
The output will be ordered first by section, then by group, then by transaction unique identifier, with paired records appearing together using the first transaction for ordering within the group. The sections are defined thus:

  • Reconciled pairs – two records for each matching pair, with no transaction appearing in more than one pair
  • Matched but unreconciled transactions – transactions that match others but could not be reconciled because their matching partners are all paired off against other transactions
  • Unmatched transactions – transactions not matching any other transaction

Queries
We’ll include the best query from the last article (L2_SQF), as well as the new query (RECON) that extends it to solve the overall problem.

  • L2_SQF – solves first subproblem by list aggregation without direct detil matching
  • RECON – extends L2_SQF to solve the overall problem using a sequence of query subfactors

The first query will not be described below, as it appeared in the previous article but will be included in the results section for comparison purposes.

Query Structure Diagram (QSD)

Query Text

WITH rns AS (
SELECT r_owner,
       r_constraint_name,
       Row_Number () OVER (ORDER BY r_owner, r_constraint_name) - 1 rn
  FROM con_cp
 WHERE constraint_type	    = 'R'
 GROUP BY 
       r_owner,
       r_constraint_name
), rch AS ( 
SELECT r_owner,
       r_constraint_name,
       Chr (Floor (rn / 128)) ||
       Chr ((rn - 128 * Floor (rn / 128))) chr_rank
  FROM rns
), tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Count(c.ROWID)            n_det,
       Listagg (r.chr_rank, '') WITHIN GROUP (ORDER BY r.chr_rank) lagg
  FROM tab_cp                    t
  JOIN con_cp                    c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
   AND c.constraint_type         = 'R'
  JOIN rch                       r
    ON r.r_owner                 = c.r_owner
   AND r.r_constraint_name       = c.r_constraint_name
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
), dup as (
SELECT t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
), btw AS (
SELECT owner_1       owner_1_0,
       table_name_1  table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det
  FROM dup
 UNION
SELECT owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       owner_1,
       table_name_1,
       n_det
  FROM dup
), grp AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       Least (owner_1 || '/' || table_name_1, Min (owner_2 || '/' || table_name_2)
         OVER (PARTITION BY owner_1, table_name_1)) grp_uid
  FROM btw
), rnk AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       Dense_Rank () OVER  (PARTITION BY grp_uid ORDER BY owner_1, table_name_1) r1,
       owner_2,
       table_name_2,
       Dense_Rank () OVER  (PARTITION BY grp_uid ORDER BY owner_2, table_name_2) r2,
       n_det,
       grp_uid
  FROM grp
), rec AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rnk
 WHERE (r2 = r1 + 1 AND Mod (r1, 2) = 1) OR (r1 = r2 + 1 AND Mod (r2, 2) = 1)
), rcu AS (
SELECT owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rec
 UNION  
SELECT owner_1,
       table_name_1,
       owner_1,
       table_name_1,
       NULL,
       NULL,
       n_det,
       grp_uid
  FROM grp
 WHERE (owner_1, table_name_1) NOT IN (SELECT owner_1, table_name_1 FROM rec)
 UNION  
SELECT owner,
       table_name,
       owner,
       table_name,
       NULL,
       NULL,
       n_det,
       NULL
  FROM tab
 WHERE (owner, table_name) NOT IN (SELECT owner_1, table_name_1 FROM btw)
)
SELECT
       owner_1,
       table_name_1,
       owner_2,
       table_name_2,
       n_det,
       grp_uid
  FROM rcu
 ORDER BY grp_uid,
       CASE WHEN grp_uid IS NULL THEN 3 
               WHEN owner_2 IS NULL THEN 2
               ELSE 1
          END,
       owner_1_0,
       table_name_1_0,
       owner_1,
       table_name_1

How it Works
The query proceeds by ten stages using query subfactors. The first three stages correspond to query L2_SQF which then has a main query, whereas we now have another six stages before the main query, as shown:

  1. Rank the distinct details [Group by matching fields, then use Row_Number to rank by same]
  2. Convert ranks to base 128 [Use Floor() and Chr() functions; uses 2 characters here]
  3. Aggregate detail ranks for master [Use Listagg on the rank fixed-length string]
  4. Get all matching pairs one-way [Self-join on matching aggregate and second rowid greater]
  5. Union in the reverse pairs, with sorting column [Add in records with uids swapped, but keep uid 1 separately for sorting]
  6. Assign grouping field to each pair [Take minimum of uid 2 over uid 1, or uid 1 if smaller]
  7. Rank each side of pair within its group [Use Dense_Rank() over grouping, ordering by uids]
  8. Retain odd-even sequentially ranked pairs [Retain pairs with ranks (1,2) or (3,4) etc. and the reverses]
  9. Add in unmatched and matched but unreconciled [3-way union: first the reconciled; then the matched but unreconciled; then unmatched]
  10. Sort by the source uid 1, then the current uid 1 [Sort key ensures that reconciled pairs stay together within their matching group]

Notes:

  • In our matching-only sample problem, matching transactions form mutually matching sets, whereas for contra-matching, there are pairs of contra-matching sets as discussed in the first article. The grouping subqueries will therefore differ in the latter case, and for example, pairing could be by matching numerical rank within the respective sets
  • The final subquery factor could be incorporated in the main query, but I retain it because the benchmarking framework does not support unions in the main query, and CBO optimises it away in any case

Results
Both queries were run on the same sample problem as in the previous article. The output comparison gives the output listings for width parameter of 1, which corresponds to the tables and constraints on my v11.2 system copied to the test tables with owner prefix ‘_0’ added. The timings and statistics are for widths from 1 to 6.

Output Comparison

The output file has tabs for the output from both queries, and a tab with examples from each of the three sections for the second. Notice that OEHR_EMPLOYEES and OEHR_JOB_HISTORY form a reconciled pair, with three detail records, while EMPLOYEES and JOB_HISTORY are unmatched, with four detail records. This is because, as I mentioned in the first article, I have added an extra record to each of the latter tables’ detail tables (i.e. foreign key constraints), the extra record being a duplicate of one of the other three (in terms of the matching fields), but a different duplicate in each case. This tests correct handling of duplicates within the detail sets.

Performance Comparison
Click on the query name in the file below to jump to the execution plan for the largest data point, and click the tabs to see different views on the performance obtained.

  • The timings in the file above are roughly consistent with linear-time variation with problem size; if anything L2_SQF appears sublinear, but the times are fairly small and there was some other activity on the PC at the time
  • At the largest data point, RECON takes 5 times as much CPU time as L2_SQF, and 9 times as much elapsed time
  • The differences between elapsed and CPU times for RECON are indicative of significant file I/O activity. This shows up in the disk reads and writes summaries on the statistics tab, and in more detail in the Plans tab, and is caused mainly by reading and writing of the subquery factors to and from disk
  • The other main significant factor in the increase in times for the RECON query is the additional sorting; see, for example, steps 31 and 32 in the plan. These effects are the result of the additional subquery factors that were needed to achieve the final result

Conclusions

  • This three-part sequence of articles has focussed on a special category of problem within SQL, but has highlighted a range of SQL techniques that are useful generally, including:
    • Subquery factors
    • Temporary tables
    • Analytic functions
    • Set matching by list aggregation
    • Compact storage of unique identifiers by ranking and base-conversion via the Chr() function
  • We have also noted different ways of matching sets, including use of the MINUS set operator and the NOT EXISTS construct, and discussed ways of handling issues such as duplication within a set, and directionality of the set operators
  • The importance of polynomial order of solution performance for efficiency has been illustrated dramatically
  • The final SQL solution provides a good illustration of the power of modern SQL to solve complex problems using set-based logic combined with sequence in a simpler and faster way than the more conventional procedural approach
  • The subquery-sequence approach to SQL is well suited to diagrammatic design techniques
  • It is often better to solve complex real-world problems by first working with simpler prototypes






Holographic Set Matching in SQL (MDTM2)

This article is the second in a sequence of three dealing with a very general class of problems in SQL, and exploring various techniques to find efficient solutions. In the first article, Master-Detail Transaction Matching in SQL (MDTM1), the problem was outlined and divided into two subproblems, of which the first was solved in several variant SQL statements with performance analysis. This second article, takes the most efficient method and applies two new techniques to further improve performance. The third article, Master-Detail Transaction Reconciliation in SQL (MDTM3), adds a sequence of subquery factors to the best solution for the first subproblem to achieve an efficient solution to the overall problem within a single SQL statement.

The holographic principle is a mathematical principle that the total information contained in a volume of space corresponds to an equal amount of information contained on the boundary of that space.Holographic Principle.

In my last article, I made large performance gains in SQL queries matching sets of detail records by obtaining aggregates of the sets in a subquery factor and matching those at master level before matching the detail sets directly. The performance gain came from the fact that the aggregation is cheap compared to matching sets of records and allows many matching pair candidates to be discarded before doing the expensive direct set matching. However, all of the actually matching transactions would have been directly matched and probably more, and it occurred to me to wonder whether it might not be possible to use aggregate matching to replace detail set matching altogether.

This article develops the previous one by taking the same sample transaction matching problem and adding queries that use the Oracle 11.2 function Listagg to allow just this replacement. This is possible so long as the list-aggregated detail matching fields do not exceed 4,000 characters. If that were to happen then some other aggregation technique would be needed, perhaps a user-defined CLOB version of Listagg. However, it’s possible to extend the range of applicability by aggregating identifiers smaller than the actual fields, as I’ll discuss at the end.

We’ll keep the fastest query from the previous article, and add three new queries:

Query Variations

  • MIN_NE – Detail grouping in temporary table, with set matching (GTT_NE previously)
  • LAG_SQF – Detail grouping by Listagg only in subquery factor
  • LAG_NE – Detail grouping by Listagg in subquery factor, with set matching
  • LAG_GTT – Detail grouping by Listagg only in temporary table
************
MIN_NE
************
  WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_con)             n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp                    t
  JOIN grp_gtt                   c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.min_det                = t1.min_det
   AND t2.max_det                = t1.max_det
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
   AND (
   NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
   AND d2.r_owner                = d1.r_owner
   AND d2.r_constraint_name      = d1.r_constraint_name
   AND d2.n_con                  = d1.n_con
)))
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
LAG_SQF
************
  WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Count(c.ROWID)            n_det,
       Listagg (c.r_owner||c.r_constraint_name, '') WITHIN GROUP (ORDER BY c.r_owner||c.r_constraint_name) lagg
  FROM tab_cp                    t
  JOIN con_cp                    c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
   AND c.constraint_type         = 'R'
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
LAG_NE
************
  WITH tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Sum (c.n_con)             n_det,
       Listagg (c.r_owner||c.r_constraint_name, '') WITHIN GROUP (ORDER BY c.r_owner||c.r_constraint_name) lagg
  FROM tab_cp                    t
  JOIN grp_gtt                   c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
)
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.n_det                  = t1.n_det
   AND t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
 WHERE NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d1
 WHERE d1.owner                  = t1.owner
   AND d1.table_name             = t1.table_name
   AND (
   NOT EXISTS (
SELECT 1
  FROM grp_gtt                   d2
 WHERE d2.owner                  = t2.owner
   AND d2.table_name             = t2.table_name
   AND d2.r_owner                = d1.r_owner
   AND d2.r_constraint_name      = d1.r_constraint_name
   AND d2.n_con                  = d1.n_con
)))
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

************
LAG_GTT
************
SELECT
       t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab_gtt                   t1
  JOIN tab_gtt                   t2
    ON t2.n_det                  = t1.n_det
   AND t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

The two temporary tables are as follows, with * marking unique keys:
tab_gtt

  • owner*
  • table_name*
  • row_id
  • lagg
  • n_det

A unique index is defined on tab_gtt:
tab_gtt_uk

  • owner
  • table_name

A non-unique index is defined on tab_gtt:
tab_gtt_N1

  • lagg

grp_gtt

  • owner*
  • constraint_name*
  • r_owner*
  • r_constraint_name*
  • n_con

A unique index is defined on grp_gtt:
grp_gtt_uk

  • owner
  • constraint_name
  • r_owner
  • r_constraint_name

Performance Analysis

We presented four query variations above, and in this section give the results of benchmarking these queries across a 1-dimensional data domain obtained by copying the system views once, then by successive powers of two up to 32 times into my test tables described in the previous article. The problem sizes are as follows:
Record Counts

Timings and Statistics
Click on the query name in the file below to jump to the execution plan for the largest data point.

Comparison

The figures above are for the largest data pont, W32. The following points can be made:

  • The query that performed best in the earlier article is now worst compared with the new queries
  • The best query uses Listagg within a subquery factor and is more than 200 times faster than the worst at the largest data point
  • Moving the new listagg-based subquery factor into a temporary table worsens performance because the index is not used
  • The statistics tab shows that performance variation is now linear with problem size for the new Listagg queries, which is why they inevitably outperform the old one at large enough problem size

Subquery Factors and Temporary Tables
Replacing a subquery factor by a temporary table can only help if indexed accesses are beneficial.

Execution Plan Hash Values
In my last article, I noted that the plan hash values differed for all queries between data points, although the plans were essentially the same, and surmised that this was due to the subquery factor internal naming system. LAG_GTT is the only query here that makes no use of subquery factors, and this is the only one that retains the same plan hash value, thus bearing out the surmise.

Extending Listagg Applicability

If there are a large number of matching fields then the Listagg limit of 4,000 characters could be hit in quite a small number of details for a master. It’s not difficult to write a CLOB version of Listagg, but one way of mitigating the restriction would be to aggregate not the actual matching fields, but the ranking of the set within all distinct detail sets. A further reduction in the size of the aggregated values can be obtained by storing the ranking in a high number-base, rather than base 10, as a zero-left-padded string. If the database character set is UTF8 (as is my 11.2 XE database), base 128 is possible, while extended Ascii character sets should allow base 256. The number of characters assigned to the ranking value determines how many distinct sets and how many detail records per master record are allowed with the standard Listagg function, according to the table below (for UTF8):

Chars  Distinct Sets  Details/Master
=====  =============  ==============
    1            128           4,000
    2         16,384           2,000
    3      2,097,152           1,333
    4    268,435,456           1,000

New Query L2_SQF

WITH rns AS (
SELECT r_owner,
       r_constraint_name,
       Row_Number () OVER (ORDER BY r_owner, r_constraint_name) - 1 rn
  FROM con_cp
 WHERE constraint_type	    = 'R'
 GROUP BY 
       r_owner,
       r_constraint_name
), rch AS ( 
SELECT r_owner,
       r_constraint_name,
       Chr (Floor (rn / 128)) ||
       Chr ((rn - 128 * Floor (rn / 128))) chr_rank
  FROM rns
), tab AS (
SELECT t.owner,
       t.table_name,
       t.ROWID                   row_id,
       Count(c.ROWID)            n_det,
       Listagg (r.chr_rank, '') WITHIN GROUP (ORDER BY r.chr_rank) lagg
  FROM tab_cp                    t
  JOIN con_cp                    c
    ON c.owner                   = t.owner
   AND c.table_name              = t.table_name
   AND c.constraint_type         = 'R'
  JOIN rch                       r
    ON r.r_owner                 = c.r_owner
   AND r.r_constraint_name       = c.r_constraint_name
 GROUP BY
        t.owner,
        t.table_name,
        t.ROWID
)
SELECT t1.owner                  owner_1,
       t1.table_name             table_name_1,
       t2.owner                  owner_2,
       t2.table_name             table_name_2,
       t1.n_det                  n_det
  FROM tab                       t1
  JOIN tab                       t2
    ON t2.lagg                   = t1.lagg
   AND t2.row_id                 > t1.row_id
 ORDER BY t1.owner,
       t1.table_name,
       t2.owner,
       t2.table_name

QSDs

Record Counts
The same set of data points has been used for the new query (L2_SQF) with the best of the earlier ones for comparison (LAG_SQF). The record counts have slightly increased. Note that the constraints per table is for all constraints, not just foreign keys.

Timings and Statistics
Click on the query name in the file below to jump to the execution plan for the largest data point.

Comparison

The figures above are for the largest data pont, W32. The following points can be made:

  • The new query is about 20% slower than the old one
  • Performance variation remains linear with problem size for the new query
  • The test problem has very few details per master, but the relative performances may change for real problems

Conclusions

This article has used a new idea that I termed holographic set matching to improve performance relative to the queries in my previous article on Master-Detail Transaction Matching in SQL (MDTM1), achieving linear time variation with size, compared with the earlier quadratic time. Although the new Oracle 11.2 function Listagg has been used, the method can by applied in earlier versions by adding a user-defined list aggregation function, which is easy to do. The third article in this sequence, Master-Detail Transaction Reconciliation in SQL (MDTM3), solves the overall problem described in the first article by adding a sequence of subquery factors to the query developed above.