This article is the second in a sequence of three dealing with a very general class of problems in SQL, and exploring various techniques to find efficient solutions. In the first article, Master-Detail Transaction Matching in SQL (MDTM1), the problem was outlined and divided into two subproblems, of which the first was solved in several variant SQL statements with performance analysis. This second article takes the most efficient method and applies two new techniques to further improve performance. The third article, Master-Detail Transaction Reconciliation in SQL (MDTM3), adds a sequence of subquery factors to the best solution for the first subproblem to achieve an efficient solution to the overall problem within a single SQL statement.
The holographic principle is a mathematical principle stating that the total information contained in a volume of space corresponds to an equal amount of information contained on the boundary of that space. – Holographic Principle.
In my last article, I made large performance gains in SQL queries matching sets of detail records by obtaining aggregates of the sets in a subquery factor and matching those at master level before matching the detail sets directly. The performance gain came from the fact that the aggregation is cheap compared with matching sets of records, and allows many candidate matching pairs to be discarded before doing the expensive direct set matching. However, all of the pairs that actually match would still have been matched directly, and probably more besides, and it occurred to me to wonder whether it might be possible to use aggregate matching to replace detail set matching altogether.
This article develops the previous one by taking the same sample transaction matching problem and adding queries that use the Oracle 11.2 function Listagg to make just this replacement. This is possible so long as the list-aggregated detail matching fields do not exceed 4,000 bytes (the SQL VARCHAR2 limit in this version). If that limit were hit then some other aggregation technique would be needed, perhaps a user-defined CLOB version of Listagg. However, it’s possible to extend the range of applicability by aggregating identifiers smaller than the actual fields, as I’ll discuss at the end.
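To make the idea concrete, here is a minimal, self-contained sketch (the table and column names are invented for illustration and are not part of the article's schema): each master's detail keys are collapsed by Listagg into one ordered string, so two masters have identical detail sets exactly when their aggregated strings are equal, and the expensive pairwise set comparison becomes a simple equijoin.

-- Sketch only: "details" is a hypothetical master-detail key list built inline from DUAL
WITH details AS (
        SELECT 1 master_id, 'A' det_key FROM DUAL UNION ALL
        SELECT 1, 'B' FROM DUAL UNION ALL
        SELECT 2, 'A' FROM DUAL UNION ALL
        SELECT 2, 'B' FROM DUAL
)
SELECT master_id,
       Listagg (det_key, '|') WITHIN GROUP (ORDER BY det_key) set_signature
  FROM details
 GROUP BY master_id;

Both masters return the signature 'A|B', so they can be paired by joining on set_signature (plus a detail count, as in the queries below).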
We’ll keep the fastest query from the previous article, and add three new queries:
Query Variations
- MIN_NE – Detail grouping in temporary table, with set matching (GTT_NE previously)
- LAG_SQF – Detail grouping by Listagg only in subquery factor
- LAG_NE – Detail grouping by Listagg in subquery factor, with set matching
- LAG_GTT – Detail grouping by Listagg only in temporary table
************ MIN_NE ************

WITH tab AS (
SELECT t.owner, t.table_name, t.ROWID row_id,
       Sum (c.n_con) n_det,
       Min (c.r_constraint_name) min_det,
       Max (c.r_constraint_name) max_det
  FROM tab_cp t
  JOIN grp_gtt c
    ON c.owner = t.owner
   AND c.table_name = t.table_name
 GROUP BY t.owner, t.table_name, t.ROWID
)
SELECT t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab t1
  JOIN tab t2
    ON t2.n_det = t1.n_det
   AND t2.min_det = t1.min_det
   AND t2.max_det = t1.max_det
   AND t2.row_id > t1.row_id
 WHERE NOT EXISTS (
       SELECT 1
         FROM grp_gtt d1
        WHERE d1.owner = t1.owner
          AND d1.table_name = t1.table_name
          AND NOT EXISTS (
              SELECT 1
                FROM grp_gtt d2
               WHERE d2.owner = t2.owner
                 AND d2.table_name = t2.table_name
                 AND d2.r_owner = d1.r_owner
                 AND d2.r_constraint_name = d1.r_constraint_name
                 AND d2.n_con = d1.n_con
              )
       )
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name

************ LAG_SQF ************

WITH tab AS (
SELECT t.owner, t.table_name, t.ROWID row_id,
       Count (c.ROWID) n_det,
       Listagg (c.r_owner || c.r_constraint_name, '')
           WITHIN GROUP (ORDER BY c.r_owner || c.r_constraint_name) lagg
  FROM tab_cp t
  JOIN con_cp c
    ON c.owner = t.owner
   AND c.table_name = t.table_name
   AND c.constraint_type = 'R'
 GROUP BY t.owner, t.table_name, t.ROWID
)
SELECT t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab t1
  JOIN tab t2
    ON t2.n_det = t1.n_det
   AND t2.lagg = t1.lagg
   AND t2.row_id > t1.row_id
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name

************ LAG_NE ************

WITH tab AS (
SELECT t.owner, t.table_name, t.ROWID row_id,
       Sum (c.n_con) n_det,
       Listagg (c.r_owner || c.r_constraint_name, '')
           WITHIN GROUP (ORDER BY c.r_owner || c.r_constraint_name) lagg
  FROM tab_cp t
  JOIN grp_gtt c
    ON c.owner = t.owner
   AND c.table_name = t.table_name
 GROUP BY t.owner, t.table_name, t.ROWID
)
SELECT t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab t1
  JOIN tab t2
    ON t2.n_det = t1.n_det
   AND t2.lagg = t1.lagg
   AND t2.row_id > t1.row_id
 WHERE NOT EXISTS (
       SELECT 1
         FROM grp_gtt d1
        WHERE d1.owner = t1.owner
          AND d1.table_name = t1.table_name
          AND NOT EXISTS (
              SELECT 1
                FROM grp_gtt d2
               WHERE d2.owner = t2.owner
                 AND d2.table_name = t2.table_name
                 AND d2.r_owner = d1.r_owner
                 AND d2.r_constraint_name = d1.r_constraint_name
                 AND d2.n_con = d1.n_con
              )
       )
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name

************ LAG_GTT ************

SELECT t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab_gtt t1
  JOIN tab_gtt t2
    ON t2.n_det = t1.n_det
   AND t2.lagg = t1.lagg
   AND t2.row_id > t1.row_id
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name
The two temporary tables are as follows, with * marking the unique key columns (a hedged DDL sketch follows the index listings below):
tab_gtt
- owner*
- table_name*
- row_id
- lagg
- n_det
A unique index is defined on tab_gtt:
tab_gtt_uk
- owner
- table_name
A non-unique index is defined on tab_gtt:
tab_gtt_N1
- lagg
grp_gtt
- owner*
- table_name*
- r_owner*
- r_constraint_name*
- n_con
A unique index is defined on grp_gtt:
grp_gtt_uk
- owner
- table_name
- r_owner
- r_constraint_name
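The article does not give the DDL itself, so the following is only a sketch of how the tables and indexes above might be created; the column datatypes and the ON COMMIT clause are assumptions, based on the ALL_TABLES and ALL_CONSTRAINTS sources that the test data is copied from.

-- Sketch only: datatypes and ON COMMIT clause are assumptions
CREATE GLOBAL TEMPORARY TABLE tab_gtt (
    owner       VARCHAR2(30),
    table_name  VARCHAR2(30),
    row_id      ROWID,
    lagg        VARCHAR2(4000),
    n_det       NUMBER
) ON COMMIT PRESERVE ROWS;

CREATE UNIQUE INDEX tab_gtt_uk ON tab_gtt (owner, table_name);
CREATE INDEX tab_gtt_n1 ON tab_gtt (lagg);

CREATE GLOBAL TEMPORARY TABLE grp_gtt (
    owner              VARCHAR2(30),
    table_name         VARCHAR2(30),
    r_owner            VARCHAR2(30),
    r_constraint_name  VARCHAR2(30),
    n_con              NUMBER
) ON COMMIT PRESERVE ROWS;

CREATE UNIQUE INDEX grp_gtt_uk ON grp_gtt (owner, table_name, r_owner, r_constraint_name);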
Performance Analysis
We presented four query variations above, and in this section give the results of benchmarking them across a one-dimensional data domain, obtained by copying the system views into my test tables (described in the previous article) once, then in successive powers of two up to 32 copies. The problem sizes are as follows:
Record Counts
Timings and Statistics
Click on the query name in the file below to jump to the execution plan for the largest data point.
Comparison
The figures above are for the largest data point, W32. The following points can be made:
- The query that performed best in the earlier article is now worst compared with the new queries
- The best query uses Listagg within a subquery factor and is more than 200 times faster than the worst at the largest data point
- Moving the new Listagg-based subquery factor into a temporary table worsens performance because the index is not used
- The statistics tab shows that performance variation is now linear with problem size for the new Listagg queries, which is why they inevitably outperform the old one at large enough problem size
Subquery Factors and Temporary Tables
Replacing a subquery factor by a temporary table can only help if the indexed accesses that the table makes possible are actually used and beneficial; here they are not, so the cost of populating the table is pure overhead.
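One way to probe that claim would be to force indexed nested-loop access in the LAG_GTT query and compare the plans and timings. The sketch below is for testing only, not a recommendation; it simply applies standard optimizer hints to the index tab_gtt_N1 listed above.

SELECT /*+ LEADING(t1) USE_NL(t2) INDEX(t2 tab_gtt_n1) */
       t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab_gtt t1
  JOIN tab_gtt t2
    ON t2.n_det = t1.n_det
   AND t2.lagg = t1.lagg
   AND t2.row_id > t1.row_id
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name

If the hinted plan is slower, as seems likely given the equality join on a long string, then the index offers nothing that a hash join on the materialized subquery factor does not already provide.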
Execution Plan Hash Values
In my last article, I noted that the plan hash values differed for all queries between data points, although the plans were essentially the same, and surmised that this was due to the subquery factor internal naming system. LAG_GTT is the only query here that makes no use of subquery factors, and this is the only one that retains the same plan hash value, thus bearing out the surmise.
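A quick way to verify this from the shared pool is to query V$SQL after running the variants; the filter below assumes a distinguishing comment such as /* LAG_GTT */ has been embedded in the query text, which is purely an illustrative convention.

SELECT sql_id, child_number, plan_hash_value
  FROM v$sql
 WHERE sql_text LIKE '%LAG_GTT%'
   AND sql_text NOT LIKE '%v$sql%';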
Extending Listagg Applicability
If there are a large number of matching fields then the Listagg limit of 4,000 bytes (the SQL VARCHAR2 maximum in this version) could be hit with quite a small number of details per master. It’s not difficult to write a CLOB version of Listagg, but one way of mitigating the restriction would be to aggregate not the actual matching fields, but the ranking of the set within all distinct detail sets. A further reduction in the size of the aggregated values can be obtained by storing the ranking in a high number-base, rather than base 10, as a zero-left-padded string. If the database character set is UTF8 (as in my 11.2 XE database), base 128 is possible, while extended ASCII character sets should allow base 256. The number of characters assigned to the ranking value determines how many distinct sets, and how many detail records per master record, are allowed with the standard Listagg function, according to the table below (for UTF8):
Chars   Distinct Sets   Details/Master
-----   -------------   --------------
1       128             4,000
2       16,384          2,000
3       2,097,152       1,333
4       268,435,456     1,000
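These figures follow from the encoding arithmetic: c characters in base 128 can represent 128^c distinct rankings, and a 4,000-byte aggregate holds Floor(4000/c) rankings per master. As a quick check, a throwaway query along these lines (a sketch using only DUAL) reproduces the table:

SELECT chars,
       Power (128, chars)   distinct_sets,
       Floor (4000 / chars) details_per_master
  FROM (SELECT LEVEL chars FROM DUAL CONNECT BY LEVEL <= 4);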
New Query L2_SQF
WITH rns AS (
SELECT r_owner,
       r_constraint_name,
       Row_Number () OVER (ORDER BY r_owner, r_constraint_name) - 1 rn
  FROM con_cp
 WHERE constraint_type = 'R'
 GROUP BY r_owner, r_constraint_name
), rch AS (
SELECT r_owner,
       r_constraint_name,
       Chr (Floor (rn / 128)) || Chr (rn - 128 * Floor (rn / 128)) chr_rank
  FROM rns
), tab AS (
SELECT t.owner, t.table_name, t.ROWID row_id,
       Count (c.ROWID) n_det,
       Listagg (r.chr_rank, '') WITHIN GROUP (ORDER BY r.chr_rank) lagg
  FROM tab_cp t
  JOIN con_cp c
    ON c.owner = t.owner
   AND c.table_name = t.table_name
   AND c.constraint_type = 'R'
  JOIN rch r
    ON r.r_owner = c.r_owner
   AND r.r_constraint_name = c.r_constraint_name
 GROUP BY t.owner, t.table_name, t.ROWID
)
SELECT t1.owner owner_1, t1.table_name table_name_1,
       t2.owner owner_2, t2.table_name table_name_2,
       t1.n_det n_det
  FROM tab t1
  JOIN tab t2
    ON t2.lagg = t1.lagg
   AND t2.row_id > t1.row_id
 ORDER BY t1.owner, t1.table_name, t2.owner, t2.table_name
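As a sanity check on the two-character encoding in the rch subquery, the standalone sketch below (the rn values are made up) shows the two base-128 digits that Chr() is applied to:

SELECT rn,
       Floor (rn / 128)             high_digit,
       rn - 128 * Floor (rn / 128)  low_digit
  FROM (SELECT column_value rn FROM TABLE (SYS.ODCINumberList (0, 127, 128, 16383)));

The values 0 and 127 map to digits (0, 0) and (0, 127), 128 rolls over to (1, 0), and 16383, the largest two-character value, maps to (127, 127).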
QSDs (Query Structure Diagrams)
Record Counts
The same set of data points has been used for the new query (L2_SQF) with the best of the earlier ones for comparison (LAG_SQF). The record counts have slightly increased. Note that the constraints per table is for all constraints, not just foreign keys.
Timings and Statistics
Click on the query name in the file below to jump to the execution plan for the largest data point.
Comparison
The figures above are for the largest data point, W32. The following points can be made:
- The new query is about 20% slower than the old one
- Performance variation remains linear with problem size for the new query
- The test problem has very few details per master, but the relative performances may change for real problems
Conclusions
This article has used a new idea that I termed holographic set matching to improve performance relative to the queries in my previous article on Master-Detail Transaction Matching in SQL (MDTM1), achieving linear time variation with problem size, compared with the earlier quadratic time. Although the new Oracle 11.2 function Listagg has been used, the method can be applied in earlier versions by adding a user-defined list aggregation function, which is easy to do. The third article in this sequence, Master-Detail Transaction Reconciliation in SQL (MDTM3), solves the overall problem described in the first article by adding a sequence of subquery factors to the query developed above.
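For completeness, here is a hedged sketch of what such a user-defined list aggregation function might look like, using Oracle's ODCIAggregate interface (the names listagg_sorted and listagg_sorted_t are my own inventions, not from the article). Unlike Listagg there is no WITHIN GROUP ordering clause, so this version keeps its internal collection sorted as it goes, which makes the aggregated string deterministic and therefore safe to compare between masters.

CREATE OR REPLACE TYPE listagg_sorted_t AS OBJECT (
    vals SYS.ODCIVarchar2List,
    STATIC FUNCTION ODCIAggregateInitialize (p_ctx IN OUT listagg_sorted_t) RETURN NUMBER,
    MEMBER FUNCTION ODCIAggregateIterate (self IN OUT listagg_sorted_t, p_value IN VARCHAR2) RETURN NUMBER,
    MEMBER FUNCTION ODCIAggregateMerge (self IN OUT listagg_sorted_t, p_ctx2 IN listagg_sorted_t) RETURN NUMBER,
    MEMBER FUNCTION ODCIAggregateTerminate (self IN listagg_sorted_t, p_return OUT VARCHAR2, p_flags IN NUMBER) RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY listagg_sorted_t IS

    STATIC FUNCTION ODCIAggregateInitialize (p_ctx IN OUT listagg_sorted_t) RETURN NUMBER IS
    BEGIN
        p_ctx := listagg_sorted_t (SYS.ODCIVarchar2List ());
        RETURN ODCIConst.Success;
    END;

    MEMBER FUNCTION ODCIAggregateIterate (self IN OUT listagg_sorted_t, p_value IN VARCHAR2) RETURN NUMBER IS
        i PLS_INTEGER;
    BEGIN
        self.vals.Extend;
        i := self.vals.COUNT;
        WHILE i > 1 AND self.vals (i - 1) > p_value LOOP -- insertion step keeps the list sorted
            self.vals (i) := self.vals (i - 1);
            i := i - 1;
        END LOOP;
        self.vals (i) := p_value;
        RETURN ODCIConst.Success;
    END;

    MEMBER FUNCTION ODCIAggregateMerge (self IN OUT listagg_sorted_t, p_ctx2 IN listagg_sorted_t) RETURN NUMBER IS
        i PLS_INTEGER;
    BEGIN
        FOR j IN 1 .. p_ctx2.vals.COUNT LOOP -- fold in the other context's values, keeping order
            self.vals.Extend;
            i := self.vals.COUNT;
            WHILE i > 1 AND self.vals (i - 1) > p_ctx2.vals (j) LOOP
                self.vals (i) := self.vals (i - 1);
                i := i - 1;
            END LOOP;
            self.vals (i) := p_ctx2.vals (j);
        END LOOP;
        RETURN ODCIConst.Success;
    END;

    MEMBER FUNCTION ODCIAggregateTerminate (self IN listagg_sorted_t, p_return OUT VARCHAR2, p_flags IN NUMBER) RETURN NUMBER IS
        l_str VARCHAR2(4000); -- same 4,000-byte limit as Listagg; a CLOB version would relax it
    BEGIN
        FOR i IN 1 .. self.vals.COUNT LOOP
            l_str := l_str || self.vals (i);
        END LOOP;
        p_return := l_str;
        RETURN ODCIConst.Success;
    END;

END;
/
CREATE OR REPLACE FUNCTION listagg_sorted (p_value VARCHAR2) RETURN VARCHAR2
    AGGREGATE USING listagg_sorted_t;
/

It could then stand in for the Listagg expression in LAG_SQF, for example as listagg_sorted (c.r_owner || c.r_constraint_name), at the cost of losing the explicit WITHIN GROUP ordering syntax.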