Last bank holiday Monday I posted a solution to an SQL problem on OTN, and I later thought that the SQL would make a nice example to illustrate my Query Structure Diagramming (QSD) technique. I published my first example of this in May 2009 on scribd,
Loading...
SQL Subquery Factors
Subquery factors were introduced in Oracle Database v9.2 and have since become a key technique in developing queries of any complexity. They generalise the v8 inline view technique, allowing subqueries now to be declared with an alias (using the ‘WITH’ clause), then referenced as often as desired later in the query. When referenced multiple times, Oracle’s Cost Based Optimiser (CBO) normally executes the subquery once and writes the results to temporary space to aid performance (this can be seen in the Explain Plan as a LOAD AS SELECT action). Using subquery factors can make queries much easier to read, even when they are referenced only once, in which case CBO normally restructures them internally to incorporate them within another subquery or the main query. It is important to note, though, that subquery factors, when retained by CBO, will be joined by full scans when referenced later in the query and in some cases it is more efficient to retain the table references to allow indexed joins (my next blog post will include an example of this).
Subquery factors, along with inline views, are the building blocks of modern SQL, as subroutines are of other languages. My QSDs are intended to show how they allow a procedural flow at the structural level, while retaining the set-based logic within the subqueries.
Leave and Attendance Query
The problem here is that the poster has a daily attendance table and a leave table, where leave is stored as date ranges, but wants a single query that outputs data in daily form. The keys to this are:
- Realising that you cannot drive from the event tables but must generate a continuous set of days to drive from, for each employee (assuming you don’t have them in a separate reference table)
- Converting the leave ranges to leave days by joining to the generated days rowset
- Joining the leave daily rowset with the attendance table by a union
Read more here (I’m BrendanP on OTN): Attendance and Leave table Join
ERD
Note that tables were not provided for Employee and Day in the OTN post, but it is useful to include them as entities nonetheless.
SQL
Note that in the query below, both the date range and the employee set are generated from the transactional data, which is obviously an artificial feature arising from this being for a problem on a forum, but it’s no harm in terms of the purpose of this article.
WITH ext AS ( SELECT Min (att_date) min_date, Max (att_date) - Min (att_date) + 1 n_days FROM attendance ), dys AS ( SELECT min_date + LEVEL - 1 day FROM ext CONNECT BY LEVEL < n_days + 1 ), ems AS ( SELECT emp_id FROM attendance UNION SELECT employee_number FROM leave ), edy AS ( SELECT dys.day, ems.emp_id FROM dys CROSS JOIN ems ), ldy AS ( SELECT edy.emp_id, edy.day, lve.leave_reason FROM edy JOIN leave lve ON edy.day BETWEEN lve.date_start AND lve.date_end AND lve.employee_number = edy.emp_id ), uni AS ( SELECT emp_id, att_date, timein, timeout, late_in, early_out, reason FROM attendance UNION SELECT emp_id, day, NULL, NULL, NULL, NULL, leave_reason FROM ldy ) SELECT edy.emp_id, edy.day, uni.timein, uni.timeout, uni.late_in, uni.early_out, uni.reason FROM edy LEFT JOIN uni ON uni.att_date = edy.day AND uni.emp_id = edy.emp_id ORDER BY 1, 2{code}
QSD
Counting Flight Statistics Query
The problem here is that the poster has three tables with data on events for frequent fliers and wants to show aggregate counts by year, but wants all years within a range to be included in the output, including years with no events. The key to this is realising that you cannot drive from the event tables but must generate a continuous set of years to drive from (assuming you don’t have them in a separate reference table). Read more here: Multiple Count aggregates from different sources grouped by Year
ERD
Note that a table was not provided for Year in the OTN post, but it is useful to include it as an entity nonetheless.
SQL
WITH yrs AS ( SELECT Add_Months (To_Date ('01011989', 'ddmmyyyy'), 12*LEVEL) YEAR FROM DUAL CONNECT BY LEVEL < 24 ), ffe AS ( SELECT Trunc (start_date, 'YEAR') year, Count(*) n_enr FROM freq_flyer_enrollment GROUP BY Trunc (start_date, 'YEAR') ), ffs AS ( SELECT Trunc (survey_date, 'YEAR') year, Count(*) n_sur FROM freq_flyer_survey GROUP BY Trunc (survey_date, 'YEAR') ), fff AS ( SELECT Trunc (flt_date, 'YEAR') year, Count(*) n_fly FROM freq_flyer_flights GROUP BY Trunc (flt_date, 'YEAR') ) SELECT yrs.year, Nvl (ffe.n_enr, 0) n_enr, Nvl (ffs.n_sur, 0) n_sur, Nvl (fff.n_fly, 0) n_fly FROM yrs LEFT JOIN ffe ON ffe.year = yrs.year LEFT JOIN ffs ON ffs.year = yrs.year LEFT JOIN fff ON fff.year = yrs.year ORDER BY 1