[There is a recording on this article here: Tweet, and also in my GitHub project, Small SQL projects.]
It’s a common requirement to compute running sums over time in SQL; for example, to find sales volumes to date. This is easy to do using analytic functions, like this, schematically:
SELECT SUM(sales) OVER (PARTITION By partition_key ORDER BY date) FROM ...
The ORDER BY clause implicitly adds a window range of UNBOUNDED PRECEDING, and if you omit it you get the overall totals by partition_key.
Recently I needed to compute the products of some numeric factors across records over time. This kind of requirement often arises in financial calculations, and is slightly more tricky since Oracle doesn’t have an analytic function for products as it does for sums. However, we can achieve the same functionality using the well known mathematical equivalence:
log(xy) = log(x) + log(y)
and therefore:
xy = exp(log(x) + log(y))
More generally, if we have a set of N records , and a function, f, defined on a record, we can write the running sum, for function f, and n in (1..N) as:
Then from the above equivalence, we can write:
This means we can get the running products in SQL using the analytic SUM combined with the (natural) log (LN in Oracle SQL) and exp functions. Let’s see how it works using Oracle’s HR demo schema. We’ll take the employees table and use:
- department_id as the partition key
- employee_id as the dimension to order by
- salary as the measure
To start with, let’s get the running and total sums, showing results for department_id = 60:
SELECT department_id, employee_id, salary, SUM(salary) OVER (PARTITION BY department_id ORDER BY employee_id) running_sum, SUM(salary) OVER (PARTITION BY department_id) total_sum FROM employees ORDER BY department_id, employee_id / Sums DEPARTMENT_ID EMPLOYEE_ID SALARY RUNNING_SUM TOTAL_SUM ------------- ----------- ---------- ----------- ---------- 60 103 9000 9000 28800 60 104 6000 15000 28800 60 105 4800 19800 28800 60 106 4800 24600 28800 60 107 4200 28800 28800
Next, let’s use the above equivalence to get the running and total products of the expression (1 + salary/10000), which we’ll call mult:
SELECT department_id, employee_id, salary, (1 + salary/10000) mult, EXP(SUM(LN((1 + salary/10000))) OVER (PARTITION BY department_id ORDER BY employee_id)) running_prod, EXP(SUM(LN((1 + salary/10000))) OVER (PARTITION BY department_id)) total_prod FROM employees ORDER BY department_id, employee_id / Products DEPARTMENT_ID EMPLOYEE_ID SALARY MULT RUNNING_PROD TOTAL_PROD ------------- ----------- ---------- ---------- ------------ ---------- 60 103 9000 1.9 1.9 9.45551872 60 104 6000 1.6 3.04 9.45551872 60 105 4800 1.48 4.4992 9.45551872 60 106 4800 1.48 6.658816 9.45551872 60 107 4200 1.42 9.45551872 9.45551872
If we didn’t have this technique we could compute the results using explicit recursion, either by MODEL clause, or by recursive subquery factors. Let’s do it those ways out of interest. First here’s a MODEL clause solution:
WITH multipliers AS ( SELECT department_id, employee_id, salary, (1 + salary/10000) mult, COUNT(*) OVER (PARTITION BY department_id) n_emps FROM employees ) SELECT department_id, employee_id, salary, mult, running_prod, total_prod FROM multipliers MODEL PARTITION BY (department_id) DIMENSION BY (Row_Number() OVER (PARTITION BY department_id ORDER BY employee_id) rn) MEASURES (employee_id, salary, mult, mult running_prod, mult total_prod, n_emps) RULES ( running_prod[rn > 1] = mult[CV()] * running_prod[CV() - 1], total_prod[any] = running_prod[n_emps[CV()]] ) ORDER BY department_id, employee_id
Finally, here’s a solution using recursive subquery factors:
WITH multipliers AS ( SELECT department_id, employee_id, salary, (1 + salary/10000) mult, Row_Number() OVER (PARTITION BY department_id ORDER BY employee_id) rn, COUNT(*) OVER (PARTITION BY department_id) n_emps FROM employees WHERE department_id = 60 ), rsf (department_id, employee_id, rn, salary, mult, running_prod) AS ( SELECT department_id, employee_id, rn, salary, mult, mult running_prod FROM multipliers WHERE rn = 1 UNION ALL SELECT m.department_id, m.employee_id, m.rn, m.salary, m.mult, r.running_prod * m.mult FROM rsf r JOIN multipliers m ON m.rn = r.rn + 1 AND m.department_id = r.department_id ) SELECT department_id, employee_id, salary, mult, running_prod, Last_Value(running_prod) OVER (PARTITION BY department_id) total_prod FROM rsf ORDER BY department_id, employee_id
You can see the scripts and full output on my new GitHub project,
Small SQL projects, in the sums_products folder.
You can get the full detail on using analytic functions from the Oracle doc:
SQL for Analysis and Reporting
See also: Analytic and Recursive SQL by Example