A Framework for Dimensional Benchmarking of SQL Query Performance

A few years ago I wanted to investigate the performance of different SQL queries for the same problem, and wanted to include analysis of how the queries’ performance varied with problem size. In order to do this efficiently I wrote an Oracle framework consisting of tables, packages, types etc., and which I have now published here, dim_bench_sql_oracle on GitHub. As well as the obvious cpu and elapsed times, I included statistics contained in the execution plan tables, and also the differences in v$ view statistics that are gathered in the well known Runstats scripts, (originally developed by Tom Kyte, and for which there now seem to be lots of variations around, such as
Runstats utility
). My approach is to collect these statistics in tables keyed by both query and dimensions to allow for more elaborate reporting, and to easily detect unscaleable queries, for example that use resources at a rate that grows quadratically, or worse, with problem size, as in one of the demo queries. Output goes to both a text log file, and to summary csv files for importing to Excel.

This article has design information on the framework, and is best read in conjunction with the first article on its use for a specific problem, Dimensional Benchmarking of Oracle v10-v12 Queries for SQL Bursting Problems, which includes the output log produced. This problem is included in the GitHub project, so anyone interested can run it themselves fairly easily.

Update, 26 November 2016: A notes section has been added discussing design issues and features.

Here is one of the Scribd articles for which I originally developed the framework:

Loading...

Bench Data Model – ERD

bench-1-0-erd

Code Structure Diagram

bench-1-1-csd

Test_Queries Package Call Structure Table

Level 1 Level 2 Level 3 Package
Add_Query Test_Queries
Init_Statistics Test_Queries
Plan_Lines Test_Queries
Get_SQL_Id Utils
Display_Cursor DBMS_XPlan
Write_Plan_Statistics Test_Queries
Get_Queries Test_Queries
Write_Log Utils
Flush_Buf Test_Queries
Init_Time Timer_Set
Put_Line UTL_File
Increment_Time Timer_Set
Write_Line Test_Queries
Flush_Buf Test_Queries
Open_File Test_Queries
Construct Timer_Set
Fopen UTL_File
Close_File Test_Queries
Flush_Buf Test_Queries
FClose UTL_File
Write_Times Timer_Set
Outbound_Interface Test_Queries
Construct Timer_Set
Open_File Test_Queries
Write_Line Test_Queries
Process_Cursor Test_Queries
Write_Log Utils
Init_Statistics Test_Queries
Init_Time Timer_Set
Increment_Time Timer_Set
Write_Line Test_Queries
Flush_Buf Test_Queries
Write_Plan_Statistics Test_Queries
Write_Plan Utils
Write_Times Timer_Set
Get_Timer_Stats Timer_Set
Close_File Test_Queries
Write_Log Utils
Write_Size_list Test_Queries
Write_Log Utils
Write_Twice Test_Queries
Write_Line Test_Queries
Write_CSV_Fields Utils
Write_Data_Points Test_Queries
Heading Utils
Write_Line Test_Queries
Write_Twice Test_Queries
Write_Distinct_Plans Test_Queries
Heading Utils
Write_Log Utils
Write_Rows Test_Queries
Heading Utils
Write_Line Test_Queries
Write_Twice Test_Queries
Write_Rows Test_Queries
Write_Stat Test_Queries
Write_Log Utils
Write_Rows Test_Queries
Write_All_Facts Test_Queries
Write_Distinct_Plans Test_Queries
Write_Data_Points Test_Queries
Write_Rows Test_Queries
Write_Stat Test_Queries
Heading Utils
Close_File Test_Queries
Write_Stats Test_Queries
Open_File Test_Queries
Write_All_Facts Test_Queries
Write_Log Utils
Term_Run Test_Queries
Write_Times Timer_Set
Get_Run_Details Test_Queries
Construct Timer_Set
Create_Log Utils
Write_Log Utils
Heading Utils
Write_Size_list Test_Queries
Get_Queries Test_Queries
Set_Data_Point Test_Queries
Init_Time Timer_Set
Get_CPU_Time DBMS_Utility
Setup_Data Query_Test_Set
Increment_Time Timer_Set
Run_One Test_Queries
Construct Timer_Set
Outbound_Interface Test_Queries
Write_Log Utils
Write_Times Timer_Set
Create_Run Test_Queries
Write_Log Utils
Main Test_Queries
Main Test_Queries
Main Test_Queries
Get_Run_Details Test_Queries
Set_Data_Point Test_Queries
Run_One Test_Queries
Increment_Time Timer_Set
Write_Stats Test_Queries
Write_Log Utils
Write_Other_Error Utils

 
Outputs

out folder

The demo script Test_Bur.sql writes the log data to ..\out\Test_Bur.LST.

The program loops over each (W, D) data point included in the driving script lists, and outputs for each query in the group data including:

  • Full execution plan
  • CPU and elapsed timings of query execution steps, file writing and data setup

On completion of the loops, summary reports are written to both the main log and to summary csv files, mentioned below, with information including

  • Distinct execution plans (main log only)
  • Data point statistics including setup timings and records created
  • Numbers of output records from the queries
  • cpu and elapsed times for the queries
  • Execution plan statistics
  • Numerous v$ After-Before statistic differences (following the Run_Stats model)

Oracle directory: output_dir

For a query group and query with data points W-D, the results of running the query are written to:

<query_group>_<query>_<W>-<D>.csv

For example:

BURST_MTH_QRY_30-30

Two summary files are written, with the bench run id as part of the name:

<bench run id>_W.csv
<bench run id>_D.csv

These files contain all the detailed statistics in csv format, so that they can be imported into Excel and used to obtain graphs. _W has the width parameter as the row and depth as the column, and _D the other way round.

Notes

Query Timing

Obtaining reliable comparative timings of queries in a test environment is not as straightforward as it may seem. Some of the issues are considered in this article, for example, Timing an ALL_ROWS query. For ad hoc tests, running the query in SQL*Plus after ‘SET AUTOTRACE TRACEONLY’ is one of the better approaches. However, in this framework a different approach is taken in order to simulate the performance that might be obtained if the query records were fetched in batches to be processed, say as an outbound interface, where they may be written to a file on the server. To do this, the query select list is converted into CSV format and the records are written to a file, with care taken to separate the timings of the query operations from those of the file processing.

Hard Parsing

This framework is not intended for testing OLTP SQL but relatively long-running batch-type SQL, where the cost of parsing is generally negligible. As the dataset sizes vary it is possible that the execution plan may vary, so it is important that the SQL engine performs a hard-parse on each execution of a query to ensure plan re-calculation. A hard parse is ensured by appending a placeholder field into the select list CSV string of the transformed queries, which is then replaced before each execution by a random number: The SQL engine considers the queries then to be distinct and therefore re-parses them.

Code Timing

The processing within the framework is heavily instrumented using the author’s own code timing utility package Code Timing and Object Orientation and Zombies. This is very low footprint in terms both of code and of performance, operating entirely in memory with individual timers keyed by name, and (logically) object oriented so that multiple timer sets can be running at once. Timings are printed to log, and the cpu and elapsed times for the query executed are summed from the individual components for the query, together with the times for any pre-query step:

  • Pre SQL
  • Open cursor
  • First fetch
  • Remaining fetches

Benchmarking Non-Query SQL

The framework is centred around the concept of a group of queries that are run in turn against the same dataset for each dataset point. However, non-query SQL can also be benchmarked in two ways: First, the query can include database PL/SQL functions; and secondly, the query metadata record includes a clob field for pre-query SQL that can be a PL/SQL block, while the actual query could just be ‘select 1 from dual’.

Query Transformation by Regular Expression Processing

The query output is written to file in csv format, includes the hint GATHER_PLAN_STATISTICS, and has a placeholder for a random number. Rather than cluttering up the input queries with this formatting, it seemed better to have the framework do the formatting. To this end the input queries instead have a select list with individual expressions and (mandatory) aliases, which can be simple or can be in double-quotes. The aliases form the header line of the csv file. To facilitate formatting the main query select list has to be of the form:

SELECT 
/* SEL */
        expr_1          alias_1,
        expr_2          alias_2
/* SEL */

Each expression must be on a separate line, and the list must be delimited by comment lines /* SEL */ as shown. The query formatting is performed in a procedure Get_Queries using some fairly complex regular expression processing.

Statistic Output Formatting

The various kinds of statistic (basic timing, execution plan aggregates, v$ statistics) are generally output in matrix format, both WxD and DxW. First the base numbers are printed for the whole grid for each query; then the last row for each query is printed, the deep or wide ‘slice’; then the same two sets of output are printed for the ratios of each number compared to the smallest number at the same data point across all queries.

Execution Plan Aggregation

After executing a query the execution plan statistics are copied from the system view v$sql_plan_statistics_all into bench_run_v$sql_plan_stats_all, and the formatted plan is written to a nested varray in bench_run_statistics using DBMS_XPlan.Display_Cursor.

At the end useful statistics in the plans are printed in aggregate by query execution, including maximum values of memory used and disk reads and writes, etc.

Estimated vs Actual Cardinalities

Oracle’s Cost Based Optimizer (CBO) uses estimated cardinalities at each step to evaluate candidate execution plans, and using the hint GATHER_PLAN_STATISTICS causes the actual cardinalities to be collected. Differences between estimated and actuals are generally recognised as being an important factor in whether or not a ‘good’ plan is chosen, so the maximum difference is included in the aggregates printed.

V$ Statistics

The statistics in the system views v$mystat, v$latch, v$sess_time_model are written to bench_run_v$stats before query execution (value_before, wait_before) and after execution (value_after, wait_after).

At the end a selection of the (after – before) differences of these statistics is written to log and csv file in the same format as the other statistics, based on the variance across the queries at the highest data point. A simple heuristic is included in the reporting query to restrict the statistics written to those deemed of most interest in comparing the queries, but all of the statistics remain available in bench_run_v$stats for ad hoc querying if desired.

See Also…






SQL for the Balanced Number Partitioning Problem

I noticed a post on AskTom recently that referred to an SQL solution to a version of the so-called Bin Fitting problem, where even distribution is the aim. The solution, How do I solve a Bin Fitting problem in an SQL statement?, uses Oracle’s Model clause, and, as the poster of the link observed, has the drawback that the number of bins is embedded in the query structure. I thought it might be interesting to find solutions without that drawback, so that the number of bins could be passed to the query as a bind variable. I came up with three solutions using different techniques, starting here.

An interesting article in American Scientist, The Easiest Hard Problem, notes that the problem is NP-complete, or certifiably hard, but that simple greedy heuristics often produce a good solution, including one used by schoolboys to pick football teams. The article uses the more descriptive term for the problem of balanced number partitioning, and notes some practical applications. The Model clause solution implements a multiple-bin version of the main Greedy Algorithm, while my non-Model SQL solutions implement variants of it that allow other techniques to be used, one of which is very simple and fast: this implements the team picking heuristic for multiple teams.

Another poster, Stew Ashton, suggested a simple change to my Model solution that improved performance, and I use this modified version here. He also suggested that using PL/SQL might be faster, and I have added my own simple PL/SQL implementation of the Greedy Algorithm, as well as a second version of the recursive subquery factoring solution that performs better than the first.

This article explains the solutions, considers two simple examples to illustrate them, and reports on performance testing across dimensions of number of items and number of bins. These show that the solutions exhibit either linear or quadratic variation in execution time with number of items, and some methods are sensitive to the number of bins while others are not.

After I had posted my solutions on the AskTom thread, I came across a thread on OTN, need help to resolve this issue, that requested a solution to a form of bin fitting problem where the bins have fixed capacity and the number of bins required must be determined. I realised that my solutions could easily be extended to add that feature, and posted extended versions of two of the solutions there. I have added a section here for this.

Updated, 5 June 2013: added Model and RSF diagrams

Update, 18 November 2017: I have now put scripts for setting up data and running the queries in a new schema onto my GitHub project: Brendan’s repo for interesting SQL. Note that I have not included the minor changes needed for the extended problem where findiing the number of bins is part of the problem.

Greedy Algorithm Variants

Say there are N bins and M items.

Greedy Algorithm (GDY)
Set bin sizes zero
Loop over items in descending order of size

  • Add item to current smallest bin
  • Calculate new bin size

End Loop

Greedy Algorithm with Batched Rebalancing (GBR)
Set bin sizes zero
Loop over items in descending order of size in batches of N items

  • Assign batch to N bins, with bins in ascending order of size
  • Calculate new bin sizes

End Loop

Greedy Algorithm with No Rebalancing – or, Team Picking Algorithm (TPA)
Assign items to bins cyclically by bin sequence in descending order of item size

Two Examples

Example: Four Items
Binfit, v1.3 - 4-items
Here we see that the Greedy Algorithm finds the perfect solution, with no difference in bin size, but the two variants have a difference of two.
Example: Six Items
Binfit, v1.3 - 6-items
Here we see that none of the algorithms finds the perfect solution. Both the standard Greedy Algorithm and its batched variant give a difference of two, while the variant without rebalancing gives a difference of four.

SQL Solutions

Original Model for GDY
See the link above for the SQL for the problem with three bins only.

The author has two measures for each bin and implements the GDY algorithm using CASE expressions and aggregation within the rules. The idea is to iterate over the items in descending order of size, setting the item bin to the bin with current smallest value. I use the word ‘bin’ for his ‘bucket’. Some notes:

  • Dimension by row number, ordered by item value
  • Add measures for the iteration, it, and number of iterations required, counter
  • Add measures for the bin name, bucket_name, and current minimum bin value, min_tmp (only first entry used)
  • Add measures for each item bin value, bucket_1-3, being the item value if it’s in that bin, else zero
  • Add measures for each bin running sum, pbucket_1-3, being the current value of each bin (only first two entries used)
  • The current minimum bin value, bin_tmp[1] is computed as the least of the running sums
  • The current item bin value is set to the item value for the bin whose value matches the minimum just computed, and null for the others
  • The current bin name is set similarly to be the bin matching the minimum
  • The new running sums are computed for each bin

Brendan’s Generic Model for GDY

SELECT item_name, bin, item_value, Max (bin_value) OVER (PARTITION BY bin) bin_value
  FROM (
SELECT * FROM items
  MODEL 
    DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
    MEASURES (item_name, 
              item_value,
              Row_Number() OVER (ORDER BY item_value DESC) bin,
              item_value bin_value,
              Row_Number() OVER (ORDER BY item_value DESC) rn_m,
              0 min_bin,
              Count(*) OVER () - :N_BINS - 1 n_iters
    )
    RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
      min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
      bin[ITERATION_NUMBER + :N_BINS + 1] = min_bin[1],
      bin_value[min_bin[1]] = bin_value[CV()] + Nvl (item_value[ITERATION_NUMBER + :N_BINS + 1], 0)
    )
)
 WHERE item_name IS NOT NULL
 ORDER BY item_value DESC

My Model solution works for any number of bins, passing the number of bins as a bind variable. The key idea here is to use values in the first N rows of a generic bin value measure to store all the running bin values, rather than as individual measures. I have included two modifications suggested by Stew in the AskTom thread.

  • Dimension by row number, ordered by item value
  • Initialise a bin measure to the row number (the first N items will remain fixed)
  • Initialise a bin value measure to item value (only first N entries used)
  • Add the row number as a measure, rn_m, in addition to a dimension, for referencing purposes
  • Add a min_bin measure for current minimum bin index (first entry only)
  • Add a measure for the number of iterations required, n_iters
  • The first N items are correctly binned in the measure initialisation
  • Set the minimum bin index using analytic Min function with KEEP clause over the first N rows of bin value
  • Set the bin for the current item to this index
  • Update the bin value for the corresponding bin only

Binfit, v1.3 - MOD

Recursive Subquery Factor for GBR

WITH bins AS (
       SELECT LEVEL bin, :N_BINS n_bins FROM DUAL CONNECT BY LEVEL <= :N_BINS
), items_desc AS (
       SELECT item_name, item_value, Row_Number () OVER (ORDER BY item_value DESC) rn
         FROM items
), rsf (bin, item_name, item_value, bin_value, lev, bin_rank, n_bins) AS (
SELECT b.bin,
       i.item_name, 
       i.item_value, 
       i.item_value,
       1,
       b.n_bins - i.rn + 1,
       b.n_bins
  FROM bins b
  JOIN items_desc i
    ON i.rn = b.bin
 UNION ALL
SELECT r.bin,
       i.item_name, 
       i.item_value, 
       r.bin_value + i.item_value,
       r.lev + 1,
       Row_Number () OVER (ORDER BY r.bin_value + i.item_value),
       r.n_bins
  FROM rsf r
  JOIN items_desc i
    ON i.rn = r.bin_rank + r.lev * r.n_bins
)
SELECT r.item_name,
       r.bin, r.item_value, r.bin_value
  FROM rsf r
 ORDER BY item_value DESC

The idea here is to use recursive subquery factors to iterate through the items in batches of N items, assigning each item to a bin according to the rank of the bin on the previous iteration.

  • Initial subquery factors form record sets for the bins and for the items with their ranks in descending order of value
  • The anchor branch assign bins to the first N items, assigning the item values to a bin value field, and setting the bin rank in ascending order of this bin value
  • The recursive branch joins the batch of items to the record in the previous batch whose bin rank matches that of the item in the reverse sense (so largest item goes to smallest bin etc.)
  • The analytic Row_Number function computes the updated bin ranks, and the bin values are updated by simple addition

Binfit, v1.3 - RSF

Recursive Subquery Factor for GBR with Temporary Table
Create Table and Index

DROP TABLE items_desc_temp
/
CREATE GLOBAL TEMPORARY TABLE items_desc_temp (
   item_name  VARCHAR2(30) NOT NULL,  
   item_value NUMBER(8) NOT NULL,
   rn         NUMBER
)
ON COMMIT DELETE ROWS
/
CREATE INDEX items_desc_temp_N1 ON items_desc_temp (rn)
/

Insert into Temporary Table

INSERT INTO items_desc_temp
SELECT item_name, item_value, Row_Number () OVER (ORDER BY item_value DESC) rn
  FROM items;

RSF Query with Temporary Table

WITH bins AS (
       SELECT LEVEL bin, :N_BINS n_bins FROM DUAL CONNECT BY LEVEL <= :N_BINS
), rsf (bin, item_name, item_value, bin_value, lev, bin_rank, n_bins) AS (
SELECT b.bin,
       i.item_name, 
       i.item_value, 
       i.item_value,
       1,
       b.n_bins - i.rn + 1,
       b.n_bins
  FROM bins b
  JOIN items_desc_temp i
    ON i.rn = b.bin
 UNION ALL
SELECT r.bin,
       i.item_name, 
       i.item_value, 
       r.bin_value + i.item_value,
       r.lev + 1,
       Row_Number () OVER (ORDER BY r.bin_value + i.item_value),
       r.n_bins
  FROM rsf r
  JOIN items_desc_temp i
    ON i.rn = r.bin_rank + r.lev * r.n_bins
)
SELECT item_name, bin, item_value, bin_value
  FROM rsf
 ORDER BY item_value DESC

The idea here is that in the initial RSF query a subquery factor of items was joined on a calculated field, so the whole record set had to be read, and performance could be improved by putting that initial record set into an indexed temporary table ahead of the main query. We'll see in the performance testing section that this changes quadratic variation with problem size into linear variation.

Plain Old SQL Solution for TPA

WITH items_desc AS (
       SELECT item_name, item_value, 
              Mod (Row_Number () OVER (ORDER BY item_value DESC), :N_BINS) + 1 bin
         FROM items
)
SELECT item_name, bin, item_value, Sum (item_value) OVER (PARTITION BY bin) bin_total
  FROM items_desc
 ORDER BY item_value DESC

The idea here is that the TPA algorithm can be implemented in simple SQL using analyic functions.

  • The subquery factor assigns the bins by taking the item rank in descending order of value and applying the modulo (N) function
  • The main query returns the bin totals in addition by analytic summing by bin

Pipelined Function for GDY
Package

CREATE OR REPLACE PACKAGE Bin_Fit AS

TYPE bin_fit_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, bin NUMBER);
TYPE bin_fit_list_type IS VARRAY(1000) OF bin_fit_rec_type;

TYPE bin_fit_cur_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER);
TYPE bin_fit_cur_type IS REF CURSOR RETURN bin_fit_cur_rec_type;

FUNCTION Items_Binned (p_items_cur bin_fit_cur_type, p_n_bins PLS_INTEGER) RETURN bin_fit_list_type PIPELINED;

END Bin_Fit;
/
CREATE OR REPLACE PACKAGE BODY Bin_Fit AS

c_big_value                 CONSTANT NUMBER := 100000000;
TYPE bin_fit_cur_list_type  IS VARRAY(100) OF bin_fit_cur_rec_type;

FUNCTION Items_Binned (p_items_cur bin_fit_cur_type, p_n_bins PLS_INTEGER) RETURN bin_fit_list_type PIPELINED IS

  l_min_bin              PLS_INTEGER := 1;
  l_min_bin_val             NUMBER;
  l_bins                    SYS.ODCINumberList := SYS.ODCINumberList();
  l_bin_fit_cur_rec         bin_fit_cur_rec_type;
  l_bin_fit_rec             bin_fit_rec_type;
  l_bin_fit_cur_list        bin_fit_cur_list_type;

BEGIN

  l_bins.Extend (p_n_bins);
  FOR i IN 1..p_n_bins LOOP
    l_bins(i) := 0;
  END LOOP;

  LOOP

    FETCH p_items_cur BULK COLLECT INTO l_bin_fit_cur_list LIMIT 100;
    EXIT WHEN l_bin_fit_cur_list.COUNT = 0;

    FOR j IN 1..l_bin_fit_cur_list.COUNT LOOP

      l_bin_fit_rec.item_name := l_bin_fit_cur_list(j).item_name;
      l_bin_fit_rec.item_value := l_bin_fit_cur_list(j).item_value;
      l_bin_fit_rec.bin := l_min_bin;

      PIPE ROW (l_bin_fit_rec);
      l_bins(l_min_bin) := l_bins(l_min_bin) + l_bin_fit_cur_list(j).item_value;

      l_min_bin_val := c_big_value;
      FOR i IN 1..p_n_bins LOOP

        IF l_bins(i) < l_min_bin_val THEN
          l_min_bin := i;
          l_min_bin_val := l_bins(i);
        END IF;

      END LOOP;

    END LOOP;

  END LOOP;

END Items_Binned;

SQL Query

SELECT item_name, bin, item_value, Sum (item_value) OVER (PARTITION BY bin) bin_value
  FROM TABLE (Bin_Fit.Items_Binned (
                     CURSOR (SELECT item_name, item_value FROM items ORDER BY item_value DESC), 
                     :N_BINS))
 ORDER BY item_value DESC

The idea here is that procedural algorithms can often be implemented more efficiently in PL/SQL than in SQL.

  • The first parameter to the function is a strongly-typed reference cursor
  • The SQL call passes in a SELECT statement wrapped in the CURSOR keyword, so the function can be used for any set of records that returns name and numeric value pairs
  • The item records are fetched in batches of 100 using the LIMIT clause to improves efficiency

Performance Testing
I tested performance of the various queries using my own benchmarking framework across grids of data points, with two data sets to split the queries into two sets based on performance.

Query Modifications for Performance Testing

  • The RSF query with staging table was run within a pipelined function in order to easily include the insert in the timings
  • A system context was used to pass the bind variables as the framework runs the queries from PL/SQL, not from SQL*Plus
  • I found that calculating the bin values using analytic sums, as in the code above, affected performance, so I removed this for clarity of results, outputting only item name, value and bin

Test Data Sets
For a given depth parameter, d, random numbers were inserted within the range 0-d for d-1 records. The insert was:

 INSERT INTO items
  SELECT 'item-' || n, DBMS_Random.Value (0, p_point_deep) FROM  
  (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL < p_point_deep);

The number of bins was passed as a width parameter, but note that the original, linked Model solution, MODO, hard-codes the number of bins to 3.

Test Results

Data Set 1 - Small
This was used for the following queries:

  • MODO - Original Model for GDY
  • MODB - Brendan's Generic Model for GDY
  • RSFQ - Recursive Subquery Factor for GBR
 Depth         W3         W3         W3
Run Type=MODO
 D1000       1.03       1.77       1.05
 D2000       3.98       6.46       5.38
 D4000      15.79       20.7      25.58
 D8000      63.18      88.75      92.27
D16000      364.2     347.74     351.99
Run Type=MODB
 Depth         W3         W6        W12
 D1000        .27        .42        .27
 D2000          1       1.58       1.59
 D4000       3.86        3.8       6.19
 D8000      23.26      24.57      17.19
D16000      82.29      92.04      96.02
Run Type=RSFQ
 D1000       3.24       3.17       1.53
 D2000       8.58       9.68       8.02
 D4000      25.65      24.07      23.17
 D8000      111.3     108.25      98.33
D16000     471.17     407.65     399.99

Slice W3
The results show:

  • Quadratic variation of CPU time with number of items
  • Little variation of CPU time with number of bins, although RSFQ seems to show some decline
  • RSFQ is slightly slower than MODO, while my version of Model, MODB is about 4 times faster than MODO

Data Set 2 - Large
This was used for the following queries:

  • RSFT - Recursive Subquery Factor for GBR with Temporary Table
  • POSS - Plain Old SQL Solution for TPA
  • PLFN - Pipelined Function for GDY

This table gives the CPU times in seconds across the data set:

  Depth       W100      W1000     W10000
Run Type=PLFN
 D20000        .31       1.92      19.25
 D40000        .65       3.87      55.78
 D80000       1.28       7.72      92.83
D160000       2.67      16.59     214.96
D320000       5.29      38.68      418.7
D640000      11.61      84.57      823.9
Run Type=POSS
 D20000        .09        .13        .13
 D40000        .18        .21        .18
 D80000        .27        .36         .6
D160000        .74       1.07        .83
D320000       1.36       1.58       1.58
D640000       3.13       3.97       4.04
Run Type=RSFT
 D20000        .78        .78        .84
 D40000       1.41       1.54        1.7
 D80000       3.02       3.39       4.88
D160000       6.11       9.56       8.42
D320000      13.05      18.93      20.84
D640000      41.62      40.98      41.09

Slice W100

Slice W10000
The results show:

  • Linear variation of CPU time with number of items
  • Little variation of CPU time with number of bins for POSS and RSFT, but roughly linear variation for PLFN
  • These linear methods are much faster than the earlier quadratic ones for larger numbers of items
  • Its approximate proportionality of time to number of bins means that, while PLFN is faster than RSFT for small number of bins, it becomes slower from around 50 bins for our problem
  • The proportionality to number of bins for PLFN presumably arises from the step to find the bin of minimum value
  • The lack of proportionality to number of bins for RSFT may appear surprising since it performs a sort of the bins iteratively: However, while the work for this sort is likely to be proportional to the number of bins, the number of iterations is inversely proportional and thus cancels out the variation

Solution Quality

The methods reported above implement three underlying algorithms, none of which guarantees an optimal solution. In order to get an idea of how the quality compares, I created new versions of the second set of queries using analytic functions to output the difference between minimum and maximum bin values, with percentage of the maximum also output. I ran these on the same grid, and report below the results for the four corners.

Method:			PLFN		RSFT		POSS
Point:	W100/D20000
Diff/%:			72/.004%	72/.004%	19,825/1%
Point:	W100/D640000
Diff/%:			60/.000003%	60/.000003%	633499/.03%
Point:	W10000/D20000
Diff/%:			189/.9%		180/.9%		19,995/67%
Point:	W10000/D640000
Diff/%:			695/.003%	695/.003%	639,933/3%

The results indicate that GDY (Greedy Algorithm) and GBR (Greedy Algorithm with Batched Rebalancing) generally give very similar quality results, while TPA (Team Picking Algorithm) tends to be quite a lot worse.

Extended Problem: Finding the Number of Bins Required

An important extension to the problem is when the bins have fixed capacity, and it is desired to find the minimum number of bins, then spread the items evenly between them. As mentioned at the start, I posted extensions to two of my solutions on an OTN thread, and I reproduce them here. It turns out to be quite easy to make the extension. The remainder of this section is just lifted from my OTN post and refers to the table of the original poster.

Start OTN Extract
So how do we determine the number of bins? The total quantity divided by bin capacity, rounded up, gives a lower bound on the number of bins needed. The actual number required may be larger, but mostly it will be within a very small range from the lower bound, I believe (I suspect it will nearly always be the lower bound). A good practical solution, therefore, would be to compute the solutions for a base number, plus one or more increments, and this can be done with negligible extra work (although Model might be an exception, I haven't tried it). Then the bin totals can be computed, and the first solution that meets the constraints can be used. I took two bin sets here.

SQL POS

WITH items AS (
       SELECT sl_pm_code item_name, sl_wt item_amt, sl_qty item_qty,
              Ceil (Sum(sl_qty) OVER () / :MAX_QTY) n_bins
         FROM ow_ship_det
), items_desc AS (
       SELECT item_name, item_amt, item_qty, n_bins,
              Mod (Row_Number () OVER (ORDER BY item_qty DESC), n_bins) bin_1,
              Mod (Row_Number () OVER (ORDER BY item_qty DESC), n_bins + 1) bin_2
         FROM items
)
SELECT item_name, item_amt, item_qty, 
       CASE bin_1 WHEN 0 THEN n_bins ELSE bin_1 END bin_1, 
       CASE bin_2 WHEN 0 THEN n_bins + 1 ELSE bin_2 END bin_2, 
       Sum (item_amt) OVER (PARTITION BY bin_1) bin_1_amt,
       Sum (item_qty) OVER (PARTITION BY bin_1) bin_1_qty,
       Sum (item_amt) OVER (PARTITION BY bin_2) bin_2_amt,
       Sum (item_qty) OVER (PARTITION BY bin_2) bin_2_qty
  FROM items_desc
 ORDER BY item_qty DESC, bin_1, bin_2

SQL Pipelined

SELECT osd.sl_pm_code item_name, osd.sl_wt item_amt, osd.sl_qty item_qty, 
       tab.bin_1, tab.bin_2, 
       Sum (osd.sl_wt) OVER (PARTITION BY tab.bin_1) bin_1_amt,
       Sum (osd.sl_qty) OVER (PARTITION BY tab.bin_1) bin_1_qty,
       Sum (osd.sl_wt) OVER (PARTITION BY tab.bin_2) bin_2_amt,
       Sum (osd.sl_qty) OVER (PARTITION BY tab.bin_2) bin_2_qty
  FROM ow_ship_det osd
  JOIN TABLE (Bin_Even.Items_Binned (
                     CURSOR (SELECT sl_pm_code item_name, sl_qty item_value,
                                    Sum(sl_qty) OVER () item_total
                               FROM ow_ship_det
                              ORDER BY sl_qty DESC, sl_wt DESC),
                     :MAX_QTY)) tab
    ON tab.item_name = osd.sl_pm_code
 ORDER BY osd.sl_qty DESC, tab.bin_1

Pipelined Function

CREATE OR REPLACE PACKAGE Bin_Even AS

TYPE bin_even_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, bin_1 NUMBER, bin_2 NUMBER);
TYPE bin_even_list_type IS VARRAY(1000) OF bin_even_rec_type;

TYPE bin_even_cur_rec_type IS RECORD (item_name VARCHAR2(100), item_value NUMBER, item_total NUMBER);
TYPE bin_even_cur_type IS REF CURSOR RETURN bin_even_cur_rec_type;

FUNCTION Items_Binned (p_items_cur bin_even_cur_type, p_bin_max NUMBER) RETURN bin_even_list_type PIPELINED;

END Bin_Even;
/
SHO ERR
CREATE OR REPLACE PACKAGE BODY Bin_Even AS

c_big_value                 CONSTANT NUMBER := 100000000;
c_n_bin_sets                CONSTANT NUMBER := 2;

TYPE bin_even_cur_list_type IS VARRAY(100) OF bin_even_cur_rec_type;
TYPE num_lol_list_type      IS VARRAY(100) OF SYS.ODCINumberList;

FUNCTION Items_Binned (p_items_cur bin_even_cur_type, p_bin_max NUMBER) RETURN bin_even_list_type PIPELINED IS

  l_min_bin                 SYS.ODCINumberList := SYS.ODCINumberList (1, 1);
  l_min_bin_val             SYS.ODCINumberList := SYS.ODCINumberList (c_big_value, c_big_value);
  l_bins                    num_lol_list_type := num_lol_list_type (SYS.ODCINumberList(), SYS.ODCINumberList());

  l_bin_even_cur_rec        bin_even_cur_rec_type;
  l_bin_even_rec            bin_even_rec_type;
  l_bin_even_cur_list       bin_even_cur_list_type;

  l_n_bins                  PLS_INTEGER;
  l_n_bins_base             PLS_INTEGER;
  l_is_first_fetch          BOOLEAN := TRUE;

BEGIN

  LOOP

    FETCH p_items_cur BULK COLLECT INTO l_bin_even_cur_list LIMIT 100;
    EXIT WHEN l_Bin_Even_cur_list.COUNT = 0;
    IF l_is_first_fetch THEN

      l_n_bins_base := Ceil (l_Bin_Even_cur_list(1).item_total / p_bin_max) - 1;

      l_is_first_fetch := FALSE;

      l_n_bins := l_n_bins_base;
      FOR i IN 1..c_n_bin_sets LOOP

        l_n_bins := l_n_bins + 1;
        l_bins(i).Extend (l_n_bins);
        FOR k IN 1..l_n_bins LOOP
          l_bins(i)(k) := 0;
        END LOOP;

      END LOOP;

    END IF;

    FOR j IN 1..l_Bin_Even_cur_list.COUNT LOOP

      l_bin_even_rec.item_name := l_bin_even_cur_list(j).item_name;
      l_bin_even_rec.item_value := l_bin_even_cur_list(j).item_value;
      l_bin_even_rec.bin_1 := l_min_bin(1);
      l_bin_even_rec.bin_2 := l_min_bin(2);

      PIPE ROW (l_bin_even_rec);

      l_n_bins := l_n_bins_base;
      FOR i IN 1..c_n_bin_sets LOOP
        l_n_bins := l_n_bins + 1;
        l_bins(i)(l_min_bin(i)) := l_bins(i)(l_min_bin(i)) + l_Bin_Even_cur_list(j).item_value;

        l_min_bin_val(i) := c_big_value;
        FOR k IN 1..l_n_bins LOOP

          IF l_bins(i)(k) < l_min_bin_val(i) THEN
            l_min_bin(i) := k;
            l_min_bin_val(i) := l_bins(i)(k);
          END IF;

        END LOOP;

      END LOOP;

    END LOOP;

  END LOOP;

END Items_Binned;

END Bin_Even;

Output POS
Note BIN_1 means bin set 1, which turns out to have 4 bins, while bin set 2 then necessarily has 5.

ITEM_NAME         ITEM_AMT   ITEM_QTY      BIN_1      BIN_2  BIN_1_AMT  BIN_1_QTY  BIN_2_AMT  BIN_2_QTY
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1239606-1080          4024        266          1          1      25562        995      17482        827
1239606-1045          1880        192          2          2      19394        886      14568        732
1239606-1044          1567        160          3          3      18115        835      14097        688
1239606-1081          2118        140          4          4      18988        793      17130        657
1239606-2094          5741         96          1          5      25562        995      18782        605
...
1239606-2107            80          3          4          2      18988        793      14568        732
1239606-2084           122          3          4          3      18988        793      14097        688
1239606-2110           210          2          2          3      19394        886      14097        688
1239606-4022           212          2          3          4      18115        835      17130        657
1239606-4021           212          2          4          5      18988        793      18782        605

Output Pipelined

ITEM_NAME         ITEM_AMT   ITEM_QTY      BIN_1      BIN_2  BIN_1_AMT  BIN_1_QTY  BIN_2_AMT  BIN_2_QTY
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1239606-1080          4024        266          1          1      20627        878      15805        703
1239606-1045          1880        192          2          2      18220        877      16176        703
1239606-1044          1567        160          3          3      20425        878      15651        701
1239606-1081          2118        140          4          4      22787        876      14797        701
1239606-2094          5741         96          4          5      22787        876      19630        701
...
1239606-2089            80          3          4          1      22787        876      15805        703
1239606-2112           141          3          4          2      22787        876      16176        703
1239606-4022           212          2          1          1      20627        878      15805        703
1239606-4021           212          2          2          1      18220        877      15805        703
1239606-2110           210          2          3          2      20425        878      16176        703

End OTN Extract

Conclusions

  • Various solutions for the balanced number partitioning problem have been presented, using Oracle's Model clause, Recursive Subquery Factoring, Pipelined Functions and simple SQL
  • The performance characteristics of these solutions have been tested across a range of data sets
  • As is often the case, the best solution depends on the shape and size of the data set
  • A simple extension has been shown to allow determining the number of bins required in the bin-fitting interpretation of the problem
  • Replacing a WITH clause with a staging table can be a useful technique to allow indexed scans

Get the code here: Brendan's repo for interesting SQL






Notes on Profiling Oracle PL/SQL

‘Everything should be made as simple as possible, but not simpler’

This phrase is often attributed to Albert Einstein, although the attribution is apparently questionable:
Everything Should Be Made as Simple as Possible, But Not Simpler. In any case it’s not a bad approach to follow, even if the quote did come from a non-Oracle guy :).

I recently started looking at the hierarchical profiler tool with a view to using it in an upcoming project. In order to understand the tool properly, I felt it would be a good idea to start by using it to profile a test program that would be as simple as possible while covering as wide a range of scenarios as possible. This article documents the results of that profiling, highlighting the different scenarios covered, discusses the output from the profiler, and includes a query I wrote to display the function call tree.

The article goes on to illustrate profiling through manual code instrumentation, and by the old flat profiler (DBMS_Profiler) on the same test program, concluding that each method has its own strengths and weaknesses.

Setup
The hierarchical profiler setup and use is described in Oracle® Database Advanced Application Developer’s Guide 11g Release 2 (11.2), and some code snippets are available here:PL/SQL Hierarchical Profiler in Oracle Database 11g Release 1

Scenarios
The test program consists of a driving script, Test_Rep_p.sql (attached), that calls a package (HProf_Test) and an object type (Table_Count_Type), both defined in the attached script, HProf_Test_Code.sql. The test program covers the following scenarios:

  • Multiple root calls (__plsql_vm, A_CALLS_B)
  • Recursive procedure calls (procedure calling itself: R_CALLS_R)
  • Mutually recursive procedure calls (procedures call each other: A_CALLS_B and B_CALLS_A)
  • Procedure called by multiple procedures (child with multiple parents: PUT_LINE)
  • Procedure ‘inlined’ within PL/SQL (Rest_a_While)
  • Static SQL within PL/SQL (__static_sql_exec_line8)
  • Dynamic SQL within PL/SQL (__dyn_sql_exec_line12)
  • ‘Everything should be made as simple as possible, but not simpler’

    This phrase is often attributed to Albert Einstein, although the attribution is apparently questionable:
    Everything Should Be Made as Simple as Possible, But Not Simpler. In any case it’s not a bad approach to follow, even if the quote did come from a non-Oracle guy :).

    I recently started looking at the hierarchical profiler tool with a view to using it in an upcoming project. In order to understand the tool properly, I felt it would be a good idea to start by using it to profile a test program that would be as simple as possible while covering as wide a range of scenarios as possible. This article documents the results of that profiling, highlighting the different scenarios covered, discusses the output from the profiler, and includes a query I wrote to display the function call tree.

    The article goes on to illustrate profiling through manual code instrumentation, and by the old flat profiler (DBMS_Profiler) on the same test program, concluding that each method has its own strengths and weaknesses.

    Setup
    The hierarchical profiler setup and use is described in Oracle® Database Advanced Application Developer’s Guide 11g Release 2 (11.2), and some code snippets are available here:PL/SQL Hierarchical Profiler in Oracle Database 11g Release 1

    Scenarios
    The test program consists of a driving script, Test_Rep_p.sql (attached), that calls a package (HProf_Test) and an object type (Table_Count_Type), both defined in the attached script, HProf_Test_Code.sql. The test program covers the following scenarios:

    • Multiple root calls (__plsql_vm, A_CALLS_B)
    • Recursive procedure calls (procedure calling itself: R_CALLS_R)
    • Mutually recursive procedure calls (procedures call each other: A_CALLS_B and B_CALLS_A)
    • Procedure called by multiple procedures (child with multiple parents: PUT_LINE)
    • Procedure ‘inlined’ within PL/SQL (Rest_a_While)
    • Static SQL within PL/SQL (__static_sql_exec_line8)
    • Dynamic SQL within PL/SQL (__dyn_sql_exec_line12)
    • Database function called from SQL in SQL*Plus (DBFUNC)
    • Database function called from SQL in PL/SQL (DBFUNC)
    • Object constructor call (TABLE_COUNT_TYPE)

    Call Structure Diagram
    HProf - CSD

    Raw Results
    The attached script Test_Rep_h.sql was used to report on the results. The record produced in the run table, DBMSHP_RUNS, was:

         RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
    ---------- ---------------------------- ---------- ---------- ------------------------------------------------------------
            11 04-MAR-13 07.07.36.803000        890719        .89 Profile for small test program with recursion

    The records produced in the functions table, DBMSHP_FUNCTION_INFO, were:

    OWNER MODULE               FUNCTION                         ID  LINE#      SUB_T      FUN_T  CALLS
    ----- -------------------- ------------------------------ ---- ------ ---------- ---------- ------
    NET   HPROF_TEST           A_CALLS_B                         4     40      62340       4450      1
    NET   HPROF_TEST           A_CALLS_B@1                       5     40      43729      13663      1
    NET   HPROF_TEST           B_CALLS_A                         6     38      57890      14161      1
    NET   HPROF_TEST           B_CALLS_A@1                       7     38      30066      30066      1
    NET   HPROF_TEST           DBFUNC                            8     84      32629      32629      2
    NET   HPROF_TEST           R_CALLS_R                         9     70      12823       4159      1
    NET   HPROF_TEST           R_CALLS_R@1                      10     70       8633       8618      1
    NET   HPROF_TEST           STOP_PROFILING                   11     16         21         21      1
    NET   TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                 12      3      55049         82      1
    NET   TABLE_COUNT_TYPE     __static_sql_exec_line6          22      6      54967      54967      1
    SYS   DBMS_HPROF           STOP_PROFILING                   13     59          0          0      1
    SYS   DBMS_OUTPUT          GET_LINE                         14    129          8          8      3
    SYS   DBMS_OUTPUT          GET_LINES                        15    160         68         60      3
    SYS   DBMS_OUTPUT          NEW_LINE                         16    117          7          7      2
    SYS   DBMS_OUTPUT          PUT                              17     77         28         28      2
    SYS   DBMS_OUTPUT          PUT_LINE                         18    109         46         11      2
                               __anonymous_block                 1      0     809839        521      5
                               __dyn_sql_exec_line12            19     12        226        226      1
                               __plsql_vm                        2      0     828379         58      6
                               __plsql_vm@1                      3      0      14158         11      1
                               __sql_fetch_line13               20     13     726713     726713      1
                               __static_sql_exec_line8          21      8      14418        260      1
    
    22 rows selected.

    The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing.

    The records produced in the functions parent-child table, DBMSHP_PARENT_CHILD_INFO, were:

    OWNER_P MODULE_P             FUNCTION_P                     OWNER_C MODULE_C             FUNCTION_C                          SUB_T      FUN_T  CALLS
    ------- -------------------- ------------------------------ ------- -------------------- ------------------------------ ---------- ---------- ------
    NET     HPROF_TEST           STOP_PROFILING                 SYS     DBMS_HPROF           STOP_PROFILING                          0          0      1
    NET     HPROF_TEST           R_CALLS_R@1                    SYS     DBMS_OUTPUT          PUT_LINE                               15          6      1
    NET     HPROF_TEST           R_CALLS_R                      SYS     DBMS_OUTPUT          PUT_LINE                               31          5      1
    NET     HPROF_TEST           R_CALLS_R                      NET     HPROF_TEST           R_CALLS_R@1                          8633       8618      1
    NET     HPROF_TEST           B_CALLS_A                      NET     HPROF_TEST           A_CALLS_B@1                         43729      13663      1
    NET     HPROF_TEST           A_CALLS_B@1                    NET     HPROF_TEST           B_CALLS_A@1                         30066      30066      1
    NET     HPROF_TEST           A_CALLS_B                      NET     HPROF_TEST           B_CALLS_A                           57890      14161      1
    NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE               NET     TABLE_COUNT_TYPE     __static_sql_exec_line6             54967      54967      1
    SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          NEW_LINE                                7          7      2
    SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          PUT                                    28         28      2
    SYS     DBMS_OUTPUT          GET_LINES                      SYS     DBMS_OUTPUT          GET_LINE                                8          8      3
                                 __anonymous_block              NET     HPROF_TEST           STOP_PROFILING                         21         21      1
                                 __anonymous_block              SYS     DBMS_OUTPUT          GET_LINES                              68         60      3
                                 __anonymous_block                                           __dyn_sql_exec_line12                 226        226      1
                                 __anonymous_block              NET     HPROF_TEST           R_CALLS_R                           12823       4159      1
                                 __anonymous_block                                           __static_sql_exec_line8             14418        260      1
                                 __plsql_vm                     NET     HPROF_TEST           DBFUNC                              18482      18482      1
                                 __anonymous_block                                           __sql_fetch_line13                 726713     726713      1
                                 __static_sql_exec_line8                                     __plsql_vm@1                        14158         11      1
                                 __plsql_vm                                                  __anonymous_block                  809839        521      5
                                 __plsql_vm@1                   NET     HPROF_TEST           DBFUNC                              14147      14147      1
                                 __anonymous_block              NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                    55049         82      1
    
    22 rows selected.

    The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing, respectively, for the child function while called from all instances of the parent.

    Function Call Tree
    The raw data above can be used to identify processing bottlenecks at a function level, but it’s also useful to process the data in order to display the function hierarchies, both for performance tuning and also for understanding the program structure. This is not quite as trivial as it may seem. The oracle-base article provides an SQL statement that attempts to do this:

    SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
           fi.function,
           pci.subtree_elapsed_time,
           pci.function_elapsed_time,
           pci.calls
    FROM   dbmshp_parent_child_info pci
           JOIN dbmshp_function_info fi ON pci.runid = fi.runid AND pci.childsymid = fi.symbolid
    WHERE  pci.runid = :RUN_ID
    CONNECT BY PRIOR childsymid = parentsymid
    START WITH pci.parentsymid = :START_ID

    Here, bind variables replace the original hard-coded values. On running this query I often got the following result:

    ERROR at line 1:
    ORA-01436: CONNECT BY loop in user data

    On the run used in this article, the query returned 157 records, which is obviously incorrect. There is of course a NOCYCLE keyword that can be used to return results in the case of loops. However, it is not worth adding in this case, because there are in fact no loops in the data (at least no cyclic loops – apparent loops are discussed later). Oracle avoids loops by treating a function call that is a descendant of itself as a call to a new function, identified by suffices @1, @2 etc. as we can see from the recursive procedures above (eg R_CALLS_R@1 is the second call of R_CALLS_R, this one from itself). The problem here is that the query is incorrect in its handling of runid, with the result that the tree-walk traverses records from other runs as well as the intended one. A further problem is that there may be several roots, and it would be best to calculate these within a subquery. We can correct these problems by the following query:

    SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
           fi.symbolid || ': ' || fi.function function,
           pci.subtree_elapsed_time sub_t,
           pci.function_elapsed_time fun_t,
           pci.calls
      FROM dbmshp_parent_child_info	pci
      JOIN dbmshp_function_info		fi 
        ON pci.runid	              = fi.runid 
       AND pci.childsymid	       = fi.symbolid
     WHERE pci.runid                   = :RUN_ID
    CONNECT BY PRIOR pci.childsymid    = pci.parentsymid 
           AND pci.runid	              = :RUN_ID
    START WITH pci.parentsymid         IN (SELECT f.symbolid FROM dbmshp_function_info f WHERE NOT EXISTS 
           (SELECT 1 FROM dbmshp_parent_child_info i WHERE i.childsymid = f.symbolid AND i.runid = :RUN_ID) AND f.runid = :RUN_ID)
           AND pci.runid	              = :RUN_ID

    This query returns the results:

    NAME                           FUNCTION                            SUB_T      FUN_T  CALLS
    ------------------------------ ------------------------------ ---------- ---------- ------
      .                            1: __anonymous_block              809,839        521      5
        NET.HPROF_TEST             9: R_CALLS_R                       12,823      4,159      1
          NET.HPROF_TEST           10: R_CALLS_R@1                     8,633      8,618      1
            SYS.DBMS_OUTPUT        18: PUT_LINE                           15          6      1
              SYS.DBMS_OUTPUT      16: NEW_LINE                            7          7      2
              SYS.DBMS_OUTPUT      17: PUT                                28         28      2
          SYS.DBMS_OUTPUT          18: PUT_LINE                           31          5      1
            SYS.DBMS_OUTPUT        16: NEW_LINE                            7          7      2
            SYS.DBMS_OUTPUT        17: PUT                                28         28      2
        NET.HPROF_TEST             11: STOP_PROFILING                     21         21      1
          SYS.DBMS_HPROF           13: STOP_PROFILING                      0          0      1
        NET.TABLE_COUNT_TYPE       12: TABLE_COUNT_TYPE               55,049         82      1
          NET.TABLE_COUNT_TYPE     22: __static_sql_exec_line6        54,967     54,967      1
        SYS.DBMS_OUTPUT            15: GET_LINES                          68         60      3
          SYS.DBMS_OUTPUT          14: GET_LINE                            8          8      3
        .                          19: __dyn_sql_exec_line12             226        226      1
        .                          20: __sql_fetch_line13            726,713    726,713      1
        .                          21: __static_sql_exec_line8        14,418        260      1
          .                        3: __plsql_vm@1                    14,158         11      1
            NET.HPROF_TEST         8: DBFUNC                          14,147     14,147      1
      NET.HPROF_TEST               8: DBFUNC                          18,482     18,482      1
      NET.HPROF_TEST               6: B_CALLS_A                       57,890     14,161      1
        NET.HPROF_TEST             5: A_CALLS_B@1                     43,729     13,663      1
          NET.HPROF_TEST           7: B_CALLS_A@1                     30,066     30,066      1
    
    24 rows selected.

    This is better, but we can identify some further issues.

    Missing Roots
    The true root results are missing: For example, A_CALLS_B is missing. This arises because the query is traversing the link records (DBMSHP_PARENT_CHILD_INFO), while the root information is stored in the nodes (DBMSHP_FUNCTION_INFO). This suggests a change from the CONNECT BY syntax to Oracle’s v11.2 recursive subquery factoring syntax, which allows you easily to start from the nodes, then traverse recursively via the links. (Incidentally, moving the start of profiling to its own block would result in A_CALLS_B appearing under __anonymous_block, but I prefer to retain the current structure in order to deal with the general case in which multiple roots are possible.)

    Duplicate Links
    Notice that function PUT_LINE is reported separately under R_CALLS_R and R_CALLS_R@1, and the timings differ. Also, its own child calls appear under each of its instances, but in those cases the timings are identical. The reason for this is that in the first case, there are separate records of the times used in each call, whereas in the second, the child calls have only a single record giving the total times across both instances of the parent call. The call from R_CALLS_R shows (9 – 4 = ) 5µs used in child calls, while the call from R_CALLS_R@1 shows 14µs. The child calls show totals of (3 + 16 = ) 19µs, equalling the sum across the parent calls.

    At this point it is worth looking at this from the more general perspective of a hierarchical data structure where parents can have multiple children and children multiple parents, with one or more roots. If a network diagram were constructed there would be loops apparent indicating multiple routes between nodes. In these situations, Oracle’s hierarchical queries effectively traverse all routes, and this is what causes the link duplication (in other scenarios this behaviour can cause big performance problems, but probably not here). Oracle’s cycle detection mechanism does not trigger because the loops do not result in any node being a descendant of itself (as noted above, extra nodes are generated by the profiler to avoid this).

    It seems to me better to avoid this duplication, and also to signal those cases where times are not aggregated up the tree. We can achieve this by the use of analytic functions. Note that, although the query below refers to the specific tables and attributes for this problem, the proposed solution could be used for any member of this general class of problem. The new query, which orders sibling records by descending subtree elapsed time, is:

    WITH last_run AS (
    SELECT Max (runid) runid FROM dbmshp_runs
    ), full_tree (runid, lev, node_id, sub_t, fun_t, calls, link_id) AS (
    SELECT fni.runid, 0, fni.symbolid, fni.subtree_elapsed_time, fni.function_elapsed_time, fni.calls, 'root' || ROWNUM
      FROM dbmshp_function_info fni
      JOIN last_run lrn
        ON lrn.runid = fni.runid
     WHERE NOT EXISTS (SELECT 1 FROM dbmshp_parent_child_info pci WHERE pci.childsymid = fni.symbolid AND pci.runid = fni.runid)
     UNION ALL
    SELECT ftr.runid, 
           ftr.lev + 1, 
           pci.childsymid, 
           pci.subtree_elapsed_time, 
           pci.function_elapsed_time, 
           pci.calls,
           pci.parentsymid || '-' || pci.childsymid
      FROM full_tree ftr
      JOIN dbmshp_parent_child_info pci
        ON pci.parentsymid = ftr.node_id
       AND pci.runid = ftr.runid
    ) SEARCH DEPTH FIRST BY sub_t DESC, fun_t DESC, calls DESC, node_id SET rn
    , tree_ranked AS (
    SELECT runid, node_id, lev, rn, 
           sub_t, fun_t, calls, 
           Row_Number () OVER (PARTITION BY node_id ORDER BY rn) node_rn,
           Count (*) OVER (PARTITION BY node_id) node_cnt,
           Row_Number () OVER (PARTITION BY link_id ORDER BY rn) link_rn
      FROM full_tree
    )
    SELECT RPad (' ', trr.lev*2, ' ') || fni.function "Function tree",
           fni.symbolid sy, fni.owner, fni.module,
           CASE WHEN trr.node_cnt > 1 THEN trr.node_rn || ' of ' || trr.node_cnt END "Inst.",
           trr.sub_t, trr.fun_t, trr.calls, 
           trr.rn "Row"
      FROM tree_ranked trr
      JOIN dbmshp_function_info fni
        ON fni.symbolid = trr.node_id
       AND fni.runid = trr.runid
     WHERE trr.link_rn = 1
     ORDER BY trr.rn

    Query Structure Diagram
    HProf - QSD

    The results are then:

    Function tree                        SY OWNER MODULE               Inst.         SUB_T      FUN_T  CALLS  Row
    ----------------------------------- --- ----- -------------------- -------- ---------- ---------- ------ ----
    __plsql_vm                            2                                        828,379         58      6    1
      __anonymous_block                   1                                        809,839        521      5    2
        __sql_fetch_line13               20                                        726,713    726,713      1    3
        TABLE_COUNT_TYPE                 12 NET   TABLE_COUNT_TYPE                  55,049         82      1    4
          __static_sql_exec_line6        22 NET   TABLE_COUNT_TYPE                  54,967     54,967      1    5
        __static_sql_exec_line8          21                                         14,418        260      1    6
          __plsql_vm@1                    3                                         14,158         11      1    7
            DBFUNC                        8 NET   HPROF_TEST           1 of 2       14,147     14,147      1    8
        R_CALLS_R                         9 NET   HPROF_TEST                        12,823      4,159      1    9
          R_CALLS_R@1                    10 NET   HPROF_TEST                         8,633      8,618      1   10
            PUT_LINE                     18 SYS   DBMS_OUTPUT          1 of 2           15          6      1   11
              PUT                        17 SYS   DBMS_OUTPUT          1 of 2           28         28      2   12
              NEW_LINE                   16 SYS   DBMS_OUTPUT          1 of 2            7          7      2   13
          PUT_LINE                       18 SYS   DBMS_OUTPUT          2 of 2           31          5      1   14
        __dyn_sql_exec_line12            19                                            226        226      1   17
        GET_LINES                        15 SYS   DBMS_OUTPUT                           68         60      3   18
          GET_LINE                       14 SYS   DBMS_OUTPUT                            8          8      3   19
        STOP_PROFILING                   11 NET   HPROF_TEST                            21         21      1   20
          STOP_PROFILING                 13 SYS   DBMS_HPROF                             0          0      1   21
      DBFUNC                              8 NET   HPROF_TEST           2 of 2       18,482     18,482      1   22
    A_CALLS_B                             4 NET   HPROF_TEST                        62,340      4,450      1   23
      B_CALLS_A                           6 NET   HPROF_TEST                        57,890     14,161      1   24
        A_CALLS_B@1                       5 NET   HPROF_TEST                        43,729     13,663      1   25
          B_CALLS_A@1                     7 NET   HPROF_TEST                        30,066     30,066      1   26
    
    24 rows selected.

    Notice that we now have a single record for each of the 22 links, plus the two root nodes. Also, the “Inst.” column lists the instance number of a function having more than one instance, and the children of any such function are only listed once with the gaps in the “Row” column indicating where duplicates have been suppressed.

    Network Diagrams
    It may be interesting to display the call tree in two diagrams, one for each root.
    Root __plsql_vm
    HProf - Net

    Root A_CALLS_B
    HProf - Net2

    Notes on Tree Output
    Anonymous Block (__anonymous_block)
    This function seems to correspond to invocations of anonymous blocks, obviously enough. However, there is an apparent anomaly in the number of calls listed, 6, because the driving program has only three such blocks, and there are none in the called PL/SQL code. I would surmise that the apparent discrepancy arises from the enabling of SERVEROUTPUT, which appears to result in a secondary block being associated with each explicit SQL*Plus block, that issues a call to GET_LINES to process buffered output.

    PL/SQL Engine (__plsql_vm)
    This function seems to correspond to external invocations of PL/SQL such as from a SQL*Plus session. There are 7 calls, 6 of them presumably being linked with the external anonymous blocks, and the seventh with DBFUNC, where a PL/SQL function is called from a SQL statement from SQL*Plus.

    Notice that the SQL statement calling a database function from within PL/SQL generates the recursive call to the engine, __plsql_vm@1

    Second Root (A_CALLS_B)
    The above function does not have the __plsql_vm/__anonymous_block ancestry that might be expected because profiling only started within the enclosing block.

    Inlined Procedure (Rest_a_While)
    I wrote a small procedure, Rest_a_While, to generate some elapsed time in the recursive procedures, but preceded it with the INLINE pragma, a new optimisation feature in 11g. This had the desired effect of removing the calls from the profiling output and including the times in the calling procedures. Rest_a_While does not make the obvious call to DBMS_Lock.Sleep because that procedure cannot be inlined. subprogram inlining in 11g provides some analysis of the inlining feature.

    Sibling Ordering
    We have ordered siblings by descending subtree elapsed time, using the SEARCH clause. It would be nice to have the option to order the siblings by initial invocation time, but Oracle does not provide the data to do this.

    Loops and Hierarchies
    The first diagram shows two loops, where there are two routes between the loop start and end points, indicated by different colours. The second loop has two child nodes coming from the end point, and hierarchical queries (both CONNECT BY and recursive subquery factors in Oracle) cause the links to be duplicated. Our query has filtered out the duplicates by analytic functions.

    It’s worth remembering this because it’s a general feature of SQL for querying hierarchies, and judging by Oracle forums, not one that’s widely understood. For larger hierarchies it can cause serious performance problems, and may justify a PL/SQL programmed solution that need not suffer the same problem.

    Manual Instrumentation
    Oracle’s hierarchical profiler clearly provides extremely useful information on both performance and structure of PL/SQL programs with very little effort. However, it does have the limitation of only providing information down to the subprogram level (which includes embedded SQL statements in this context). It is also often considered good practice to implement timing and other instrumentation permanently in production code, sometimes in a switchable fashion. In the test program, one of the called procedures, A_Calls_B, makes two calls to the inlined procedure, Rest_a_While, the second doing about twice as much work as the first. The profiler reports total within-function times of 4,450µs and 13,663µs on first and second calls, respectively (the work is scaled by a call number parameter, equal to 1, then 3).

    I created a second instance of the package and driver script (suffix _TS) to illustrate manual instrumentation. This uses an ‘object-oriented’ timing package that I wrote a couple of years ago Code Timing and Object Orientation and Zombies (November, 2010) to instrument at procedure and section level. I multiplied the work in Rest_a_While by a factor of ten to get larger times. This produced the output:

    Timer Set: HProf, Constructed at 05 Mar 2013 10:21:27, written at 10:21:30
    ==========================================================================
    [Timer timed: Elapsed (per call): 0.04 (0.000044), CPU (per call): 0.05 (0.000050), calls: 1000, '***' denotes corrected line below]
    
    Timer                       Elapsed          CPU          Calls        Ela/Call        CPU/Call
    ----------------------   ----------   ----------   ------------   -------------   -------------
    A_Calls_B, section one         0.06         0.05              2         0.03150         0.02500
    A_Calls_B, section two         0.12         0.12              2         0.06050         0.06000
    B_Calls_A: 2                   0.15         0.16              1         0.15400         0.16000
    B_Calls_A: 4                   0.31         0.30              1         0.30700         0.30000
    DBFunc                         0.32         0.31              2         0.15950         0.15500
    Open cursor                    0.69         0.69              1         0.68900         0.69000
    Fetch from cursor              0.70         0.70              1         0.69600         0.70000
    Close cursor                   0.00         0.00              1         0.00000         0.00000
    Construct object               0.06         0.04              1         0.05500         0.04000
    R_Calls_R                      0.14         0.14              2         0.07000         0.07000
    (Other)                        0.00         0.00              1         0.00000         0.00000
    ----------------------   ----------   ----------   ------------   -------------   -------------
    Total                          2.54         2.51             15         0.16960         0.16733
    ----------------------   ----------   ----------   ------------   -------------   -------------
    

    Notes on Code Timing

    • Calls, CPU and elapsed times have been captured at the section level for A_Calls_B
    • Observe that, while R_Calls_R and A_Calls_B aggregate over all calls, B_Calls_A records values by call; this is implemented simply by including a value that changes with call in the timer name
    • The timing set object is designed to be very low footprint; here 9 statements (calls to Increment_Time), plus a small global overhead, produced 10 result lines, plus associated information
    • The ‘object-oriented’ approach allows multiple programs to be be timed at multiple levels, without interference between timings
    • There are Perl and Java implementations of this timing set object included in the Scribd article mentioned

    Oracle’s Flat Profiler (DBMS_Profiler)
    The hierarchical profiler was introduced in v11.1, while prior to this there was a non-hierarchical profiler, DBMS_Profiler. This package still exists in v11: It is omitted from the advanced application developer’s guide for v11, but is described in the packages and types manual (Oracle® Database PL/SQL Packages and Types Reference, 11g Release 2 (11.2)); also, SQL*Developer appears to support only the newer hierarchical verion (via right-click on a package). I thought it interesting to run the older version on the same test program (package Old_Test_Prof, driver script Test_Rep_p_Old.sql and reporting script Test_Rep_h_Old.sql). The output from the first three queries is:

    Run header (PLSQL_PROFILER_RUNS)
    
         RUNID RUN_DATE        MICRO_S    SECONDS
    ---------- ------------ ---------- ----------
             3 11:03:13        2164000       2.16
    
    Profiler data summary (PLSQL_PROFILER_DATA)
    
       MICRO_S SECONDS    CALLS
    ---------- ------- --------
       2126949    2.13       72
    
    Profiler data by time (PLSQL_PROFILER_DATA)
    
       MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
    ---------- ------- -------- -------------------- ----------- ------
        729932    0.73        1                                5     13
        569563    0.57        2 OLD_PROF_TEST                  1     56
        377880    0.38        2 OLD_PROF_TEST                  1     82
        166019    0.17        2 OLD_PROF_TEST                  1     70
        150117    0.15        2 OLD_PROF_TEST                  1     43
         72742    0.07        2 OLD_PROF_TEST                  1     40
         56473    0.06        1 TABLE_COUNT_TYPE               6      6
          3338    0.00        1                                5      8
           258    0.00        1                                5     12
           109    0.00        1                                5     16
            68    0.00        2 OLD_PROF_TEST                  1     67
            66    0.00        2                                4      1
            60    0.00        2                                7      1
            60    0.00        2                                3      1
            44    0.00        1                                5     14
            42    0.00        0                                2      5
            31    0.00        1 OLD_PROF_TEST                  1     18
            26    0.00        1                                8      5
            13    0.00        0                                5      1
             9    0.00        1 TABLE_COUNT_TYPE               6     11
             9    0.00        2 OLD_PROF_TEST                  1     86
             8    0.00        0 OLD_PROF_TEST                  1     51
             8    0.00        1 TABLE_COUNT_TYPE               6     13
             7    0.00        1                                5     18
             6    0.00        0 OLD_PROF_TEST                  1     78
             6    0.00        0 OLD_PROF_TEST                  1     64
             6    0.00        0                                8      1
             6    0.00        1 TABLE_COUNT_TYPE               6      3
             5    0.00        0 OLD_PROF_TEST                  1     35
             5    0.00        0 OLD_PROF_TEST                  1     15
             4    0.00        1                                8      7
             4    0.00        1 OLD_PROF_TEST                  1     76
             3    0.00        1                                2      8
             2    0.00        1 OLD_PROF_TEST                  1     62
             2    0.00        1 OLD_PROF_TEST                  1     13
             2    0.00        1 TABLE_COUNT_TYPE               6      5
             2    0.00        2 OLD_PROF_TEST                  1     72
             2    0.00        2 OLD_PROF_TEST                  1     45
             2    0.00        2 OLD_PROF_TEST                  1     49
             2    0.00        2 OLD_PROF_TEST                  1     46
             2    0.00        2 OLD_PROF_TEST                  1     58
             1    0.00        1 OLD_PROF_TEST                  1     73
             1    0.00        1                                2      6
             1    0.00        1 OLD_PROF_TEST                  1     59
             1    0.00        1 OLD_PROF_TEST                  1     11
             1    0.00        2 OLD_PROF_TEST                  1     54
             1    0.00        2 OLD_PROF_TEST                  1     84
             0    0.00        0 OLD_PROF_TEST                  1      1
             0    0.00        0 OLD_PROF_TEST                  1     88
             0    0.00        0                                8      9
             0    0.00        0                                2      1
             0    0.00        0                                2      2
             0    0.00        0 OLD_PROF_TEST                  1      3
             0    0.00        0 OLD_PROF_TEST                  1      5
             0    0.00        0 OLD_PROF_TEST                  1      9
             0    0.00        0 OLD_PROF_TEST                  1     20
             0    0.00        1 TABLE_COUNT_TYPE               6      4
             0    0.00        1                                8      2
             0    0.00        2 OLD_PROF_TEST                  1     39
             0    0.00        2 OLD_PROF_TEST                  1     55
             0    0.00        2 OLD_PROF_TEST                  1     69
             0    0.00        2 OLD_PROF_TEST                  1     38
             0    0.00        2 OLD_PROF_TEST                  1     81
             0    0.00        2 OLD_PROF_TEST                  1     42
             0    0.00        2 OLD_PROF_TEST                  1     68
    
    65 rows selected.
    
    

    Referring to the package, type and anonymous blocks, I assigned labels to all the lines having more than 10µs, as follows:

       MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
    ---------- ------- -------- -------------------- ----------- ------
        729932    0.73        1                                5     13  B2: FETCH
        569563    0.57        2 OLD_PROF_TEST                  1     56  B_Calls_A (Rest_a_While)
        377880    0.38        2 OLD_PROF_TEST                  1     82  DBFunc (Rest_a_While)
        166019    0.17        2 OLD_PROF_TEST                  1     70  R_Calls_R (Rest_a_While)
        150117    0.15        2 OLD_PROF_TEST                  1     43  A_Calls_B (Rest_a_While, section 2)
         72742    0.07        2 OLD_PROF_TEST                  1     40  A_Calls_B (Rest_a_While, section 1)
         56473    0.06        1 TABLE_COUNT_TYPE               6      6  SELECT
          3338    0.00        1                                5      8  B2: SELECT DBFunc
           258    0.00        1                                5     12  B2: OPEN
           109    0.00        1                                5     16  B2: Assign Table_Count_Type
            68    0.00        2 OLD_PROF_TEST                  1     67  Put_Line
            66    0.00        2                                4      1  Auxiliary SERVEROUTPUT block for B2 (surmised)
            60    0.00        2                                7      1  Auxiliary SERVEROUTPUT block for B3 (surmised)
            60    0.00        2                                3      1  Auxiliary SERVEROUTPUT block for B1 (surmised)
            44    0.00        1                                5     14  B2: CLOSE
            42    0.00        0                                2      5  B1: Call to Start_Profiling 
            31    0.00        1 OLD_PROF_TEST                  1     18  RETURN DBMS_Profiler.Stop_Profiler;
            26    0.00        1                                8      5  B3: Call R_Calls_R
            13    0.00        0                                5      1  B2: DECLARE
    

    Notes on Output of Flat Profiler
    There were six units with no linked information in DBMS_PROFILER_UNITS. By examining the data, I was able to associate unit numbers 2, 5 and 8 with my anonymous blocks B1, B2 and B3. That left three unassigned, and I have surmised that these correspond to the auxiliary blocks associated with processing server output that we earlier surmised when examining the output from the hierarchical profiler.

    • The useful call tree structure is not present in the data from the old profiler
    • However, the results are at a line level, which the hierarchical profiler does not provide; for example, the two sections of A_Calls_B are reported separately
    • Deciphering the output requires significantly more manual effort than with the hierarchical profiler
    • Both old and new profiler have their own advantages, and so both should be considered of value
    • Manual code timing offers more flexibility in terms of aggregating lines and call instances, but requires more effort

    Conclusions

    • Running Oracle’s hierarchical profiler would seem to be the default first step in tuning PL/SQL programs from v11.1
    • Some care is needed in interpreting the output data; I’ve provided a query for displaying the hierarchies
    • Performance is recorded only down to function level, so it will still often be worthwhile to use the old flat profiler in addition
    • Manually timing code sections also still has a part to play, in terms of instrumentation and greater flexibility where necessary






  • Database function called from SQL in SQL*Plus (DBFUNC)
  • Database function called from SQL in PL/SQL (DBFUNC)
  • Object constructor call (TABLE_COUNT_TYPE)

Call Structure Diagram
HProf - CSD

Raw Results
The attached script Test_Rep_h.sql was used to report on the results. The record produced in the run table, DBMSHP_RUNS, was:

     RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
---------- ---------------------------- ---------- ---------- ------------------------------------------------------------
        11 04-MAR-13 07.07.36.803000        890719        .89 Profile for small test program with recursion

The records produced in the functions table, DBMSHP_FUNCTION_INFO, were:

OWNER MODULE               FUNCTION                         ID  LINE#      SUB_T      FUN_T  CALLS
----- -------------------- ------------------------------ ---- ------ ---------- ---------- ------
NET   HPROF_TEST           A_CALLS_B                         4     40      62340       4450      1
NET   HPROF_TEST           A_CALLS_B@1                       5     40      43729      13663      1
NET   HPROF_TEST           B_CALLS_A                         6     38      57890      14161      1
NET   HPROF_TEST           B_CALLS_A@1                       7     38      30066      30066      1
NET   HPROF_TEST           DBFUNC                            8     84      32629      32629      2
NET   HPROF_TEST           R_CALLS_R                         9     70      12823       4159      1
NET   HPROF_TEST           R_CALLS_R@1                      10     70       8633       8618      1
NET   HPROF_TEST           STOP_PROFILING                   11     16         21         21      1
NET   TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                 12      3      55049         82      1
NET   TABLE_COUNT_TYPE     __static_sql_exec_line6          22      6      54967      54967      1
SYS   DBMS_HPROF           STOP_PROFILING                   13     59          0          0      1
SYS   DBMS_OUTPUT          GET_LINE                         14    129          8          8      3
SYS   DBMS_OUTPUT          GET_LINES                        15    160         68         60      3
SYS   DBMS_OUTPUT          NEW_LINE                         16    117          7          7      2
SYS   DBMS_OUTPUT          PUT                              17     77         28         28      2
SYS   DBMS_OUTPUT          PUT_LINE                         18    109         46         11      2
                           __anonymous_block                 1      0     809839        521      5
                           __dyn_sql_exec_line12            19     12        226        226      1
                           __plsql_vm                        2      0     828379         58      6
                           __plsql_vm@1                      3      0      14158         11      1
                           __sql_fetch_line13               20     13     726713     726713      1
                           __static_sql_exec_line8          21      8      14418        260      1

22 rows selected.

The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing.

The records produced in the functions parent-child table, DBMSHP_PARENT_CHILD_INFO, were:

OWNER_P MODULE_P             FUNCTION_P                     OWNER_C MODULE_C             FUNCTION_C                          SUB_T      FUN_T  CALLS
------- -------------------- ------------------------------ ------- -------------------- ------------------------------ ---------- ---------- ------
NET     HPROF_TEST           STOP_PROFILING                 SYS     DBMS_HPROF           STOP_PROFILING                          0          0      1
NET     HPROF_TEST           R_CALLS_R@1                    SYS     DBMS_OUTPUT          PUT_LINE                               15          6      1
NET     HPROF_TEST           R_CALLS_R                      SYS     DBMS_OUTPUT          PUT_LINE                               31          5      1
NET     HPROF_TEST           R_CALLS_R                      NET     HPROF_TEST           R_CALLS_R@1                          8633       8618      1
NET     HPROF_TEST           B_CALLS_A                      NET     HPROF_TEST           A_CALLS_B@1                         43729      13663      1
NET     HPROF_TEST           A_CALLS_B@1                    NET     HPROF_TEST           B_CALLS_A@1                         30066      30066      1
NET     HPROF_TEST           A_CALLS_B                      NET     HPROF_TEST           B_CALLS_A                           57890      14161      1
NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE               NET     TABLE_COUNT_TYPE     __static_sql_exec_line6             54967      54967      1
SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          NEW_LINE                                7          7      2
SYS     DBMS_OUTPUT          PUT_LINE                       SYS     DBMS_OUTPUT          PUT                                    28         28      2
SYS     DBMS_OUTPUT          GET_LINES                      SYS     DBMS_OUTPUT          GET_LINE                                8          8      3
                             __anonymous_block              NET     HPROF_TEST           STOP_PROFILING                         21         21      1
                             __anonymous_block              SYS     DBMS_OUTPUT          GET_LINES                              68         60      3
                             __anonymous_block                                           __dyn_sql_exec_line12                 226        226      1
                             __anonymous_block              NET     HPROF_TEST           R_CALLS_R                           12823       4159      1
                             __anonymous_block                                           __static_sql_exec_line8             14418        260      1
                             __plsql_vm                     NET     HPROF_TEST           DBFUNC                              18482      18482      1
                             __anonymous_block                                           __sql_fetch_line13                 726713     726713      1
                             __static_sql_exec_line8                                     __plsql_vm@1                        14158         11      1
                             __plsql_vm                                                  __anonymous_block                  809839        521      5
                             __plsql_vm@1                   NET     HPROF_TEST           DBFUNC                              14147      14147      1
                             __anonymous_block              NET     TABLE_COUNT_TYPE     TABLE_COUNT_TYPE                    55049         82      1

22 rows selected.

The SUB_T and FUN_T values are the total times in microseconds for the subtree including function, and function-only processing, respectively, for the child function while called from all instances of the parent.

Function Call Tree
The raw data above can be used to identify processing bottlenecks at a function level, but it’s also useful to process the data in order to display the function hierarchies, both for performance tuning and also for understanding the program structure. This is not quite as trivial as it may seem. The oracle-base article provides an SQL statement that attempts to do this:

SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
       fi.function,
       pci.subtree_elapsed_time,
       pci.function_elapsed_time,
       pci.calls
FROM   dbmshp_parent_child_info pci
       JOIN dbmshp_function_info fi ON pci.runid = fi.runid AND pci.childsymid = fi.symbolid
WHERE  pci.runid = :RUN_ID
CONNECT BY PRIOR childsymid = parentsymid
START WITH pci.parentsymid = :START_ID

Here, bind variables replace the original hard-coded values. On running this query I often got the following result:

ERROR at line 1:
ORA-01436: CONNECT BY loop in user data

On the run used in this article, the query returned 157 records, which is obviously incorrect. There is of course a NOCYCLE keyword that can be used to return results in the case of loops. However, it is not worth adding in this case, because there are in fact no loops in the data (at least no cyclic loops – apparent loops are discussed later). Oracle avoids loops by treating a function call that is a descendant of itself as a call to a new function, identified by suffices @1, @2 etc. as we can see from the recursive procedures above (eg R_CALLS_R@1 is the second call of R_CALLS_R, this one from itself). The problem here is that the query is incorrect in its handling of runid, with the result that the tree-walk traverses records from other runs as well as the intended one. A further problem is that there may be several roots, and it would be best to calculate these within a subquery. We can correct these problems by the following query:

SELECT RPAD(' ', level*2, ' ') || fi.owner || '.' || fi.module AS name,
       fi.symbolid || ': ' || fi.function function,
       pci.subtree_elapsed_time sub_t,
       pci.function_elapsed_time fun_t,
       pci.calls
  FROM dbmshp_parent_child_info	pci
  JOIN dbmshp_function_info		fi 
    ON pci.runid	              = fi.runid 
   AND pci.childsymid	       = fi.symbolid
 WHERE pci.runid                   = :RUN_ID
CONNECT BY PRIOR pci.childsymid    = pci.parentsymid 
       AND pci.runid	              = :RUN_ID
START WITH pci.parentsymid         IN (SELECT f.symbolid FROM dbmshp_function_info f WHERE NOT EXISTS 
       (SELECT 1 FROM dbmshp_parent_child_info i WHERE i.childsymid = f.symbolid AND i.runid = :RUN_ID) AND f.runid = :RUN_ID)
       AND pci.runid	              = :RUN_ID

This query returns the results:

NAME                           FUNCTION                            SUB_T      FUN_T  CALLS
------------------------------ ------------------------------ ---------- ---------- ------
  .                            1: __anonymous_block              809,839        521      5
    NET.HPROF_TEST             9: R_CALLS_R                       12,823      4,159      1
      NET.HPROF_TEST           10: R_CALLS_R@1                     8,633      8,618      1
        SYS.DBMS_OUTPUT        18: PUT_LINE                           15          6      1
          SYS.DBMS_OUTPUT      16: NEW_LINE                            7          7      2
          SYS.DBMS_OUTPUT      17: PUT                                28         28      2
      SYS.DBMS_OUTPUT          18: PUT_LINE                           31          5      1
        SYS.DBMS_OUTPUT        16: NEW_LINE                            7          7      2
        SYS.DBMS_OUTPUT        17: PUT                                28         28      2
    NET.HPROF_TEST             11: STOP_PROFILING                     21         21      1
      SYS.DBMS_HPROF           13: STOP_PROFILING                      0          0      1
    NET.TABLE_COUNT_TYPE       12: TABLE_COUNT_TYPE               55,049         82      1
      NET.TABLE_COUNT_TYPE     22: __static_sql_exec_line6        54,967     54,967      1
    SYS.DBMS_OUTPUT            15: GET_LINES                          68         60      3
      SYS.DBMS_OUTPUT          14: GET_LINE                            8          8      3
    .                          19: __dyn_sql_exec_line12             226        226      1
    .                          20: __sql_fetch_line13            726,713    726,713      1
    .                          21: __static_sql_exec_line8        14,418        260      1
      .                        3: __plsql_vm@1                    14,158         11      1
        NET.HPROF_TEST         8: DBFUNC                          14,147     14,147      1
  NET.HPROF_TEST               8: DBFUNC                          18,482     18,482      1
  NET.HPROF_TEST               6: B_CALLS_A                       57,890     14,161      1
    NET.HPROF_TEST             5: A_CALLS_B@1                     43,729     13,663      1
      NET.HPROF_TEST           7: B_CALLS_A@1                     30,066     30,066      1

24 rows selected.

This is better, but we can identify some further issues.

Missing Roots
The true root results are missing: For example, A_CALLS_B is missing. This arises because the query is traversing the link records (DBMSHP_PARENT_CHILD_INFO), while the root information is stored in the nodes (DBMSHP_FUNCTION_INFO). This suggests a change from the CONNECT BY syntax to Oracle’s v11.2 recursive subquery factoring syntax, which allows you easily to start from the nodes, then traverse recursively via the links. (Incidentally, moving the start of profiling to its own block would result in A_CALLS_B appearing under __anonymous_block, but I prefer to retain the current structure in order to deal with the general case in which multiple roots are possible.)

Duplicate Links
Notice that function PUT_LINE is reported separately under R_CALLS_R and R_CALLS_R@1, and the timings differ. Also, its own child calls appear under each of its instances, but in those cases the timings are identical. The reason for this is that in the first case, there are separate records of the times used in each call, whereas in the second, the child calls have only a single record giving the total times across both instances of the parent call. The call from R_CALLS_R shows (9 – 4 = ) 5µs used in child calls, while the call from R_CALLS_R@1 shows 14µs. The child calls show totals of (3 + 16 = ) 19µs, equalling the sum across the parent calls.

At this point it is worth looking at this from the more general perspective of a hierarchical data structure where parents can have multiple children and children multiple parents, with one or more roots. If a network diagram were constructed there would be loops apparent indicating multiple routes between nodes. In these situations, Oracle’s hierarchical queries effectively traverse all routes, and this is what causes the link duplication (in other scenarios this behaviour can cause big performance problems, but probably not here). Oracle’s cycle detection mechanism does not trigger because the loops do not result in any node being a descendant of itself (as noted above, extra nodes are generated by the profiler to avoid this).

It seems to me better to avoid this duplication, and also to signal those cases where times are not aggregated up the tree. We can achieve this by the use of analytic functions. Note that, although the query below refers to the specific tables and attributes for this problem, the proposed solution could be used for any member of this general class of problem. The new query, which orders sibling records by descending subtree elapsed time, is:

WITH last_run AS (
SELECT Max (runid) runid FROM dbmshp_runs
), full_tree (runid, lev, node_id, sub_t, fun_t, calls, link_id) AS (
SELECT fni.runid, 0, fni.symbolid, fni.subtree_elapsed_time, fni.function_elapsed_time, fni.calls, 'root' || ROWNUM
  FROM dbmshp_function_info fni
  JOIN last_run lrn
    ON lrn.runid = fni.runid
 WHERE NOT EXISTS (SELECT 1 FROM dbmshp_parent_child_info pci WHERE pci.childsymid = fni.symbolid AND pci.runid = fni.runid)
 UNION ALL
SELECT ftr.runid, 
       ftr.lev + 1, 
       pci.childsymid, 
       pci.subtree_elapsed_time, 
       pci.function_elapsed_time, 
       pci.calls,
       pci.parentsymid || '-' || pci.childsymid
  FROM full_tree ftr
  JOIN dbmshp_parent_child_info pci
    ON pci.parentsymid = ftr.node_id
   AND pci.runid = ftr.runid
) SEARCH DEPTH FIRST BY sub_t DESC, fun_t DESC, calls DESC, node_id SET rn
, tree_ranked AS (
SELECT runid, node_id, lev, rn, 
       sub_t, fun_t, calls, 
       Row_Number () OVER (PARTITION BY node_id ORDER BY rn) node_rn,
       Count (*) OVER (PARTITION BY node_id) node_cnt,
       Row_Number () OVER (PARTITION BY link_id ORDER BY rn) link_rn
  FROM full_tree
)
SELECT RPad (' ', trr.lev*2, ' ') || fni.function "Function tree",
       fni.symbolid sy, fni.owner, fni.module,
       CASE WHEN trr.node_cnt > 1 THEN trr.node_rn || ' of ' || trr.node_cnt END "Inst.",
       trr.sub_t, trr.fun_t, trr.calls, 
       trr.rn "Row"
  FROM tree_ranked trr
  JOIN dbmshp_function_info fni
    ON fni.symbolid = trr.node_id
   AND fni.runid = trr.runid
 WHERE trr.link_rn = 1
 ORDER BY trr.rn

Query Structure Diagram
HProf - QSD

The results are then:

Function tree                        SY OWNER MODULE               Inst.         SUB_T      FUN_T  CALLS  Row
----------------------------------- --- ----- -------------------- -------- ---------- ---------- ------ ----
__plsql_vm                            2                                        828,379         58      6    1
  __anonymous_block                   1                                        809,839        521      5    2
    __sql_fetch_line13               20                                        726,713    726,713      1    3
    TABLE_COUNT_TYPE                 12 NET   TABLE_COUNT_TYPE                  55,049         82      1    4
      __static_sql_exec_line6        22 NET   TABLE_COUNT_TYPE                  54,967     54,967      1    5
    __static_sql_exec_line8          21                                         14,418        260      1    6
      __plsql_vm@1                    3                                         14,158         11      1    7
        DBFUNC                        8 NET   HPROF_TEST           1 of 2       14,147     14,147      1    8
    R_CALLS_R                         9 NET   HPROF_TEST                        12,823      4,159      1    9
      R_CALLS_R@1                    10 NET   HPROF_TEST                         8,633      8,618      1   10
        PUT_LINE                     18 SYS   DBMS_OUTPUT          1 of 2           15          6      1   11
          PUT                        17 SYS   DBMS_OUTPUT          1 of 2           28         28      2   12
          NEW_LINE                   16 SYS   DBMS_OUTPUT          1 of 2            7          7      2   13
      PUT_LINE                       18 SYS   DBMS_OUTPUT          2 of 2           31          5      1   14
    __dyn_sql_exec_line12            19                                            226        226      1   17
    GET_LINES                        15 SYS   DBMS_OUTPUT                           68         60      3   18
      GET_LINE                       14 SYS   DBMS_OUTPUT                            8          8      3   19
    STOP_PROFILING                   11 NET   HPROF_TEST                            21         21      1   20
      STOP_PROFILING                 13 SYS   DBMS_HPROF                             0          0      1   21
  DBFUNC                              8 NET   HPROF_TEST           2 of 2       18,482     18,482      1   22
A_CALLS_B                             4 NET   HPROF_TEST                        62,340      4,450      1   23
  B_CALLS_A                           6 NET   HPROF_TEST                        57,890     14,161      1   24
    A_CALLS_B@1                       5 NET   HPROF_TEST                        43,729     13,663      1   25
      B_CALLS_A@1                     7 NET   HPROF_TEST                        30,066     30,066      1   26

24 rows selected.

Notice that we now have a single record for each of the 22 links, plus the two root nodes. Also, the “Inst.” column lists the instance number of a function having more than one instance, and the children of any such function are only listed once with the gaps in the “Row” column indicating where duplicates have been suppressed.

Network Diagrams
It may be interesting to display the call tree in two diagrams, one for each root.
Root __plsql_vm
HProf - Net

Root A_CALLS_B
HProf - Net2

Notes on Tree Output
Anonymous Block (__anonymous_block)
This function seems to correspond to invocations of anonymous blocks, obviously enough. However, there is an apparent anomaly in the number of calls listed, 6, because the driving program has only three such blocks, and there are none in the called PL/SQL code. I would surmise that the apparent discrepancy arises from the enabling of SERVEROUTPUT, which appears to result in a secondary block being associated with each explicit SQL*Plus block, that issues a call to GET_LINES to process buffered output.

PL/SQL Engine (__plsql_vm)
This function seems to correspond to external invocations of PL/SQL such as from a SQL*Plus session. There are 7 calls, 6 of them presumably being linked with the external anonymous blocks, and the seventh with DBFUNC, where a PL/SQL function is called from a SQL statement from SQL*Plus.

Notice that the SQL statement calling a database function from within PL/SQL generates the recursive call to the engine, __plsql_vm@1

Second Root (A_CALLS_B)
The above function does not have the __plsql_vm/__anonymous_block ancestry that might be expected because profiling only started within the enclosing block.

Inlined Procedure (Rest_a_While)
I wrote a small procedure, Rest_a_While, to generate some elapsed time in the recursive procedures, but preceded it with the INLINE pragma, a new optimisation feature in 11g. This had the desired effect of removing the calls from the profiling output and including the times in the calling procedures. Rest_a_While does not make the obvious call to DBMS_Lock.Sleep because that procedure cannot be inlined. subprogram inlining in 11g provides some analysis of the inlining feature.

Sibling Ordering
We have ordered siblings by descending subtree elapsed time, using the SEARCH clause. It would be nice to have the option to order the siblings by initial invocation time, but Oracle does not provide the data to do this.

Loops and Hierarchies
The first diagram shows two loops, where there are two routes between the loop start and end points, indicated by different colours. The second loop has two child nodes coming from the end point, and hierarchical queries (both CONNECT BY and recursive subquery factors in Oracle) cause the links to be duplicated. Our query has filtered out the duplicates by analytic functions.

It’s worth remembering this because it’s a general feature of SQL for querying hierarchies, and judging by Oracle forums, not one that’s widely understood. For larger hierarchies it can cause serious performance problems, and may justify a PL/SQL programmed solution that need not suffer the same problem.

Manual Instrumentation
Oracle’s hierarchical profiler clearly provides extremely useful information on both performance and structure of PL/SQL programs with very little effort. However, it does have the limitation of only providing information down to the subprogram level (which includes embedded SQL statements in this context). It is also often considered good practice to implement timing and other instrumentation permanently in production code, sometimes in a switchable fashion. In the test program, one of the called procedures, A_Calls_B, makes two calls to the inlined procedure, Rest_a_While, the second doing about twice as much work as the first. The profiler reports total within-function times of 4,450µs and 13,663µs on first and second calls, respectively (the work is scaled by a call number parameter, equal to 1, then 3).

I created a second instance of the package and driver script (suffix _TS) to illustrate manual instrumentation. This uses an ‘object-oriented’ timing package that I wrote a couple of years ago Code Timing and Object Orientation and Zombies (November, 2010) to instrument at procedure and section level. I multiplied the work in Rest_a_While by a factor of ten to get larger times. This produced the output:

Timer Set: HProf, Constructed at 05 Mar 2013 10:21:27, written at 10:21:30
==========================================================================
[Timer timed: Elapsed (per call): 0.04 (0.000044), CPU (per call): 0.05 (0.000050), calls: 1000, '***' denotes corrected line below]

Timer                       Elapsed          CPU          Calls        Ela/Call        CPU/Call
----------------------   ----------   ----------   ------------   -------------   -------------
A_Calls_B, section one         0.06         0.05              2         0.03150         0.02500
A_Calls_B, section two         0.12         0.12              2         0.06050         0.06000
B_Calls_A: 2                   0.15         0.16              1         0.15400         0.16000
B_Calls_A: 4                   0.31         0.30              1         0.30700         0.30000
DBFunc                         0.32         0.31              2         0.15950         0.15500
Open cursor                    0.69         0.69              1         0.68900         0.69000
Fetch from cursor              0.70         0.70              1         0.69600         0.70000
Close cursor                   0.00         0.00              1         0.00000         0.00000
Construct object               0.06         0.04              1         0.05500         0.04000
R_Calls_R                      0.14         0.14              2         0.07000         0.07000
(Other)                        0.00         0.00              1         0.00000         0.00000
----------------------   ----------   ----------   ------------   -------------   -------------
Total                          2.54         2.51             15         0.16960         0.16733
----------------------   ----------   ----------   ------------   -------------   -------------

Notes on Code Timing

  • Calls, CPU and elapsed times have been captured at the section level for A_Calls_B
  • Observe that, while R_Calls_R and A_Calls_B aggregate over all calls, B_Calls_A records values by call; this is implemented simply by including a value that changes with call in the timer name
  • The timing set object is designed to be very low footprint; here 9 statements (calls to Increment_Time), plus a small global overhead, produced 10 result lines, plus associated information
  • The ‘object-oriented’ approach allows multiple programs to be be timed at multiple levels, without interference between timings
  • There are Perl and Java implementations of this timing set object included in the Scribd article mentioned

Oracle’s Flat Profiler (DBMS_Profiler)
The hierarchical profiler was introduced in v11.1, while prior to this there was a non-hierarchical profiler, DBMS_Profiler. This package still exists in v11: It is omitted from the advanced application developer’s guide for v11, but is described in the packages and types manual (Oracle® Database PL/SQL Packages and Types Reference, 11g Release 2 (11.2)); also, SQL*Developer appears to support only the newer hierarchical verion (via right-click on a package). I thought it interesting to run the older version on the same test program (package Old_Test_Prof, driver script Test_Rep_p_Old.sql and reporting script Test_Rep_h_Old.sql). The output from the first three queries is:

Run header (PLSQL_PROFILER_RUNS)

     RUNID RUN_DATE        MICRO_S    SECONDS
---------- ------------ ---------- ----------
         3 11:03:13        2164000       2.16

Profiler data summary (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS
---------- ------- --------
   2126949    2.13       72

Profiler data by time (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
---------- ------- -------- -------------------- ----------- ------
    729932    0.73        1                                5     13
    569563    0.57        2 OLD_PROF_TEST                  1     56
    377880    0.38        2 OLD_PROF_TEST                  1     82
    166019    0.17        2 OLD_PROF_TEST                  1     70
    150117    0.15        2 OLD_PROF_TEST                  1     43
     72742    0.07        2 OLD_PROF_TEST                  1     40
     56473    0.06        1 TABLE_COUNT_TYPE               6      6
      3338    0.00        1                                5      8
       258    0.00        1                                5     12
       109    0.00        1                                5     16
        68    0.00        2 OLD_PROF_TEST                  1     67
        66    0.00        2                                4      1
        60    0.00        2                                7      1
        60    0.00        2                                3      1
        44    0.00        1                                5     14
        42    0.00        0                                2      5
        31    0.00        1 OLD_PROF_TEST                  1     18
        26    0.00        1                                8      5
        13    0.00        0                                5      1
         9    0.00        1 TABLE_COUNT_TYPE               6     11
         9    0.00        2 OLD_PROF_TEST                  1     86
         8    0.00        0 OLD_PROF_TEST                  1     51
         8    0.00        1 TABLE_COUNT_TYPE               6     13
         7    0.00        1                                5     18
         6    0.00        0 OLD_PROF_TEST                  1     78
         6    0.00        0 OLD_PROF_TEST                  1     64
         6    0.00        0                                8      1
         6    0.00        1 TABLE_COUNT_TYPE               6      3
         5    0.00        0 OLD_PROF_TEST                  1     35
         5    0.00        0 OLD_PROF_TEST                  1     15
         4    0.00        1                                8      7
         4    0.00        1 OLD_PROF_TEST                  1     76
         3    0.00        1                                2      8
         2    0.00        1 OLD_PROF_TEST                  1     62
         2    0.00        1 OLD_PROF_TEST                  1     13
         2    0.00        1 TABLE_COUNT_TYPE               6      5
         2    0.00        2 OLD_PROF_TEST                  1     72
         2    0.00        2 OLD_PROF_TEST                  1     45
         2    0.00        2 OLD_PROF_TEST                  1     49
         2    0.00        2 OLD_PROF_TEST                  1     46
         2    0.00        2 OLD_PROF_TEST                  1     58
         1    0.00        1 OLD_PROF_TEST                  1     73
         1    0.00        1                                2      6
         1    0.00        1 OLD_PROF_TEST                  1     59
         1    0.00        1 OLD_PROF_TEST                  1     11
         1    0.00        2 OLD_PROF_TEST                  1     54
         1    0.00        2 OLD_PROF_TEST                  1     84
         0    0.00        0 OLD_PROF_TEST                  1      1
         0    0.00        0 OLD_PROF_TEST                  1     88
         0    0.00        0                                8      9
         0    0.00        0                                2      1
         0    0.00        0                                2      2
         0    0.00        0 OLD_PROF_TEST                  1      3
         0    0.00        0 OLD_PROF_TEST                  1      5
         0    0.00        0 OLD_PROF_TEST                  1      9
         0    0.00        0 OLD_PROF_TEST                  1     20
         0    0.00        1 TABLE_COUNT_TYPE               6      4
         0    0.00        1                                8      2
         0    0.00        2 OLD_PROF_TEST                  1     39
         0    0.00        2 OLD_PROF_TEST                  1     55
         0    0.00        2 OLD_PROF_TEST                  1     69
         0    0.00        2 OLD_PROF_TEST                  1     38
         0    0.00        2 OLD_PROF_TEST                  1     81
         0    0.00        2 OLD_PROF_TEST                  1     42
         0    0.00        2 OLD_PROF_TEST                  1     68

65 rows selected.

Referring to the package, type and anonymous blocks, I assigned labels to all the lines having more than 10µs, as follows:

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE#
---------- ------- -------- -------------------- ----------- ------
    729932    0.73        1                                5     13  B2: FETCH
    569563    0.57        2 OLD_PROF_TEST                  1     56  B_Calls_A (Rest_a_While)
    377880    0.38        2 OLD_PROF_TEST                  1     82  DBFunc (Rest_a_While)
    166019    0.17        2 OLD_PROF_TEST                  1     70  R_Calls_R (Rest_a_While)
    150117    0.15        2 OLD_PROF_TEST                  1     43  A_Calls_B (Rest_a_While, section 2)
     72742    0.07        2 OLD_PROF_TEST                  1     40  A_Calls_B (Rest_a_While, section 1)
     56473    0.06        1 TABLE_COUNT_TYPE               6      6  SELECT
      3338    0.00        1                                5      8  B2: SELECT DBFunc
       258    0.00        1                                5     12  B2: OPEN
       109    0.00        1                                5     16  B2: Assign Table_Count_Type
        68    0.00        2 OLD_PROF_TEST                  1     67  Put_Line
        66    0.00        2                                4      1  Auxiliary SERVEROUTPUT block for B2 (surmised)
        60    0.00        2                                7      1  Auxiliary SERVEROUTPUT block for B3 (surmised)
        60    0.00        2                                3      1  Auxiliary SERVEROUTPUT block for B1 (surmised)
        44    0.00        1                                5     14  B2: CLOSE
        42    0.00        0                                2      5  B1: Call to Start_Profiling 
        31    0.00        1 OLD_PROF_TEST                  1     18  RETURN DBMS_Profiler.Stop_Profiler;
        26    0.00        1                                8      5  B3: Call R_Calls_R
        13    0.00        0                                5      1  B2: DECLARE

Notes on Output of Flat Profiler
There were six units with no linked information in DBMS_PROFILER_UNITS. By examining the data, I was able to associate unit numbers 2, 5 and 8 with my anonymous blocks B1, B2 and B3. That left three unassigned, and I have surmised that these correspond to the auxiliary blocks associated with processing server output that we earlier surmised when examining the output from the hierarchical profiler.

  • The useful call tree structure is not present in the data from the old profiler
  • However, the results are at a line level, which the hierarchical profiler does not provide; for example, the two sections of A_Calls_B are reported separately
  • Deciphering the output requires significantly more manual effort than with the hierarchical profiler
  • Both old and new profiler have their own advantages, and so both should be considered of value
  • Manual code timing offers more flexibility in terms of aggregating lines and call instances, but requires more effort…
  • …but not as much as I thought. As noted later on the second example, after reading another article on the profiler, I realised that I could join the system table ALL_SOURCE to see the text of the line (where available)

Second example: Flat profiler omits some detail timings
After posting the first draft of this article, which was about the newer hierarchical profiler only, I noticed a new post on an old AskTom thread on the older flat profiler. The post concerned a discrepancy between reported times at the aggregate level and detail levels. I suggested using the hierarchical profiler might resolve the problem Try the hierarchical profiler…, and then added sections on the old profiler and on manual timing to this article for comparison. However, my example programs above do not include the AskTom scenario, so I later decided to add a new small scenario to illustrate it and now report the results here. The new test code consists of a PL/SQL block with two calls to DBMS_Lock.Sleep, for 3 and 6 seconds, with the profiling code around them. The driving scripts and output files are included in the zip file attached, and I list summary results below:

I later came upon another artilce on the flat profiler, Profiling PL/SQL with dbms_profiler where the author has joined the system table ALL_SOURCE to get the text of the line profiled, which makes interpretation easier. I have then updated the line-level query as follows:

PROMPT Profiler data by time (PLSQL_PROFILER_DATA)
SELECT Round (dat.total_time/1000, 0)  micro_s,
       Round (dat.total_time/1000000000, 2) seconds,
       dat.total_occur calls,
       unt.unit_name,
       dat.unit_number,
       dat.line#,
       Trim (src.text) text
  FROM plsql_profiler_data dat
  LEFT JOIN plsql_profiler_units unt
    ON unt.runid            = dat.runid
   AND unt.unit_number      = dat.unit_number
  LEFT JOIN all_source      src
    ON src.type             IN ('PACKAGE BODY','FUNCTION','PROCEDURE','TRIGGER')
   AND src.name             = unt.unit_name 
   AND src.line             = dat.line# 
   AND src.owner            = unt.unit_owner 
   AND src.type             = unt.unit_type
 WHERE dat.runid            = :runid
   AND dat.total_time       > 0
 ORDER BY 1 DESC, 2, 3

Of course the text is only available for stored source, so excludes lines from anonymous blocks.

Flat Profiler

Run header (PLSQL_PROFILER_RUNS)

     RUNID RUN_DATE        MICRO_S    SECONDS
---------- ------------ ---------- ----------
         5 20:34:45        9220000       9.22

Profiler data by unit (PLSQL_PROFILER_DATA)

UNIT_NAME            UNIT_NUMBER    MICRO_S SECONDS    CALLS
-------------------- ----------- ---------- ------- --------
                               2        200    0.00        3
UTILS                          1         30    0.00        3

Profiler data by time (PLSQL_PROFILER_DATA)

   MICRO_S SECONDS    CALLS UNIT_NAME            UNIT_NUMBER  LINE# TEXT
---------- ------- -------- -------------------- ----------- ------ ------------------------------------------------
       136    0.00        1                     2      7
        21    0.00        1                     2     10
        19    0.00        1                     2     14
        14    0.00        0 UTILS                          1    343 FUNCTION Stop_D_Profiling RETURN PLS_INTEGER IS
        13    0.00        1 UTILS                          1    346 RETURN DBMS_Profiler.Stop_Profiler;
         6    0.00        1 UTILS                          1    341 END Start_D_Profiling;
         2    0.00        1 UTILS                          1    339 RETURN l_run_number;

7 rows selected.

Hierarchical Profiler

Run header (DBMSHP_RUNS)

     RUNID RUN_TIMESTAMP                   MICRO_S    SECONDS RUN_COMMENT
---------- ---------------------------- ---------- ---------- -----------------------------
        16 19-MAR-13 21.37.00.571000       9000292          9 Profile for DBMS_Lock.Sleep

Functions called (DBMSHP_FUNCTION_INFO)

OWNER      MODULE               FUNCTION                LINE#      SUB_T      FUN_T  CALLS
---------- -------------------- ---------------------- ------ ---------- ---------- ------
BRENDAN    UTILS                1: STOP_H_PROFILING       322          8          8      1
SYS        DBMS_HPROF           2: STOP_PROFILING          59          0          0      1
SYS        DBMS_LOCK            3: SLEEP                  197    9000279    9000279      2
SYS        DBMS_LOCK            4: __pkg_init               0          5          5      1

Function call parent-child links (DBMSHP_PARENT_CHILD_INFO)

OWNER_P MODULE_P  FUNCTION_P            OWNER_C MODULE_C    FUNCTION_C         SUB_T FUN_T  CALLS
------- --------- --------------------- ------- ----------- ------------------ ----- ----- ------
BRENDAN UTILS     1: STOP_H_PROFILING   SYS     DBMS_HPROF  2: STOP_PROFILING      0     0      1

BPF Recursive Subquery Factor Tree Query

Function tree                              SUB_T      FUN_T  CALLS  Row
------------------------------------- ---------- ---------- ------ ----
3: SYS.DBMS_LOCK.SLEEP                   9000279    9000279      2    1
1: BRENDAN.UTILS.STOP_H_PROFILING              8          8      1    2
  2: SYS.DBMS_HPROF.STOP_PROFILING             0          0      1    3
4: SYS.DBMS_LOCK.__pkg_init                    5          5      1    4

Manual Profiler

Timer Set: Profiling DBMS_Lock.Sleep, Constructed at 19 Mar 2013 21:38:54, written at 21:39:03
==============================================================================================
[Timer timed: Elapsed (per call): 0.05 (0.000045), CPU (per call): 0.04 (0.000040), calls: 1000, '***' denotes corrected line below]

Timer               Elapsed          CPU          Calls        Ela/Call        CPU/Call
--------------   ----------   ----------   ------------   -------------   -------------
3 second sleep         3.00         0.00              1         3.00000         0.00000
6 second sleep         6.00         0.00              1         6.00100         0.00000
(Other)                0.00         0.00              1         0.00000         0.00000
--------------   ----------   ----------   ------------   -------------   -------------
Total                  9.00         0.00              3         3.00033         0.00000
--------------   ----------   ----------   ------------   -------------   -------------

Notes on Results for Second Example

  • The flat profiler shows 9s at header level but only 230µs at detail level because DBMS_Lock.Sleep does not permit profiling by the user running the script
  • The hierarchical profiler shows 9s at header level and a total of 9s in 2 calls to DBMS_Lock.Sleep
  • Manual profiling shows the two calls to DBMS_Lock.Sleep taking 3 and 6 seconds

Conclusions

  • Running Oracle’s hierarchical profiler would seem to be the default first step in tuning PL/SQL programs from v11.1
  • Some care is needed in interpreting the output data; I’ve provided a query for displaying the hierarchies
  • Performance is recorded only down to function level, so it will still often be worthwhile to use the old flat profiler in addition
  • Manually timing code sections also still has a part to play, in terms of instrumentation and greater flexibility where necessary

Brendan HProf Code
Example 2