Design Patterns for Extracting Relational Data from XML

I recently replicated a very complicated Informatica data load from XML files into two Oracle staging tables in a much simpler way using just three SQL insert statements. I was able to load over 13,000 XML files into the tables in around 3 minutes on a windows PC running Oracle 11.2 XE, populating about 2 million records. This article will focus on the general design patterns involved in selecting relational data from XML using SQL in Oracle 11.2.

XML Selection Scenarios

The hierarchical structure of XML files means that the data may be stored at various different levels, but a relational SELECT statement returns data in a flat format. One way of converting between hierarchical and flat formats is to have separate SELECT statements for each level of data, and have any additional post-processing performed after inserting into staging tables. However, it is also possible to flatten the hierarchy within a single SELECT query clause in the case of a strict linear hierarchy.

It is instructive to consider the general case where any number of entities can occur within a linear hierarchy. If we understand how to deal with the case of two entities in a master-detail relationship, along with global values, then we will know how to deal with the general case, and so I will construct just such a test example.

XML Test Data

Here is the test XML file

<elem-0-body>
    <elem-1-global attr-1-global="id_1_global">
        <field-1-global>val_global</field-1-global>
    </elem-1-global>
    <list-1-master>
        <elem-1-master>
            <field-1-master>val_master_m1</field-1-master>
            <list-2-detail>
                  <elem-2-detail>
                       <field-2-detail>val_detail_m1_d1</field-2-detail>
                  </elem-2-detail>
                  <elem-2-detail>
                       <field-2-detail>val_detail_m1_d2</field-2-detail>
                  </elem-2-detail>
            </list-2-detail>
        </elem-1-master>
        <elem-1-master>
            <field-1-master>val_master_m2</field-1-master>
            <list-2-detail>
                  <elem-2-detail>
                       <field-2-detail>val_detail_m2_d1</field-2-detail>
                  </elem-2-detail>
                  <elem-2-detail>
                       <field-2-detail>val_detail_m2_d2</field-2-detail>
                  </elem-2-detail>
            </list-2-detail>
        </elem-1-master>
        <elem-1-master>
            <field-1-master>val_master_m3</field-1-master>
        </elem-1-master>
    </list-1-master>
</elem-0-body>

The file represents data in a master-detail relationship, plus data that are global to the file. As can be seen, there is a global entity, elem-1-global, that occurs exactly once, a master entity, elem-1-master, that may occur multiple times and that contains a child entity, elem-2-detail, that may occur multiple times. Each entity has been assigned a single sample field. One entity has an XML attribute.

There are three master records, having two, two and zero detail records respectively. We will filter out one of the second master's detail records.

In an article from January 2012, Data Modelling of XML SOAP Documents, I introduced my own diagram notation for hierarchical data structures, and re-use it below, showing the links to entities in relational form. I have added rounded corners to distinguish attributes from fields.

XMLSQL

SQL
Reading the XML File

CREATE TABLE xml_design_pattern (
       xml_field   XMLTYPE) 
       XMLTYPE COLUMN xml_field STORE AS BINARY XML
/
INSERT INTO xml_design_pattern VALUES (
       XMLTYPE (BFilename ('BRENDAN_IN_DIR', 'DESIGN_PATTERN.xml'),                      
                           NLS_Charset_id ('AL32UTF8'))
)
/

Unfiltered, Outer Join on Detail

SELECT /*+ GATHER_PLAN_STATISTICS XMLOJA */ 
       x1.global_id, x1.global_val, x2.master, x3.detail
  FROM xml_design_pattern t,
     XMLTable ('/elem-0-body'
                   PASSING t.xml_field
                   COLUMNS global_id           VARCHAR2(100) PATH 'elem-1-global/@attr-1-global',
                           global_val          VARCHAR2(100) PATH 'elem-1-global/field-1-global',
                           l2_xml              XMLTYPE       PATH 'list-1-master') x1,
     XMLTable ('/list-1-master/elem-1-master'
                   PASSING x1.l2_xml
                   COLUMNS master              VARCHAR2(100) PATH 'field-1-master',
                           l3_xml              XMLTYPE       PATH 'list-2-detail') x2,
     XMLTable ('/list-2-detail/elem-2-detail'
                   PASSING x2.l3_xml
                   COLUMNS detail              VARCHAR2(100) PATH 'field-2-detail') (+) x3
ORDER BY 1, 2, 3, 4

Output

GLOBAL_ID       GLOBAL_VAL      MASTER          DETAIL
--------------- --------------- --------------- --------------------
id_1_global     val_global      val_master_m1
id_1_global     val_global      val_master_m2
id_1_global     val_global      val_master_m3

The result shows that none of the detail records have been joined, while the expected result would be that all detail records would be joined with null detail only for m3.

Filtered, Outer Join on Detail

SELECT x1.global_id, x1.global_val, x2.master, x3.detail
  FROM xml_design_pattern t,
     XMLTable ('/elem-0-body'
                   PASSING t.xml_field
                   COLUMNS global_id           VARCHAR2(100) PATH 'elem-1-global/@attr-1-global',
                           global_val          VARCHAR2(100) PATH 'elem-1-global/field-1-global',
                           l2_xml              XMLTYPE       PATH 'list-1-master') x1,
     XMLTable ('/list-1-master/elem-1-master'
                   PASSING x1.l2_xml
                   COLUMNS master              VARCHAR2(100) PATH 'field-1-master',
                           l3_xml              XMLTYPE       PATH 'list-2-detail') x2,
     XMLTable ('/list-2-detail/elem-2-detail[field-2-detail != "val_detail_m2_d1"]'
                   PASSING x2.l3_xml
                   COLUMNS detail              VARCHAR2(100) PATH 'field-2-detail') (+) x3
ORDER BY 1, 2, 3, 4

Output

no rows selected

The result shows that no records have been returned, while the expected result would be that all master records would be returned, with detail records joined for all except the filtered out detail, with null detail only for m3.

Unfiltered, Outer Joins on all
SQL

SELECT x1.global_id, x1.global_val, x2.master, x3.detail
  FROM xml_design_pattern t,
     XMLTable ('/elem-0-body'
                   PASSING t.xml_field
                   COLUMNS global_id           VARCHAR2(100) PATH 'elem-1-global/@attr-1-global',
                           global_val          VARCHAR2(100) PATH 'elem-1-global/field-1-global',
                           l2_xml              XMLTYPE       PATH 'list-1-master') (+) x1,
    XMLTable ('/list-1-master/elem-1-master'
                   PASSING x1.l2_xml
                   COLUMNS master              VARCHAR2(100) PATH 'field-1-master',
                           l3_xml              XMLTYPE       PATH 'list-2-detail') (+) x2,
     XMLTable ('/list-2-detail/elem-2-detail'
                   PASSING x2.l3_xml
                   COLUMNS detail              VARCHAR2(100) PATH 'field-2-detail') (+) x3
ORDER BY 1, 2, 3, 4

Output

GLOBAL_ID       GLOBAL_VAL      MASTER          DETAIL
--------------- --------------- --------------- --------------------
id_1_global     val_global      val_master_m1   val_detail_m1_d1
id_1_global     val_global      val_master_m1   val_detail_m1_d2
id_1_global     val_global      val_master_m2   val_detail_m2_d1
id_1_global     val_global      val_master_m2   val_detail_m2_d2
id_1_global     val_global      val_master_m3

The result shows that all records are returned, as expected.

Unfiltered, Inner Joins
The attached file shows that, without the outer joins, records are returned only for the first two master records that have detail records, as expected.

Filtered, Outer Joins on all

SQL

SELECT x1.global_id, x1.global_val, x2.master, x3.detail
  FROM xml_design_pattern t,
     XMLTable ('/elem-0-body'
                   PASSING t.xml_field
                   COLUMNS global_id           VARCHAR2(100) PATH 'elem-1-global/@attr-1-global',
                           global_val          VARCHAR2(100) PATH 'elem-1-global/field-1-global',
                           l2_xml              XMLTYPE       PATH 'list-1-master') (+) x1,
     XMLTable ('/list-1-master/elem-1-master'
                   PASSING x1.l2_xml
                   COLUMNS master              VARCHAR2(100) PATH 'field-1-master',
                           l3_xml              XMLTYPE       PATH 'list-2-detail') (+) x2,
     XMLTable ('/list-2-detail/elem-2-detail[field-2-detail != "val_detail_m2_d1"]'
                   PASSING x2.l3_xml
                   COLUMNS detail              VARCHAR2(100) PATH 'field-2-detail') (+) x3
ORDER BY 1, 2, 3, 4

Output

GLOBAL_ID       GLOBAL_VAL      MASTER          DETAIL
--------------- --------------- --------------- --------------------
id_1_global     val_global      val_master_m1   val_detail_m1_d1
id_1_global     val_global      val_master_m1   val_detail_m1_d2
id_1_global     val_global      val_master_m2   val_detail_m2_d2
id_1_global     val_global      val_master_m3

The result shows that, with all joins outer joins, all records are returned except for the filtered-out detail record, as expected.

Filtered, Ansi Outer Joins on all

SQL

SELECT x1.global_id, x1.global_val, x2.master, x3.detail
  FROM xml_design_pattern t
    LEFT JOIN
     XMLTable ('/elem-0-body'
                   PASSING t.xml_field
                   COLUMNS global_id           VARCHAR2(100) PATH 'elem-1-global/@attr-1-global',
                           global_val          VARCHAR2(100) PATH 'elem-1-global/field-1-global',
                           l2_xml              XMLTYPE       PATH 'list-1-master') x1 ON 1=1
    LEFT JOIN
     XMLTable ('/list-1-master/elem-1-master'
                   PASSING x1.l2_xml
                   COLUMNS master              VARCHAR2(100) PATH 'field-1-master',
                           l3_xml              XMLTYPE       PATH 'list-2-detail') x2 ON 1=1
    LEFT JOIN
     XMLTable ('/list-2-detail/elem-2-detail[field-2-detail != "val_detail_m2_d1"]'
                   PASSING x2.l3_xml
                   COLUMNS detail              VARCHAR2(100) PATH 'field-2-detail') x3 ON 1=1
ORDER BY 1, 2, 3, 4

Output: The result in the attached file shows that, with all joins Ansi outer joins, all records are returned except for the filtered-out detail record, as expected.

Notes on SQL

XMLTable and XPath Syntax
Thh XMLTable function represents a rowset retrieved from an XML fragment. The first string within the call represents an XPath expression for the root element defining the rowset in the XML fragment passed. The number of occurrences in the fragment is the rowset cardinality.

The PASSING clause passes in the XMLTYPE field from a prior table or XMLTable instance from which the XML fragment is extracted. For a detail entity, the field will be passed from a master record in a a prior instance, and the instance order matters.

The COLUMNS clause defines the columns that can be included in the select list, and specifies a field for each column by an XPath expression. The expression is relative to the XMLTable root and must specify a single element instance.

XQuery Error (ORA-19279)
A common error in development is:
'ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence'
The error occurs when the XPath expression for a column selects more than one instance.

Filtering
Filtering of the XNL elements is effected using standard XPath notation, with conditions placed in square brackets after the element name.

Fields and Attributes
Elements may contain attributes, which are denoted in standard XML syntax by preceding the attribute name with '@'. Further details on XML syntax for Oracle, including additional features such as namespaces, can be found in Oracle XML DB Developer's Guide

Ansi and Oracle Join Syntaxes
Oracle introduced the Ansi standard join syntax in version 9, and it is generally to be preferred to its older proprietary syntax. However, this is debatable in the special case of SQL incorporating XMLTable because joining occurs in a non-standard way within the PASSING and other clauses of XMLTable, and not in the mandatory (except for cross joins) ON clause. In my examples, I have had to join ON 1=1 as a work-around.

Outer Joins
The examples show that outer-joining only the detail table does not work correctly. This appears to be a bug in Oracle 11.2. We have worked around it by outer-joining all tables.

Also note that the outer-join (+) needs to be after the table instance, unlike in standard SQL, where it goes after the join columns.

XMLTYPE Storage Clause
The XMLTYPE column in the table used to load the XML file has a storage clause that can specify either CLOB or BINARY XML (and maybe others) as the storage mode. It is vital for performance to choose BINARY XML if the table is going to be queried using XMLTable.

XPlan Statistics
I included calls to DBMS_XPlan in some of the queries in the attached files, and it can be seen that the CBO cardinalities are extermely inaccurate. This is perhaps not surprising as there is only one record in the table, which is exploded within the SQL, and the CBO is obviously not designed to optimise this.
SQL XML Design Patterns Files






3 thoughts on “Design Patterns for Extracting Relational Data from XML

  1. After having worked on Microsoft SQL Servers for 15+ years, I recently got a project based on Oracle database.
    A standard task: create a stored procedure, pass XML to one of its parameters and dispatch the content to database tables.
    I was very surprised that the following never worked correctly:
    SELECT *
    FROM XMLTable(...) x LEFT OUTER JOIN RealTable t ON t.RowID = x.RowID
    WHERE t.RowID IS NULL;

    It never returns any single row. It looks to me that joining XMLTable with a real one behaves as if there was always an INNER JOIN. Why so?
    Is it related to the same bug you've mentioned?

Leave a Reply

Your email address will not be published. Required fields are marked *