Most Database, Data Integration (DI), and Business Intelligence (BI) import bridges depend on a common "SQL Transformation Expression Parsing" library to extract the data flow lineage out of the many transformation expressions used by these technologies. In fact, each tool and technology has its expression syntax. DI and BI tools often defined their transformation expression languages with a syntax close to programming languages like Visual Basic or Java. However, most DI and BI tools have support for SQL override, allowing data transformations to be defined as the full SQL executed as is by the original database. Note that this SQL parsing does not support the syntax of special data stores such as MOLAP Cubes often seen in BI tools such as Oracle Hyperion Essbase.
The section explains the general limitations in expression parsing, in particular with respect to the SQL syntax variations from the SQL 99 standard to the limitations in proprietary variations and database extensions for each popular database such as IBM DB2, Microsoft SQL Server, Oracle, Teradata, etc.
Furthermore, the support for SQL use cases from Data Definition Language (DDL) to Data Manipulation Language (DML) is increasingly limited by the complexity as the SQL features:
SQL Views
SQL Views are defined by a single SQL statement. SQL Views are used to read databases and are therefore mostly based on a complex SELECT statement. SQL Views are well supported with only a few limitations based on the some proprietary extensions of some databases.
SQL Operations (Stored Procedures, Functions and Triggers)
SQL Operations are defined by multiple SQL statements that are pre-compiled and stored on the server for reuse, and can have parameters. SQL Operations often contain multiple DML statements (SELECT, INSERT, UPDATE, DELETE, etc.), and use control statements (IF, THEN, ELSE, CASE, FOR, etc.). Support for SQL Operations is limited to functions and stored procedures with the following support and limitations:
-
The general support for Stored Procedures is defined as follows:
-
Stored procedures reading from and writing to tables (catalog objects): supported
-
Stored procedures lineage support for IN, OUT, INOUT parameters: supported
-
Stored procedures with cursors: supported
-
Stored procedures with temporary/Local variables: supported
-
Stored procedures using functions: limited support
-
Stored procedures using public/private synonyms: supported
-
Stored procedures using other stored procedures: limited support
-
Stored procedures using temporary tables: supported
-
Stored procedures with error management: not relevant for data flow linage
-
Parsing of independent SQL Stored Procedures and Functions is currently limited to the following database technologies: (although some statements specific to database proprietary SQL syntax may not be supported)
-
Microsoft SQL Server Database
-
Oracle Database
-
PostgreSQL Database
-
Pivotal GreenPlum Database
-
Teradata Database
-
Most dynamic and chaining aspects of the operations are not supported such as:
-
operations calling other operations
-
operations that execute dynamic SQL (e.g. SQL text passed as parameter)
-
external operations (e.g. includes)
-
Some usage aspects of such operations by DI and BI tools might be limited:
-
using SQL operations to read data is supported in most DI and BI tool import bridges
-
using SQL operations to write data is supported only in DI tool import bridges such as Informatica PowerCenter, Microsoft SSIS and Talend
SQL DML (DI/ETL) Scripts
SQL DML Script files usually consist of multiple SQL statements and sometimes designed to be executed from outside the database, and therefore not necessarily compiled in the database. Such scripts are often handcrafted and sometimes generated or part of the database utilities (e.g. Teradata fastload, bulkload and multiload) Parsing of independent SQL script files (.sql) is currently limited to the following database technologies:
-
Apache Hadoop Hive Database (Cloudera, Hortonworks, MapR) SQL DML (DI/ETL) Script (HiveQL)
-
Apache Sqoop
-
IBM DB2 Universal Database (UDB) SQL DML (DI/ETL) Script
-
Microsoft SQL Server Database SQL DML (DI/ETL) Script (Transact-SQL)
-
Oracle Database SQL DML (DI/ETL) Script (PL/SQL)
-
PostgreSQL Database SQL DML (DI/ETL) Script
-
Teradata Database SQL DML (DI/ETL) Script (BTEQ, Fastload, etc.)
SQL Expression Analyzer Based Import Bridges
MIMB bridges depending on "Transformation Expression Parsing" include the following types of import bridges:
-
Database Servers (via JDBC), such as Oracle Database Servers
-
Database SQL DDL Scripts, such as Oracle Database SQL DDL Script
-
Database SQL DML (DI/ETL) Scripts, such as Oracle Database SQL DML (DI/ETL) Script (PL/SQL)
-
Data modeling tools (for SQL View), such as Erwin Data Modeler
-
Data Integration (DI) tools (for ETL Transformations), such as Informatica PowerCenter
-
Business Intelligence (BI design, OLAP, reporting) tools, such SAP BusinessObjects Designer
-
Metadata Management (MM tools), such as Apache Atlas
SQL Expression Analyzer intrinsic limitations
Unsupported features
- Control lineage is not available for certain expressions inside operations.
Limitations on SQL99 Keywords
The SQL99 syntax includes reserved keywords (e.g. CREATE that cannot be used as a table name), and non reserved keywords. Some of these keywords are used for compound terms like "order by", "group by", "union join". All SQL99 keywords are properly parsed and will not fail the expression analyzer used in bridges. However, some keywords have a direct data flow impact on lineage, and some do not. The following list provide the details:
-
SQL99 Reserved Key Words
-
with direct data flow lineage impact:
ADD, ALL, ALTER, AND, ANY, AS, ASC, BEGIN, BOTH, BY, CALL, CASE, CAST, CHAR, CHARACTER, CONSTRAINT, CREATE, CURRENT, CURSOR, DATE, DAY, DEC, DECIMAL, DEFAULT, DELETE, DESC, DISTINCT, DOUBLE, DROP, ELSE, END, EXCEPT, FALSE, FLOAT, FOR, FOREIGN, FROM, FULL, FUNCTION, GRANT, GROUP, HAVING, HOUR, IN, INNER, INSERT, INT, INTEGER, INTERSECT, INTERVAL, INTO, IS, JOIN, KEY, LEADING, LEFT, MINUTE, MONTH, NATURAL, NO, NOT, NULL, NUMERIC, ON, OR, ORDER, OUT, OUTER, PRECISION, PRIMARY, PROCEDURE, RECURSIVE, REFERENCES, RETURN, REVOKE, RIGHT, ROW, ROWS, SECOND, SELECT, SET, SMALLINT, TABLE, THEN, TIME, TIMESTAMP, TO, TRAILING, TRUE, UNION, UNIQUE, UPDATE, USING, VALUES, VARCHAR, VARYING, VIEW, WHEN, WHERE, WITH, YEAR
-
with no lineage impact:
ABSOLUTE, ACTION, ADMIN, AFTER, AGGREGATE, ALIAS, ALLOCATE, ARE, ARRAY, ASSERTION, AT, AUTHORIZATION, BEFORE, BINARY, BIT, BLOB, BOOLEAN, BREADTH, CASCADE, CASCADED, CATALOG, CHECK, CLASS, CLOB, CLOSE, COLLATE, COLLATION, COLUMN, COMMIT, COMPLETION, CONNECT, CONNECTION, CONSTRAINTS, CONSTRUCTOR, CONTINUE, CORRESPONDING, CROSS, CUBE, CURRENT_DATE, CURRENT_PATH, CURRENT_ROLE, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_USER, CYCLE, DATA, DEALLOCATE, DECLARE, DEFERRABLE, DEFERRED, DEPTH, DEREF, DESCRIBE, DESCRIPTOR, DESTROY, DESTRUCTOR, DETERMINISTIC, DICTIONARY, DIAGNOSTICS, DISCONNECT, DOMAIN, DYNAMIC, EACH, END-EXEC, EQUALS, ESCAPE, EVERY, EXCEPTION, EXEC, EXECUTE, EXTERNAL, FETCH, FIRST, FOUND, FREE, GENERAL, GET, GLOBAL, GO, GOTO, GROUPING, HOST, IDENTITY, IGNORE, IMMEDIATE, INDICATOR, INITIALIZE, INITIALLY, INOUT, INPUT, ISOLATION, ITERATE, LANGUAGE, LARGE, LAST, LATERAL, LESS, LEVEL, LIKE, LIMIT, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATOR, MAP, MATCH, MODIFIES, MODIFY, MODULE, NAMES, NATIONAL, NCHAR, NCLOB, NEW, NEXT, NONE, OBJECT, OF, OFF, OLD, ONLY, OPEN, OPERATION, OPTION, ORDINALITY, OUTPUT, PAD, PARAMETER, PARAMETERS, PARTIAL, PATH, POSTFIX, PREFIX, PREORDER, PREPARE, PRESERVE, PRIOR, PRIVILEGES, PUBLIC, READ, READS, REAL, REF, REFERENCING, RELATIVE, RESTRICT, RESULT, RETURNS, ROLE, ROLLBACK, ROLLUP, ROUTINE, SAVEPOINT, SCHEMA, SCROLL, SCOPE, SEARCH, SECTION, SEQUENCE, SESSION, SESSION_USER, SETS, SIZE, SOME, SPACE, SPECIFIC, SPECIFICTYPE, SQL, SQLEXCEPTION, SQLSTATE, SQLWARNING, START, STATE, STATEMENT, STATIC, STRUCTURE, SYSTEM_USER, TEMPORARY, TERMINATE, THAN, TIMEZONE_HOUR, TIMEZONE_MINUTE, TRANSACTION, TRANSLATION, TREAT, TRIGGER, UNDER, UNKNOWN, UNNEST, USAGE, USER, VALUE, VARIABLE, WHENEVER, WITHOUT, WORK, WRITE, ZONE
-
SQL99 Non Reserved Key Words
-
with direct data flow lineage impact:
BETWEEN, CONVERT, EXISTS, NUMBER, OVERLAPS
-
with no lineage impact:
ABS, ADA, ASENSITIVE, ASSIGNMENT, ASYMMETRIC, ATOMIC, AVG, BIT_LENGTH, BITVAR, C, CALLED, CARDINALITY, CATALOG_NAME, CHAIN, CHAR_LENGTH, CHARACTER_LENGTH, CHARACTER_SET_CATALOG, CHARACTER_SET_NAME, CHARACTER_SET_SCHEMA, CHECKED, CLASS_ORIGIN, COALESCE, COBOL, COLLATION_CATALOG, COLLATION_NAME, COLLATION_SCHEMA, COLUMN_NAME, COMMAND_FUNCTION, COMMAND_FUNCTION_CODE, COMMITTED, CONDITION_NUMBER, CONNECTION_NAME, CONSTRAINT_CATALOG, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, CONTAINS, COUNT, CURSOR_NAME, DATETIME_INTERVAL_CODE, DATETIME_INTERVAL_PRECISION, DEFINED, DEFINER, DISPATCH, DYNAMIC_FUNCTION, DYNAMIC_FUNCTION_CODE, EXISTING, EXTRACT, FINAL, FORTRAN, G, GENERATED, GRANTED, HIERARCHY, HOLD, IMPLEMENTATION, INFIX, INSENSITIVE, INSTANCE, INSTANTIABLE, INVOKER, K, KEY_MEMBER, KEY_TYPE, LENGTH, LOWER, M, MAX, MIN, MESSAGE_LENGTH, MESSAGE_OCTET_LENGTH, MESSAGE_TEXT, METHOD, MOD, MORE, MUMPS, NAME, NULLABLE, NULLIF, OCTET_LENGTH, OPTIONS, OVERLAY, OVERRIDING, PASCAL, PARAMETER_MODE, PARAMETER_NAME, PARAMETER_ORDINAL_POSITION, PARAMETER_SPECIFIC_CATALOG, PARAMETER_SPECIFIC_NAME, PARAMETER_SPECIFIC_SCHEMA, PLI, POSITION, REPEATABLE, RETURNED_LENGTH, RETURNED_OCTET_LENGTH, RETURNED_SQLSTATE, ROUTINE_CATALOG, ROUTINE_NAME, ROUTINE_SCHEMA, ROW_COUNT, SCALE, SCHEMA_NAME, SECURITY, SELF, SENSITIVE, SERIALIZABLE, SERVER_NAME, SIMPLE, SOURCE, SPECIFIC_NAME, SIMILAR, SUBLIST, SUBSTRING, SUM, STYLE, SUBCLASS_ORIGIN, SYMMETRIC, SYSTEM, TABLE_NAME, TRANSACTIONS_COMMITTED, TRANSACTIONS_ROLLED_BACK, TRANSACTION_ACTIVE, TRANSFORM, TRANSFORMS, TRANSLATE, TRIGGER_CATALOG, TRIGGER_SCHEMA, TRIGGER_NAME, TRIM, TYPE, UNCOMMITTED, UNNAMED, UPPER, USER_DEFINED_TYPE_CATALOG, USER_DEFINED_TYPE_NAME, USER_DEFINED_TYPE_SCHEMA
Limitations on Database extensions
In addition to the list of SQL99 keywords, the expression analyzer has been enhanced to support common database specific constructs. The limitations are in the list below:
-
IBM DB2 extension limitations:
-
The CONCAT keyword in DB2 can be used as a function (supported) or as an operator (not supported)
SELECT CONCAT(first_name, last_name) FROM customer; SELECT first_name CONCAT last_name FROM customer;
-
Oracle extension limitations:
-
Support for Oracle Text Literals is limited to the two single quotation marks mechanism, like:
'Jackie''s raincoat'
but not the Q/q alternative quoting mechanism, like:Q<'Jackie's raincoat>'
Limitations on Data Type Support
The expression analyzer doesn't support user defined complex data types.