
Export and Import a Data Mapping for Modeling (Emulating) a DI (ETL/ELT/CDC) Tool

Any data mapping with replication mappings and/or query mappings can be exported (without loss) to the Data Mapping Scripts format. This format is based on the standard database SQL Data Manipulation Language (SQL DML) syntax and includes both:

  • The data connection data models (e.g. database schema, tables, columns) of the source and target data stores, faithfully representing any supported technology (RDBMS, NoSQL, file systems)

  • The data integration (DI/ETL/ELT/CDC) for the data flow lineage between these data stores.

The specifics of the syntax for the data mapping script format are explained in a sample file at:

*$MM_HOME*/conf/MIRModelBridgeTemplate/DataMappingScript/DataMappingScriptTutorial.sql

The data mapping scripts can be edited/modified, or generated entirely from scratch, to model (emulate) a DI/ETL/ELT/CDC tool that is not among the tools supported for native model import.

The data mapping scripts can then be imported as independent DI models, or imported back into a data mapping where one may continue to benefit from the data mapper UI. The resulting data flow lineage is the same in both cases.
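As an illustration, a minimal hand-authored script modeling a job from an unsupported DI tool could look like the following sketch. It follows the syntax of the tutorial sample file; all connection, schema, table, and column names here are hypothetical:

```sql
-- Hypothetical script emulating a custom ETL job that loads a
-- warehouse dimension table from a staging table in another database.
-- All names are illustrative, not taken from any shipped sample.
CREATE MAPPING LoadCustomerDim;
CREATE CONNECTION "stg:1521" TYPE "Oracle Database";
CREATE CONNECTION "dw:1521" TYPE "Oracle Database";
SELECT
  cust_id          AS CustomerKey,
  UPPER(cust_name) AS CustomerName COMMENT 'Name normalized to upper case',
  cust_city        AS City
INTO warehouse.CustomerDim@"dw:1521"
FROM staging.Customers@"stg:1521";
```

Importing such a script produces the same column-level lineage as if the equivalent mappings had been drawn in the data mapper UI.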

Note that the new data mapping script format provides a number of advantages over the older metadata Excel format (deprecated), including:

  • The data mapping script format supports a full round trip (export/import) from/to a data mapping, while this is not possible with the older metadata Excel format, where the export was intended more for reporting and the import for bootstrapping.

  • The data mapping script format files are generally much more compact than the very verbose metadata Excel format files based on classifier/table maps and feature/column maps.

  • The data mapping script format is easier to generate from SQL based DI/ETL/ELT/CDC tools.

  • The data mapping script format supports both replication mappings and query mappings, while the metadata Excel format is quite limited in its handling of replication mappings.

  • The data mapping script format supports more source and target data store technologies (RDBMS, NoSQL, file systems) than are supported by the metadata Excel format.

Therefore, the export/import of data mapping script based files is the recommended method, while the legacy metadata Excel format is deprecated and will soon reach end of life (EOL) for the data mapping export/import application.
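In script form, the difference between the two mapping kinds is straightforward: a replication mapping is a bulk SELECT * copy, while a query mapping projects and transforms individual columns. A short sketch (table names illustrative):

```sql
-- Replication mapping: bulk copy of a table between schemas
SELECT * INTO warehouse.t1 FROM staging.t1;

-- Query mapping: column-level lineage with a transformation
SELECT a AS x, b + c AS y
INTO warehouse.t2
FROM staging.t2;
```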

Export to Data Mapping (SQL-like) Scripts


Steps

  1. Sign in as a user with at least the Metadata Viewing capability object role assignment on the data mapping you are going to export from.

  2. Navigate to the object page for the data mapping model. Create a new empty model if necessary.

  3. Go to More Actions > Export Data Mapping Script.

  4. Go to the Operations widget in the banner, select the Export data mapping... operation, and click Show log.

  5. Click Download Operation Files.

Example

Sign in as Administrator and open the Adjustments to Staging data mapping.

Go to More Actions > Export Data Mapping Script.

Go to the Operations widget in the banner, select the Export data mapping... operation, and click Show log.


Click Download Operation Files.

The result contains three files, one for each query and/or replication mapping in the data mapping.

And the script contains the SQL-like text:

CREATE MAPPING CustomerPayment;
CREATE CONNECTION Adjustments TYPE "RELATIONAL DATABASE";
CREATE CONNECTION "Staging DW" TYPE "RELATIONAL DATABASE";
SELECT
  CONCAT(Adj.TransSetNm, CONCAT(Adj.AdjNum, Adj.TransDT)) AS PaymentID
    COMMENT 'Based upon Transaction Set Name, Adjustment Number and DateTime'
, Adj.TransDT AS PaymentDate COMMENT 'Transformation of Transaction DateTime'
, CONCAT(Adj.TransAmt, CONCAT(Adj.AdjNum, CONCAT(Adj.AdjTyp, CONCAT(AdjType.AdjTypNm, AdjType.AdjTypTxt)))) AS PaymentDescription
    COMMENT 'Text constructed from Adjustment Reason and the decoded value
    of Adjustment Type Code depending upon the Transaction Set Name:
    - if Transaction Set Name is "Other" then it is a simple action with
      little information about type and so the type should not be included here
    Otherwise, the decoded value of Adjustment Type Code should be included
    in the description text.'
, Adj.TransAmt AS PaymentAmount COMMENT 'Customer pay amount is derived from all adj amount (Transaction Amount)'
, 'Adjustments' AS CheckNumber COMMENT 'Fixed "Adjustments" Check Number'
, 'Adjustments' AS InvoiceNumber COMMENT 'Fixed "Adjustments" Invoice Number'
, 'Paid in Full' AS PaymentStatus COMMENT 'Set to "Paid in Full"'
, 'Adjustments' AS PaymentType COMMENT 'Fixed "Adjustments" PaymentType'
INTO dbo.CustomerPayment@"Staging DW"
FROM dbo.Adj@Adjustments AS Adj
INNER JOIN dbo.AdjType@Adjustments AS AdjType
  ON Adj.AdjTyp = AdjType.AdjTyp;

Again, unlike the metadata Excel format, where it is NOT a good practice to re-import (or update and re-import) the spreadsheet into a data mapping, the data mapping script format lets you perform as many round trips as you wish.

Import from Data Mapping (SQL-like) Scripts

Any data mapping with replication mappings and/or query mappings can be imported (without loss) from the Data Mapping Scripts format. The data mapping scripts can be edited/modified, or generated entirely from scratch, to model (emulate) a DI/ETL/ELT/CDC tool that is not among the tools supported for native model import.

The data mapping scripts can then be:

  • Imported as independent DI models, where you may:

      • Stitch the model's connections to any other models (not just the ones originally directly linked to the data mapping)

      • Schedule re-harvesting of the scripts to include changes and support automatic stitching associated with configuration management

  • Imported back into a data mapping, where you may continue to benefit from the data mapper UI:

      • Import into an existing data mapping, even the one you exported from

      • Import into a newly created data mapping, in which case the newly created data mapping will be connected (linked) to the original source and target models automatically.

Note that the second use case (importing back into a data mapping) requires that the connections match the original models exactly by pathname. That is, the data mapping in the repository, its source models, and its target model must all be part of the configuration the data mapping is in. Also, because these source and target models are linked by non-editable connections, the connection must be made at the time of the import. In other words, while there is a great deal of intelligence built into this import bridge and it hunts for identical structure, it cannot simply "guess" or "make something up" in terms of which source and target models to connect to.

If there is no valid source or target model in the configuration that matches by name (and structure) with what is defined in the data mapping script, then the import log will contain errors explaining this fact. You can, of course, reconnect the end-points to another model with identical structure to the edited structure in the script file.
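In practice, this means the connection names in the script should match the server names (or file system model names) of the models already harvested in the configuration, so the import bridge can resolve them automatically. For example (server name illustrative):

```sql
-- The connection name follows the server[:port] format and should match
-- the database server name of the existing model in the configuration,
-- enabling automatic connection resolution at import time.
CREATE CONNECTION "dw:1521" TYPE "Oracle Database";
SELECT *
INTO warehouse.t1
FROM staging.t1;
```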

Finally, the best practice when you are NOT using the data mapping editor to make changes, but are instead editing the data mapping script source, is to import into a model (not a data mapping model), thus creating an ETL/DI model that can be stitched and that takes advantage of support for change and configuration management.

The resulting data flow lineage is the same in all these cases.

Import into a Data Mapping

Steps

  1. Sign in as a user with at least the Metadata Import / Export capability object role assignment on the data mapping you are going to import to.

     You may create a new empty data mapping, or use one with existing query and replication mappings. For an existing data mapping, the import will add new sub-mappings or replace existing ones; it will not remove any.

  2. Navigate to the object page for the data mapping model.

  3. Go to More Actions > Import Data Mapping Script.

  4. Browse for a file (must be a zip file if there are multiple scripts) to import.

  5. Click IMPORT.

  6. Go to the Operations widget in the banner, select the Import data mapping... operation, and click Show log.

  7. Click Download Operation Files.

Example

Sign in as Administrator and open the Adjustments to Staging data mapping.

Go to More Actions > Import Data Mapping Script.

Browse for the zip file created on the Export earlier.

Go to the Operations widget in the banner, select the Import data mapping... operation, and click Show log.

Note that it has completed. You may review the log. Then, see the mappings that were imported:

Again, unlike the metadata Excel format, where it is NOT a good practice to re-import (or update and re-import) the spreadsheet into a data mapping, the data mapping script format lets you perform as many round trips as you wish.

Import into a Model

Steps

  1. Sign in as a user with at least the Metadata Management capability object role assignment on the model you are going to import to.

     You may create a new empty model, or use one already defined for the Data Mapping Script import bridge. Importing creates a new version of the model.

  2. Go to MANAGE > Configuration and select the data mapping script model you plan to import into, or create a new model using the Data Mapping Script import bridge.

  3. Go to the Import Setup tab, browse to the location of a directory containing the (uncompressed) data mapping scripts to import, and click SAVE.

  4. Click IMPORT.

  5. Go to the Operations widget in the banner, select the import operation, and click Show log.

  6. When completed, stitch the connections and click UPDATE and BUILD.

Example

Sign in as Administrator and create a new model named "Adjustments to Staging import from data mapping scripts". It should use the Data Mapping Script import bridge.

Uncompress the zip file produced earlier for the Adjustments to Staging data mapping export into a folder.

Browse for the folder for the Directory bridge parameter and click SELECT.

Click SAVE and then IMPORT.

Go to the Operations widget in the banner, select the import operation, and click Show log.

Note that it has completed. You may review the log. Then, see the mappings that were imported:

Click the Connections tab (you may need to refresh the browser) and stitch the connections to the original source and target models associated with the original data mapping model. Then click UPDATE and BUILD.

OPEN the model and click the Lineage tab.

Go to the Adj table in the Adjustments model and trace lineage:

The end-to-end lineage trace is available because you have stitched the model into the architecture.

Export to a Metadata Excel Format (Deprecated)

The data mapper is compatible with the Excel metadata format that is defined and provided as part of the delivery of MetaKarta. In this way, you may export to this format for reporting purposes, or edit and re-import it as an ETL model using the Excel Metadata Format file import bridge.

Only query mappings are supported by the Excel format.

Steps

  1. Sign in as a user with at least the Metadata Import / Export capability object role assignment on the Data Mapping you are going to edit.

  2. Navigate to the object page for the data mapping model. Create a new empty model if necessary.

  3. Go to More Actions > Export.

  4. Select the Metadata Excel Format in EXPORT TO and specify a file pathname for the output.

  5. Click EXPORT.

Example

Sign in as Administrator and open the Adjustments to Staging data mapping.

Go to More Actions > Export.


Select the Metadata Excel Format in EXPORT TO and specify "C:\temp\export.xlsx" for the File parameter.


Click EXPORT and wait until the export completes.


Open the spreadsheet file.


Again, it is not a good practice to re-import (or update and re-import) the spreadsheet into a data mapping. Instead, the best practice is to import the spreadsheet as an ETL model using the Excel Metadata Format file import bridge.

Import from a Metadata Excel Format (Deprecated)

The data mapper is compatible with the Excel metadata format that is defined and provided as part of the delivery of MetaKarta. In this way, you may import an existing spreadsheet in this format into a data mapping. Note, this process may fail due to differences in the definition of the source and/or target data stores in the spreadsheet vs. the versions of those models in MetaKarta and associated with the specific mappings.

Best practice, though, is to import the spreadsheet as an ETL model using the Excel Metadata Format file import bridge.

Only query mappings are supported by the Excel format.

Steps to import into a data mapping (generally not a recommended practice)

  1. Sign in as a user with at least the Metadata Import / Export capability object role assignment on the Data Mapping you are going to edit.

  2. Navigate to the object page for the data mapping model.

  3. Go to More Actions > Import.

  4. Select the file pathname for the spreadsheet.

  5. Click IMPORT.

Steps to import into a model (Best Practice)

  1. Import the spreadsheet as an ETL model using the Excel Metadata Format file import bridge just as for any other model import.

  2. Go to the Connections tab and stitch the data store models which are source and target of the original data mapping.

  3. Click UPDATE AND BUILD to resolve the stitchings.

Example of Best Practice with Spreadsheets

Sign in as Administrator and open the Adjustments to Staging data mapping.

Go to MANAGE > Configuration and create a new model using the Excel Metadata Format file import bridge:


Go to the Connections tab and stitch the data store models which are source and target of the original data mapping.


Click UPDATE AND BUILD to resolve the stitchings.

And you are done.

Importing as an ETL model, instead of trying to import back into a data mapping, is not only the best practice, it is also quite advantageous:

  • You may schedule imports for an imported model, unlike importing into a data mapping, which requires the UI.

  • Since you are now using a model and stitching, rather than a data mapping, changes in either the sources or targets are supported without a hitch (support for change and configuration management).

Data Mapping Script Tutorial Sample

The specifics of the syntax for the data mapping script format are explained in a sample file at:

*$MM_HOME*/conf/MIRModelBridgeTemplate/DataMappingScript/DataMappingScriptTutorial.sql

/*

This data mapping script illustrates the use of classic database SQL Data Manipulation Language (SQL DML) syntax for

moving data from a hierarchical file data lake to a relational database data warehouse through flattening and staging areas.

The tutorial presents mapping capabilities from simple to complex,

starting from the relational database end of the data movement and going back to the hierarchical file processing at the beginning.

*/

/*

Below is the simplified syntax of the supported statements.

===== CREATE CONNECTION statement =====

CREATE CONNECTION <connection name> TYPE "<connection type>";

<connection type> =
    "Relational Database"
  | "Oracle Database"
  | "Spanner Database"
  | "Hive Database"
  | "BigQuery Database"
  | "File System"
  | "Google Cloud Storage"
  | "Amazon S3"
  | "Messaging System"
  | "Kafka"
  | "NoSQL Database"
  | "Cosmos DB"
  | "MongoDB"
  | "Tableau Extract Data Source";

===== SELECT INTO statement =====

[<create mapping statement>]

SELECT <expression> [AS <alias>] {, <expression> [AS <alias>]}
INTO (<table path> | <file path>)
FROM (<table path> | <file path>)
[<join_clause_ansi>] [<group_by_clause_ansi>] [<where_clause_ansi>] [<order_by_clause_ansi>] ";";

<expression> =
    <column reference>
  | <function call>
  | <constant>
  | <expression operation>;

All literals must be escaped with double quotes if they contain special characters or spaces. <name> uses a relaxed literal rule and follows the pattern [^ \n\r\t;.@"]+

*/

-- define a mapping between tables in the same database

-- define the database connection by specifying its name and type.

-- the connection name should match the database server name to enable automatic connection resolution.

-- the database server name follows the server[:port] format.

CREATE Connection "dw:1521" type "Oracle Database";

-- bulk mapping from t1 table in staging schema to t1 table in warehouse schema

SELECT *
INTO warehouse.t1
FROM staging.t1;

-- column level mapping of a column x, and b and c columns to y using b+c operation

SELECT
  a as x,
  b+c as y
INTO warehouse.t2
FROM staging.t2;

-- define a mapping between different databases

-- the last defined connection is used in SELECT by default.

-- You can reference a table in another connection using @connection_name syntax

-- define a new database connection

CREATE Connection "eds:1521" type "Oracle Database";

SELECT *
INTO warehouse.t1@"dw:1521"
FROM eds_schema.t1;

-- define a mapping between files and tables

-- the connection name should match the file system model name to enable automatic connection resolution.

-- the file system model name can include the import root directory

-- file system connection of type AWS S3 where the name includes the root bucket

CREATE Connection "us-west-2.amazonaws.com/mybucket" type "Amazon S3";

-- mapping between file and table columns

SELECT
  col1 as x,
  col2 as y
INTO eds_schema.t1@"eds:1521"
FROM folder/file1.csv;

-- when a file contains a hierarchical structure, you can use a dot separator to depict the data field hierarchy

-- here is a mapping example for the following JSON structure

-- {
--   "attr1": "",
--   "obj1":
--   {
--     "attr21": "",
--     "attr22": ""
--   }
-- }

SELECT
  f.attr1 as x,
  f.obj1.attr22 as y,
  (f.obj1.attr22+f.obj1.attr22)/365 as z
INTO eds_schema.t1@"eds:1521"
FROM folder/file.json f;

-- you can specify conditional SQL logic using WHERE, JOIN and GROUP by clauses

SELECT s1.a as x
INTO warehouse.t1@"dw:1521"
FROM staging.t1@"dw:1521" s1,
     staging.t2@"dw:1521" s2
WHERE s1.id = s2.id;

-- TBD: Should we require aliases for all source classifiers? SELECT src.attr1 as x FROM s.t src

-- TBD: What separators should we use for file folder hierarchy? dot, slash, backslash?