Lineage

Once well managed, metadata is then open for detailed analysis, and true business level use cases may be solved. MetaKarta supports full business level lineage and impact analysis down to the classifier (table/entity/dimension) and feature (column/attribute/measure) level.

Generally, there are two types of lineage: Data Flow and Semantic Flow. MetaKarta lets users display and analyze both types of lineage.

  • You may invoke a lineage and/or impact trace by going to the Lineage tab or context menu from a classifier (table, file, entity, etc.) or feature (column, field, attribute, etc.) and setting the Type in the upper left of the lineage display to DATA FLOW. This presents an end-to-end trace across all the models and mappings in your current configuration.

  • You may invoke a lineage overview by opening the details page for a model, schema, ETL job, BI design, etc., and going to the Lineage tab.

Either use case may be displayed from the model / data store / schema high level perspective of the enterprise architecture, down to the table / file level, and finally all the way down at the column / field level. The level can be selected for the entire data lineage diagram, or individually on selected data store models / schemas, or selected tables / files.

In the Data Lineage Diagram, all columns/fields of a given table/file are presented at once which matches the classic data modeling concepts. Selection of a given column/field allows a user to highlight the data flow to it.

However, these diagrams can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may generate too many objects for a readable graph, giving rise to a possible warning in the user interface.

You now have the option (by default) of using the data flow "interactive" Analysis Diagram, which displays the columns/fields involved in the given data flow trace, not all the columns. The user can then select the columns/fields to be displayed to better present the business use case of that data flow. Then the user can interact within that diagram by selecting columns/fields to display its lineage. Furthermore, the Analysis Diagrams allow you to display conditional labels such as PII or Confidential Sensitivity Level, not only providing more critical information to the user, but also better visualization of the propagation of that information (e.g. PII) through the data flow lineage trace.

The Type in the upper left of the lineage display provides a selection between either DATA FLOW (based upon connection definitions to data stores and physical transformation rules which transform and move the data) or SEMANTIC FLOW (based upon the definition and usage type relationships from a term, concept or logical model to a physical representation). Both data flow and semantic flow may be present in a diagram.

Data flow links are represented as solid black or gray lines and semantic lineage as dashed blue lines. In most cases, diagrams are laid out so that data flow is shown as left (source) to right (destination) and semantic as top to bottom (more abstract or defining to usage).

Lineage Trace

This method of analysis presents either graphical or textual representation of the flow of data through connection definitions to data stores and physical transformation rules which transform and move the data. In order to see data flow lineage, one must

Once the configuration is ready, then you are ready to report on lineage.

End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If instead one goes to the object page for a schema or model, which is neither a classifier nor a feature, the Lineage tab shows the overview lineage within the scope of that model only.

A data flow lineage trace presents summary lineage as opposed to the data flow overview lineage which presents a step by step transformation lineage.

When you trace impact/lineage of a table or column, you do not see all the transformations. Instead, you see a summary of the whole job (you get a picture much closer to the one for an architecture diagram). But, you are also able to see complete end-to-end lineage (not just confined to one DI or BI model).

Steps

  1. Sign in as a user who has at least the Metadata Viewing capability object role assignment to the configuration and all its contained models.

    Without the Metadata Viewing capability object role assignment to all the configuration's contained models, you will see a dialog indicating that you do not have sufficient privileges.

  2. Find a starting point for lineage by either

    • Navigate to that element’s object page and select the Lineage tab

    • Or, for lists of elements, click the line the element is on and click the appropriate Open Lineage icon

    • Or, right click on the element in a diagram (architecture diagram, lineage diagram or model diagram) and select Open Lineage

  3. From here you may

Both data flow and semantic flow may be present in the diagram when the Type is SEMANTIC FLOW.

General Concepts

In addition to data flow vs. semantic flow, there is another major distinction: between a lineage trace and a lineage overview.

  • A lineage trace will always have a point of origin, and also has a Type, and can produce a complete end-to-end lineage picture (including any models in the configuration) depending upon the options chosen.

  • A lineage overview is a view of the design level lineage limited to the scope of the model you invoked it on (by clicking on the Lineage tab) and thus is not a complete end-to-end lineage picture, but simply an overview of the model lineage picture.

Many objects in the lineage OVERVIEW are purely design level concepts, e.g., ETL designs, that are not actually a part of the run-time simulation of the actual physical data flow lineage. Thus, there will be no way to represent a lineage trace from them. If you do attempt to invoke a lineage trace from these types of object, you will receive an informational dialog informing you that the object cannot be displayed as part of a lineage trace.

You may invoke a lineage trace by going to the Lineage tab.

  • If starting at an object that is valid for a lineage trace (generally all data store classifiers and features), you will be presented with a Lineage trace with an origin at that object

  • If starting from an object which is at the entire model level, or schema level, or is a purely design level concept (e.g., an ETL design) that is not actually a part of the run-time simulation of the actual physical data flow lineage, you will be presented with a Lineage Overview, scoped only to that model.

Lineage Flow Type

The Type in the upper left of the lineage display provides a selection between either:

  • Lineage Trace

    • DATA FLOW - Based upon connection definitions to data stores and physical transformation rules which transform and move the data.

    • SEMANTIC FLOW - Based upon the definition and usage type relationships from a term, concept or logical Model to a physical representation.

You may save a lineage graph to be shared and referred to later. This reduces the time required to read from the database and regenerate a lineage graph for larger diagrams.

You may also pick a saved lineage diagram, for example one that someone sent to you, in order to see the same presentation they saw. A saved diagram is not re-calculated and thus renders very quickly.

These are saved system wide and thus anyone can see them.

Lineage Overview for Models

Data Integration and ETL/ELT data processes contain lineage within the model, even without stitching them to other models. In addition, data store models such as databases with views and/or stored procedures also present lineage in this fashion.

You may even use the lineage overview on models not in the current configuration using the MANAGE > Repository function.

The Data Flow Overview lineage (see TYPE in the upper right of diagram) has a very limited scope, only to the specific subset that is a self-contained model, e.g., a schema model in a database model or a transformation model or connection model in an ETL/DI model or BI model. Thus, it has none of the information determined in an Impact or Lineage trace diagram, so that connections are not resolved (may just be a * because of "Select *" in the connection definition). To get a true lineage picture you must use the Impact or Lineage trace from a table or column (field).

In addition, in the Overview lineage of a connection model in an ETL/DI model or BI model, many of the objects are not even included in an end-to-end lineage trace, and thus you will not be able to trace from those objects at all. This limitation exists because connection definitions themselves only show in the overview and are not a part of a lineage trace, so there is no way to trace lineage from them.

The data flow overview lineage presents detailed transformation lineage vs. a lineage trace which presents summary lineage.

In particular, when you select a runtime job, and go to the lineage tab, you see the detailed transformation lineage: every transformation is being depicted on the screen. This view is good as long as you only look at one job at a time.

Finally, the Data Flow presentation for overview lineage does not offer the Tree tab, as the scope is only the current model and the Tree tab features end-to-end lineage, which is not available for overview lineage presentations.

Steps

  1. Sign in as a user who has at least the Metadata Viewing or Data Management capability object role assignment to the model to be analyzed for Overview lineage.

  2. Open the object page of the model (e.g., ETL/DI or BI model)

  3. Go to the Lineage tab.

Overview lineage is not a Type, per se. Instead, it describes another way to view lineage as a whole, where one scopes down to a particular model. It is a view of the design level lineage limited to the scope of the model you invoked it on (by clicking on the Lineage tab) and thus is not a complete end-to-end lineage picture (with the exception of the EXTERNAL DATA FLOW option), but simply an overview of the model lineage picture.

The Type may be:

  • INTERNAL DATA FLOW -- Design level data flow internal to the model

  • EXTERNAL DATA FLOW -- End-to-end data flow including other models which are stitched to lineage objects in the current model

  • TRANSFORMATION DATA FLOW -- Internal run time data flow which includes intermediate transformations as objects in the lineage

  • SUMMARY DATA FLOW -- Internal run time data flow which DOES NOT include intermediate transformations and other objects in the lineage.

You are automatically given the Overview lineage when you start by opening a model and going to the Lineage tab at the model or schema level (not for a specific table or column). In addition, only dataflow options are presented (not semantic lineage). In this case, the options above are presented.
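The difference between TRANSFORMATION DATA FLOW and SUMMARY DATA FLOW can be pictured as collapsing intermediate transformation steps out of a graph. The following is only a conceptual sketch in Python, not MetaKarta code or its API: the link list, the object names and the `summarize` function are all hypothetical and exist purely for illustration.

```python
# Conceptual sketch only: the names below are illustrative assumptions,
# not part of any MetaKarta API.

def summarize(links, transformations):
    """Collapse transformation nodes so that only data store objects stay linked.

    links: list of (source, target) pairs in the detailed (transformation) flow.
    transformations: set of node names that are intermediate transformations.
    """
    downstream = {}
    for src, dst in links:
        downstream.setdefault(src, []).append(dst)

    def targets(node, seen):
        # Walk through transformation nodes until data store objects are reached.
        found = set()
        for nxt in downstream.get(node, []):
            if nxt in seen:
                continue
            if nxt in transformations:
                found |= targets(nxt, seen | {nxt})
            else:
                found.add(nxt)
        return found

    sources = {src for src, _ in links if src not in transformations}
    return sorted((s, t) for s in sources for t in targets(s, {s}))


# Detailed flow: two source columns pass through a filter and an expression.
DETAILED = [
    ("Src.ID", "Filter"),
    ("Filter", "Expr"),
    ("Src.Name", "Expr"),
    ("Expr", "Tgt.ID"),
]
SUMMARY = summarize(DETAILED, {"Filter", "Expr"})
```

In this sketch, `DETAILED` plays the role of the transformation view (every intermediate step is a node), while `SUMMARY` keeps only direct source-to-target links, which is closer to what a summary presentation shows.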

Example

Sign in as Administrator and go to the object page for the Staging to Dimensional Talend DI model and go to the Lineage tab.

Right-click on the top DI process name ShippingPOC and select Open.

Lineage Direction

Generally, lineage is represented as a "flow", either of data as part of a data movement and possibly transformation process, or of "meaning" as in from a defining object like a glossary term to a defined object like a column. These directions are commonly also associated with analysis of the lineage, hence:

  • DATA FLOW lineage

    • Forward or destinations or target or impact lineage of the data movement and transformation processes. Represented as being to the right of the point of origin.

    • Reverse or source lineage of the data movement and transformation processes. Represented as being to the left of the point of origin.

  • SEMANTIC FLOW lineage

    • Forward or target or usage or defined lineage of the application of meaning or documentation or inheritance. Represented as being below (and many times to the right of) the point of origin.

    • Reverse or source or origin or definition lineage of the application of meaning or documentation or inheritance. Represented as being above (and many times to the left of) the point of origin.

Direction only makes sense for a lineage trace, not a lineage overview, as the point of origin only exists in a lineage trace.
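As an illustration of direction, a lineage trace can be thought of as a walk over directed links starting at the point of origin. The sketch below is a simplified model in Python and makes no claim about how MetaKarta computes lineage; the `LINKS` data and the `trace` function are hypothetical.

```python
from collections import deque

# Hypothetical data flow links (source, destination); illustrative only.
LINKS = [
    ("Accounting.Customer", "Staging.Customer"),
    ("Staging.Customer", "DW.Customer"),
    ("DW.Customer", "Report.CustomerSales"),
]


def trace(origin, links, direction):
    """Return the objects reachable from origin, in breadth-first order.

    direction="impact" follows links source -> destination (forward);
    direction="lineage" follows links destination -> source (reverse).
    """
    neighbors = {}
    for src, dst in links:
        if direction == "impact":
            neighbors.setdefault(src, []).append(dst)
        else:
            neighbors.setdefault(dst, []).append(src)

    seen, queue, order = {origin}, deque([origin]), []
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impact = trace("Staging.Customer", LINKS, "impact")
sources = trace("Staging.Customer", LINKS, "lineage")
```

Both calls need an origin, which mirrors the point above: without a point of origin (as in a lineage overview) there is nothing for "forward" or "reverse" to be relative to.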

Control Flow

Control flow is lineage that traces from an object used as part of a selection WHERE clause or similar structure that impacts what data is moved but is not itself directly moved to the target. There are two types of control flow:

  • Column control flow where the control flow directly impacts the values of a column (e.g., lookup)

  • Row control flow where the control flow does not directly impact values of columns (e.g., filters).

It is easy to imagine a common scenario where you trace data impact and your impact trace affects a commonly used (in terms of joins and WHERE clauses) dimension, e.g., the time dimension in the warehouse, mart or otherwise. Just about every report will be using that dimension in some way, and thus the impact lineage is basically everything. In this case the diagram size quickly grows out of the capability of your browser to present the lineage let alone navigate and analyze it.

For this and other similar reasons, the same menu as above includes options to limit the lineage.

The Control Flow options are:

Control Lineage Option   Description                                           Delay in Presentation
None                     No control flow data impacts are traced               None
Limited                  Show only immediate (adjacent) control flow objects   May be slow
Complete                 All control flow impacts are traced                   Likely slow

Steps

  1. Begin a lineage trace.

  2. In Control Flow, you may:

    • Click None to hide any objects which are only connected via control flow and not show any control flow links.

    • Click Limited to show any objects which are directly connected to the origin object via control flow and show those control flow links.

    • Click Complete to show any objects which are connected via control flow to the origin object and any subsequent objects and show those control flow links.

  3. If Limited control flow display is enabled, then go to the lineage Diagram and click on target elements and the control flow that the target depends upon will appear.
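The three Control Flow settings can be read as three rules for which links a trace is allowed to follow. The Python sketch below models that reading for a source (reverse) trace. It is an assumption-laden illustration, not MetaKarta's implementation: the link data, the "data"/"control" tags and the `visible_sources` function are hypothetical.

```python
from collections import deque

# Hypothetical links: (source, target, kind), where kind is "data" or "control".
LINKS = [
    ("Src.ID", "Stg.ID", "data"),
    ("Src.Status", "Stg.ID", "control"),  # filter column, one step away from the origin
    ("Stg.ID", "DW.ID", "data"),
    ("Stg.Type", "DW.ID", "control"),     # filter column attached directly to the origin
]


def visible_sources(origin, links, control_flow="none"):
    """Return the upstream objects shown for a given Control Flow setting.

    none:     follow data links only
    limited:  also follow control links attached directly to the origin
    complete: follow data and control links everywhere in the trace
    """
    seen, queue = {origin}, deque([origin])
    while queue:
        node = queue.popleft()
        for src, dst, kind in links:
            if dst != node or src in seen:
                continue
            if kind == "control":
                if control_flow == "none":
                    continue
                if control_flow == "limited" and node != origin:
                    continue
            seen.add(src)
            queue.append(src)
    seen.discard(origin)
    return seen


none_view = visible_sources("DW.ID", LINKS, "none")
limited_view = visible_sources("DW.ID", LINKS, "limited")
complete_view = visible_sources("DW.ID", LINKS, "complete")
```

Each setting returns a superset of the previous one, which is consistent with the growing diagram size (and delay) noted for Limited and Complete.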

Example

Search for the OnPrem DW.dbo.Customer table and open it.

Go to the Lineage tab and ensure that the Type is DATA FLOW and the View is DIAGRAM.

There is a red "pin" in the diagram, showing the point of origin, from which lineage is presented. In this case, the Customer table.

Finally, ensure that the Control Flow is NONE:

Click Columns HIDE and select the top checkbox to show all the columns in the Customer table.

Then, expand the Staging DW model to the column level using the minus sign.

At this time, the diagram does not contain any control lineage artifacts, as we specified.

Now, specify Control Flow as Limited:

And expand the Staging DW model again to the column level.

Many new objects which are not directly connected by data flow links now appear. Selecting Data Flow Settings > Control Flow > Limited shows any control flow related objects which are directly connected to the origin object via control flow.

And we see control lineage as different (dashed) lines.

One must click on an object to see the control lineage.

Click the Dimensional DW.dbo.Customer.ID column.

And we see control lineage source columns in gray shading.

Now, specify Control Flow as COMPLETE, expand the Accounting model to the column level and click the Dimensional DW.dbo.Customer.ID column.

Even more objects are now shown in the lineage diagram and we also have gray shading when a column that is impacted by control lineage is selected.

Processes (Bottom) Panel with Control Flow

In addition, one may use the Process (Bottom) Panel to see the specific control flow where clauses, lookups and filters.

So, we return to the diagram, but this time we show the Processes Panel at the bottom of the diagram:

Here we have selected two processes, one that has a filter and another which has a where clause. These details are already provided in the Processes Panel without resorting to showing the actual control flow lineage.

If we click on Show lineage details for the CustomerPOInvoiceItem process:

We have:

We see our object's portion of the trace in the Informatica workflow where it is populated.

Expanding the FILT_Customer filter, we see:

We see the actual filter condition Expression that controls which rows in the table are used. PaymentType, InvoiceStatus and PurchaseOrderAmout are all a part of the control flow in this case.

If we now return to the Customer.ID output lineage:

If we expand the CustomerPaymentDate table:

As we saw before, the data flow lineage is from Customer.ID to CustomerPaymentDate.ID. In addition, we see the where clause showing the control operation "CustomerName = 'Adjustment'". That is, it only writes to this field for transactions that are truly general ledger adjustments to correct errors.

If we then click on CustomerPaymentDate.ID:

we see it all put together:

  • Customer.ID column

    • is shaded in blue and thus data flow

    • The Expression for it is a passthrough, so data flow inference of names and definitions occurs

  • CustomerPaymentDate.ID column

    • Is shaded in grey and thus it controls the movement of ID to ID

    • The where clause is there defining the control condition

Columns

You may show or hide all columns or show specific ones.

Lineage Depth

Pick the Depth in the pull-down in the upper right.

  • 1 (Adjacent) step in the lineage. Objects in the lineage that are the next items in a lineage trace.

For impact, adjacent can often be the data store (like a warehouse) that is the target of an object being loaded by DI/ETL that is the focus of the lineage. For source lineage, it can often mean the data source directly loaded from to produce the object that is the focus of the lineage.

  • 2 thru 9 steps in the lineage

  • Any type for both data impact and lineage.
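Depth can be pictured as a cap on the number of steps a trace takes away from the point of origin. The sketch below shows that idea as a breadth-first walk in Python. It is a simplified illustration and not MetaKarta code: the `trace_with_depth` function and the sample links are hypothetical, and treating `depth=None` as the unlimited ("Any") setting is an assumption.

```python
from collections import deque


def trace_with_depth(origin, links, depth=None):
    """Return {object: steps from origin}, following links forward.

    depth=1 keeps only adjacent objects; depth=None applies no limit
    (assumed here to correspond to the "Any" choice).
    """
    downstream = {}
    for src, dst in links:
        downstream.setdefault(src, []).append(dst)

    steps = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if depth is not None and steps[node] >= depth:
            continue  # do not expand beyond the requested depth
        for nxt in downstream.get(node, []):
            if nxt not in steps:
                steps[nxt] = steps[node] + 1
                queue.append(nxt)
    del steps[origin]
    return steps


# Hypothetical chain: source table -> staging -> warehouse -> report.
CHAIN = [("A", "B"), ("B", "C"), ("C", "D")]
adjacent_only = trace_with_depth("A", CHAIN, depth=1)
two_steps = trace_with_depth("A", CHAIN, depth=2)
unlimited = trace_with_depth("A", CHAIN)
```

The same walk works for source lineage if the links are reversed before calling the function.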

Lineage Filter

The ability to present a manageable amount of information targeted for analysis is a critical concern with lineage diagrams. In particular, for larger diagrams (such as those originating in a central fact in a warehouse, or a commonly used time based dimension), filtering is crucial if you do not want spaghetti diagrams, memory faults (remember, the actual lineage diagram must be presented by your local browser and its memory limitations), or simply huge wait times for the diagram to appear.

Several filters are available, each with its own set of choices.

Next to each filter option is the number of objects that would be filtered out of the diagram if that filter is enabled.

  • SHOW INTERNAL OBJECTS to show any intermediate schemas/tables/columns between connections in the lineage, such as transformations in an ETL pipeline

  • SHOW EXTERNAL OBJECTS to show any external source tables or files from which an object in the lineage is derived, such as data lake files from which HIVE derives tables

  • SHOW TEMPORARY OBJECTS to show intermediate temporary tables/columns in the lineage such as temporary data store objects which are created and then deleted as part of the data movement process

  • SHOW EXTERNAL TABLE LOCATION OBJECTS to include objects which are only external table locations that require connection resolution.

  • EXCLUDE MODEL TYPES to not show specific types of models in the lineage

  • EXCLUDE MODELS to not show specifically selected models.

Please see the discussion on handling large diagrams.
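To make the effect of the SHOW options concrete, the following Python sketch tags each object in a trace with a kind and then hides or shows objects by kind. It is only an illustration under stated assumptions: the `LineageObject` class, the kind tags and both functions are hypothetical and do not reflect how MetaKarta stores or filters lineage.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageObject:
    name: str
    kind: str  # hypothetical tag: "regular", "internal", "external" or "temporary"


OBJECTS = [
    LineageObject("Staging.Customer", "regular"),
    LineageObject("ETL.tmp_customer", "temporary"),
    LineageObject("ETL.join_step", "internal"),
    LineageObject("DataLake.customer.csv", "external"),
    LineageObject("DW.Customer", "regular"),
]


def apply_filters(objects, show_internal=False, show_external=False,
                  show_temporary=False):
    """Return the names of the objects that remain visible."""
    hidden = set()
    if not show_internal:
        hidden.add("internal")
    if not show_external:
        hidden.add("external")
    if not show_temporary:
        hidden.add("temporary")
    return [o.name for o in objects if o.kind not in hidden]


def filter_counts(objects):
    """Return how many objects each SHOW option governs."""
    counts = Counter(o.kind for o in objects)
    return {kind: counts[kind] for kind in ("internal", "external", "temporary")}


default_view = apply_filters(OBJECTS)
with_temporary = apply_filters(OBJECTS, show_temporary=True)
counts = filter_counts(OBJECTS)
```

The `filter_counts` result is analogous to the number displayed next to each filter option: it tells you how much a diagram would shrink or grow before you change the setting.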

Lineage Filter Options

One may include or filter out various object types in order to focus only on specific types of objects in the lineage.

Click Edit Filters and specify:

  • SHOW TEMPORARY OBJECTS to show intermediate temporary tables/columns in the lineage

  • SHOW INTERNAL OBJECTS to show any intermediate schemas/tables/columns between connections in the lineage

  • SHOW EXTERNAL OBJECTS to show any external source tables or files from which an object in the lineage is derived

  • SHOW EXTERNAL TABLE LOCATION OBJECTS to include objects which are only external table locations that require connection resolution.

  • EXCLUDE MODEL TYPES to not show specific types of models in the lineage

  • EXCLUDE MODELS to not show specifically selected models.

In some cases you may see that a lineage diagram is taking an excessive amount of time to display or that you are presented with the message:

This large diagram has xxxxx objects and xxxxx links which may require more resources than what your browser can handle.

You may use the PROCEED ANYWAY button to try to visualize the diagram.

You may also save these settings as defaults for future lineage traces.

Steps

  1. Begin a lineage trace.

  2. Click Edit Filters and specify:

    • SHOW TEMPORARY OBJECTS to show intermediate temporary tables/columns in the lineage

    • SHOW INTERNAL OBJECTS to show any intermediate schemas/tables/columns between connections in the lineage

    • SHOW EXTERNAL OBJECTS to show any external source tables or files from which an object in the lineage is derived

    • SHOW EXTERNAL TABLE LOCATION OBJECTS to include objects which are only external table locations that require connection resolution.

    • EXCLUDE MODEL TYPES to not show specific types of models in the lineage

    • EXCLUDE MODELS to not show specifically selected models.

Show Internal/External Objects

Lineage reporting may

  • either Show Internal Objects within a model (e.g., interim steps in transformations) or just the objects stitched to other model objects.

  • either Show External Objects that are not directly material to the lineage trace (such as the link from files in HDFS to the tables representing them in Hive) or not show these objects.

Show Temporary Objects

Big data solutions and other ETL/DI processes use temporary files and tables routinely. When harvesting, MetaKarta detects temporary files and marks them as TEMPORARY in their lineage characteristics. This fact means that you can distinguish temporary objects from permanent/stitchable ones in a lineage diagram and, optionally hide/show them.

Show External Table Location Objects

Models may refer to external tables that require connection resolution. By default, these table location objects are not shown. You may use this option to explicitly show them.

Default View

This option allows you to save the current filter setting to be the default for future trace reports.

Lineage Type (Display As)

Use the Lineage Trace Header Options to specify the lineage presentation.

Saving Lineage Results

You may save a lineage graph to be shared and referred to later. This reduces the time required to read from the database and regenerate a lineage graph for larger diagrams.

Anyone may then view the saved lineage diagram by going to the Type in a lineage trace or overview and picking that named saved lineage:

These are saved system wide and thus anyone can see them.

Data Flow Settings

  • AUTO REFRESH - Based upon connection definitions to data stores and physical transformation rules which transform and move the data)

  • HIDE LOOPS -- Display individual tables without database grouping nodes that involve loops to clarify cross-database lineage paths. Ungroup tables from their databases to simplify lineage flow visualization and reduce loop confusion.

  • SHOW GRAPH COUNTS -- Display of counts of nodes (boxes) and links (lines)

  • TROUBLESHOOTING -- Used when reporting an issue.

Lineage Process (Bottom) Panel

Along with the pictorial graph of the lineage, you may also analyze the transformation or operations acting on the columns and tables in the lineage. This information is generally presented in the Processes Panel at the bottom of the lineage page.

Steps

  1. Begin a lineage trace.

  2. Ensure that the Process Panel, at the bottom of the page is expanded.

  3. Click on a column to see:

    • Processes that make <> to show the operations which lead to or are upstream (source) in the lineage from the selected column.

    • Processes that use <> to show the operations which use the selected column as a source for downstream lineage.

  4. If Limited control flow display is enabled, then go to the lineage Diagram and click on target elements and the control flow that the target depends upon will appear.

You can see

  • The Process that populated Customer.ID

  • Its Context

  • The Data Operations within that Process which contribute to Customer.ID presented as a summary operation (not individual steps or transformations)

  • Any Control Operation showing control lineage that affects Customer.ID

  • An icon to Show Lineage Details for the complete Process.

Example

In the following example diagram we have the Staging DW.MITI-Finance-Staging-DW.dbo.Customer table selected and the Lineage tab selected.

The bottom of the DIAGRAM lineage display shows two columns:

  • Processes that make <> - Processes that write to (make) the selected classifier

  • Processes that use <> - Processes that read from (use) the selected classifier

Based upon <> which is the currently selected classifier (e.g., table), (as opposed to the point of origin for the trace which is noted by the red pin).

Thus, the left panel shows:

  • A Data Mapping that writes to columns in the Customer table:

  • A SQL Script in a Databricks notebook that again writes to columns in the Customer table:

We even see the operation.

On the right column we have where the Customer table is used in the lineage flow:

  • A Spark Script in a Databricks notebook that reads from columns in the Customer table:

You may click on the Show lineage details icon to see detail of the Databricks notebook script:

This is no longer a lineage trace, but a lineage overview of the particular ETL/DI model that you wanted to Show lineage details for.

Click on any object in the flow to highlight that subset in the diagram and show the Processes that make and Processes that use for that selected object:

In addition, you may also see the actual code of the script using the SCRIPTS button:

Then, clicking on a step in the diagram, the matching script text is highlighted:

And vice-versa, when text is selected, the corresponding diagram object is selected:

You may return to the non script presentation by clicking PROCESSES:

The middle column in the Process (bottom) panel presents the contents (columns) of the selected object. As this diagram is not the original lineage trace, but a lineage overview of the particular ETL/DI model that you wanted to Show lineage details for, we see columns and thus the middle part of the Process (bottom) panel.

Return to the original lineage trace, click on Columns and select all of them:

We now see columns in the diagram and see the middle region in the Process (bottom) panel, because we have now downloaded and rendered the column information as part of the lineage:

You may also simply Open a Process using that icon:

In this case it is a query mapping as part of a data mapping that populates the Customer table.

Processes (Bottom) Panel with Control Flow

You may see details about control flow using the Processes Panel at the bottom of the lineage diagram in that section.

Display Lineage As

The equivalent lineage graph is maintained by the UI while you maintain the same point of origin and Data Flow Settings. In this way, you may toggle between three different presentations:

Lineage Tree

The Tree Presentation provides a textual and hierarchical tree presentation, one for each direction (e.g., source and destination). The scope of that list is based upon the choice of direction of the trace, which is impact (forward) or lineage (sources) or the business intelligence (BI) reports, as well as the proximity in the trace. It is generally the most efficient presentation to render in the UI.

Please see the discussion on handling large diagrams.

Select the Tree tab on the left to obtain this presentation.

Next to SHOW, you will see two options, Objects or Processes:

  • Objects data store object types, e.g., tables, columns, views, fields, files, etc.

  • Processes data movement and possibly transformation processes, e.g. mappings, transformations, computation, select/inserts, etc.

The scope of that list is based upon the choice of direction of the trace which are impact (destinations) or lineage (sources), as well as the proximity in the trace:

  • Adjacent objects/processes in the lineage which are the next items in a lineage trace. For impact, that can often be the data store (like a warehouse) that is the target of an object being loaded by DI/ETL that is the focus of the lineage. For source lineage, it can often mean the data source directly loaded from to produce the object that is the focus of the lineage.

  • Ultimate end objects/processes are the final nodes in the lineage where the trace stops. For impact, this often means report fields, for source lineage it often means operational system tables and columns.

  • All objects/processes in the lineage which are part of the business intelligence type reports generally at the far end of the lineage trace.
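The distinction between adjacent and ultimate objects can be sketched as two different questions asked of the same graph: "what is one step away?" and "where does the trace stop?". The Python below is a conceptual illustration only, not MetaKarta code. The `tree_scope` function and the sample links are hypothetical, and its "all" branch simply returns every reachable object, which is a simplifying assumption rather than the product's definition.

```python
def tree_scope(origin, links, scope="all"):
    """Return impact (destination) objects for a given scope, sorted by name.

    adjacent: objects exactly one step downstream of the origin
    ultimate: reachable objects with nothing further downstream
    all:      every reachable object (simplifying assumption)
    """
    downstream = {}
    for src, dst in links:
        downstream.setdefault(src, []).append(dst)

    adjacent = set(downstream.get(origin, []))
    if scope == "adjacent":
        return sorted(adjacent)

    reachable, stack = set(), list(adjacent)
    while stack:
        node = stack.pop()
        if node != origin and node not in reachable:
            reachable.add(node)
            stack.extend(downstream.get(node, []))

    if scope == "ultimate":
        return sorted(n for n in reachable if not downstream.get(n))
    return sorted(reachable)


# Hypothetical links: a staging table feeds two warehouse tables, each feeding a report.
LINKS = [
    ("Staging.Customer", "DW.Customer"),
    ("Staging.Customer", "DW.Orders"),
    ("DW.Customer", "Report.Sales"),
    ("DW.Orders", "Report.Orders"),
]
adjacent = tree_scope("Staging.Customer", LINKS, "adjacent")
ultimate = tree_scope("Staging.Customer", LINKS, "ultimate")
```

When a trace has only one step, the adjacent and ultimate results coincide, which is why the two views can look identical for simple lineage.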

Steps

  1. Trace data flow lineage.

  2. Click TREE next to Display As on the right.

  3. From here you may

    • Pick the options next to SHOW in the upper left, as defined above.

    • Click the Download icon to download the entire textual results to CSV format.

    • Expand the details panel to see an equivalent of the Overview tab for the object page of a selected object or process.

Example

Search for the DW Staging.Customer.CustomerName column, go to the object page and then the Lineage tab. Click Tree in Display As: at the upper right. Click Adjacent or Ultimate or All next to SHOW.

The Lineage (Sources) panel shows the Customer table in the Accounting.MITI-Finance-AR datastore along with the two files in the Data Lake, which together comprise the ultimate sources for this Customer table in Staging DW.

The Impact (Destinations) panel shows the ultimate reports using data from the Customer table.

Click All.

Now click Adjacent:

The Lineage (Adjacent Sources) panel still shows the Customer table in the Accounts Receivable model as it was not only the ultimate source for this table in Staging DW, but also was the adjacent one.

The Impact (Adjacent Destinations) panel shows the tables in the OnPrem DW data store, instead of going to the ultimate destination, which were the reports.

Now, click Diagram in Display As on the right to see the full picture of the lineage.

Now, one can see why the results on the Lineage (Sources) panel are similar, as there is really only one step (adjacent) to the ultimate sources.

This example is a fairly simple demo. One can imagine the value of using the Tree tab for more realistic (and then much more complex) lineage examples from real environments.

Return to the Tree tab and click Ultimate.

Expand the Details panel on the far right and select the Finance1 app in the Cloud DW Qlik Sense model.

Now we see a representation of the contents of the Overview tab of the object page, but presented as a panel in the TREE lineage display.

You may now click Open in Tool, as in the examples with BI tools in the side lineage panel shown further in the user guide.

Lineage Diagram

The data flow "interactive" Diagram displays only the columns/fields involved in the given data flow trace, not all the columns. The user can then select the columns/fields to be displayed to better present the business use case of that data flow, and can interact within the diagram by selecting columns/fields to display their lineage. Furthermore, the Diagram allows you to display conditional labels such as PII or Confidential Sensitivity Level, which not only provides more critical information to the user, but also better visualizes the propagation of that information (e.g. PII) through the data flow lineage trace.

Please refer to the Lineage Diagram visualization common features.

In order to see data flow lineage, one must

Once the configuration is ready, then you are ready to report on lineage.

End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If, instead, one goes to the object page for a schema or model, which is neither a classifier nor a feature, the Lineage tab shows the overview lineage within the scope of that model only.

Steps

  1. Trace data flow lineage.

  2. Click the Analysis Diagram tab on the left side.

  3. From here you may

    • Pick the Direction in the pull-down in the header of the diagram:

      • Impact (Destination) direction

      • Lineage (Sources) direction

      • Any type for both data impact and lineage.

    • Select which columns to display in the diagram using the Columns pull-down in the header of the diagram.

      • A list of possible columns with a quick find is presented with checkboxes.

    • Pick the Depth in the pull-down in the upper right.

      • 1 (Adjacent) step in the lineage. Objects in the lineage that are the next items in a lineage trace.

      For impact, adjacent is often the data store (such as a warehouse) that DI/ETL loads from the object that is the focus of the lineage. For source lineage, it is often the data source that is loaded directly to produce the object that is the focus of the lineage.

      • 2 thru 9 steps in the lineage

      • Any type for both data impact and lineage.

    • Click the Show actions for the selected object icon (see Lineage Diagram Visualization Common Features)

    • Click Save an image to produce a downloadable file with a lineage image.

    • Click Filters and specify lineage filter options.

    • Click Display Options and specify lineage display options.
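The Direction and Depth options above can be pictured as a breadth-first trace that follows edges downstream, upstream, or both, and stops after a given number of steps. The sketch below is an assumption-laden illustration only: the edge list, the object names, and the `trace` function are hypothetical and are not part of the product.

```python
# Hypothetical data flow edges (source -> targets), invented for illustration.
EDGES = {
    "Source.Customer": ["Staging.Customer"],
    "Staging.Customer": ["DW.Customer"],
    "DW.Customer": ["Report.Customer"],
}

def trace(edges, start, direction="ANY", depth=None):
    """Return objects within `depth` steps of start; depth=None means no limit."""
    forward = edges
    backward = {}
    for src, targets in edges.items():
        for tgt in targets:
            backward.setdefault(tgt, []).append(src)
    graphs = {
        "IMPACT": [forward],             # destinations only
        "LINEAGE": [backward],           # sources only
        "ANY": [forward, backward],      # union of both traces
    }[direction]
    found = set()
    for graph in graphs:
        frontier, visited, steps = {start}, {start}, 0
        while frontier and (depth is None or steps < depth):
            # Move one step further, ignoring objects already visited.
            frontier = {n for cur in frontier for n in graph.get(cur, [])} - visited
            visited |= frontier
            steps += 1
        found |= visited - {start}
    return sorted(found)
```

For example, `trace(EDGES, "Staging.Customer", "ANY", 1)` returns only the one source and one destination directly next to the starting object, while omitting the depth returns the whole trace in both directions.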

Example

Navigate to the object page for the Customer table in the Staging DW.dbo schema.

Go to the Lineage tab and click the Diagram tab on the left side.

Pick ANY for the Direction in the pull-down in the diagram header.

The red pin indicates the origin of the lineage and impact trace.

The diagram defaults to the classifier (table) level for performance reasons.

Click the Show actions for the selected object icon and select Show Columns.

Now columns are visible, but still not the column lines. Again, this is for reasons of performance and simplicity of presentation.

Click the Display Options icon and click Show Conditional Labels.

Here, you may pick and choose conditional labels to show in the diagram and the image shows all of them selected for display.

Click the Display Options icon and select Show Term Definitions.

Terms, like US Social Security Number (documenting the ID field), are used to document columns and tables that are in this lineage trace and this is shown in the diagram.

Lineage Diagram Visualization Common Features

There are a number of common features and tools available when visualizing a lineage trace, data model, etc.

Lineage Diagram Show Overview

You may click this Show overview icon to show or hide an Overview panel of the model diagram. Click in the overview to quickly move to a portion of the full diagram.

Lineage Diagram Zoom In/Out and Fit to content

Click the Zoom in or Zoom out icons to adjust the zoom level of the diagram. You may also click the Fit to content icon to view the entire diagram at the best zoom that will fit.

Lineage Diagram Collapse / Expand

Click Expand / Collapse to expand or collapse the entire diagram (ensure that you do not have an object selected, otherwise the action will only apply to that object).

You may also click on the plus sign for an object to expand and the minus sign to collapse just that object.

Lineage Diagram Show actions for the selected object

Show the actions available in the context menu for the selected object.

  • Open to go to the object page for the object

  • Open Lineage to change the point of origin (red pin in diagram) to present a new lineage display.

Lineage Diagram Trace in General

Select the Analysis Diagram tab on the left to obtain this presentation. You will see a graphical presentation of the lineage (data impact or data source).

Additional options include:

Zoom In/Out and Fit to content

Click the Zoom in or Zoom out icons to adjust the zoom level of the diagram. You may also click the Fit to content icon to view the entire diagram at the best zoom that will fit.

Collapse / Expand

Click Expand / Collapse to expand or collapse the entire diagram (ensure that you do not have an object selected, otherwise the action will only apply to that object).

You may also click on the plus sign for an object to expand and the minus sign to collapse just that object.

Open the object page

You may right-click and select Open to navigate to the object page.

Print

You may download a PNG or SVG image of the diagram.

Quick find

In the upper right, there is a search text box that provides a quick list of object names that contain the text you type. You may click on any of the results to select that object in the diagram and move the focus there.

Explore Further

Invoking a lineage trace from any reference to an object

You may invoke a lineage trace from any diagram or any list of results (e.g., from a Browse or Search) via the right-click context menu.

Details Panel

Click to select an object and view its Overview page properties in the Details Panel on the right. You may show and hide this panel as needed.

Lineage Diagram Display Options

The following Display Options are available:

  • Show Term Definitions -- Show semantic lineage back to glossary terms for those which provide definitions of objects in the data flow lineage

  • Compact View -- Toggles between small boxes with icons and wider boxes with more text and detail.

  • Show Conditional Labels -- Display conditional labels in the diagram as selected in Edit Conditional Labels.

  • Edit Conditional Labels -- Pick the conditional labels to display

  • MAXIMUM NODE WIDTH -- set the size of the object boxes.

Lineage Diagram Options Show Term Definitions

Here you may see the terms with Defined by relationships in the diagram.

Lineage Diagram Options Compact View
Lineage Diagram Options Show Conditional Labels

Click the Display Options icon and click Show Conditional Labels.

Here, you may pick and choose conditional labels to show in the diagram and the image shows all of them selected for display.

Lineage Diagram Options Maximum Node Width

In many cases, names of objects may be too long to fit into the objects in the diagram. You may specify several different node width maximums to make the diagram more readable. Click on Display Options.

Pick the node width.

Data Flow Settings
AUTO REFRESH

Allows you to control the refresh of the diagram:

  • Yes -- Refresh diagram as soon as a setting is made

  • No -- Shows a Run button to refresh the diagram.

HIDE LOOPS

Option to hide lineage loops in the Diagram.

SHOW GRAPH COUNTS

Option to show the counts of objects and lines in the Diagram.

TROUBLESHOOTING

Generally, when submitting a support ticket for a lineage diagram issue, you may use these options to provide details and an export with which to reproduce the issue.

Show Logs

Use the SAVE LOG button to save as text (so it is searchable).

Show Graph Statistics

Export Graph

You may export a graph to be reproduced when reporting an issue. It is exported as a JSON file.

Import Graph

You may import an exported graph.

Classic Diagram

The Classic Data Lineage Diagram can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may generate too many objects for a readable graph, giving rise to a possible warning in the user interface.

Please refer to the diagram visualization common features.

In addition to those general features, there are features specific to the classic diagram presentation.

This method of analysis presents a graphical representation of the flow of data through connection definitions to data stores and physical transformation rules which transform and move the data. To see data flow lineage, one must

Once the configuration is ready, then you are ready to report on lineage.

End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If, instead, one goes to the object page for a schema or model, which is neither a classifier nor a feature, the Lineage tab shows the overview lineage within the scope of that model only.

This is an older methodology for presenting a lineage trace. You are highly encouraged to use the newer data flow Diagram method, as the Classic Diagram does not scale well to larger diagrams and larger numbers of objects.

You may disable this feature in the UI by setting the Show Lineage Classic Diagram group preference to false for the group Everyone.

Data Lineage (sources)

These are the analysis type use cases, generally posed as questions such as:

  • Given an item on a report, what data entry system fields impact these results?

  • Why are the numbers on this report the way that they are?

  • How should the system data be changed to get the correct results for this report?

This type of analysis, i.e., asking where the information comes from, is a question posed "upstream" in the dataflow. We refer to it as a reverse lineage question. When consumers of these reports ask these questions, a correct and responsive answer may be the most valuable information provided by a metadata management environment.

Steps

  1. Trace data flow lineage.

  2. Click the Diagram tab on the left.

  3. From here you may

    • Pick the Type in the pull-down in the upper right.

      • Data Impact type

      • Data Lineage type

      • Full Data Lineage type for both data impact and lineage.

    • Click the More Options icon and

      • select Show/Hide Columns to show columns in the selected object, or all objects if none is selected

      • select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.

    • Click Save an image to produce a downloadable file with a lineage image

    • Click Edit Filters and specify lineage filter options.

    • Click Display Options and specify lineage display options.

Example

Search for the Net Vendor Customer Invoices Tableau worksheet and open it.

Go to the Lineage tab.

Click Columns: and pick all the columns.

Click the plus sign (Expand all) and click To Column Level:

The different lineage lines indicate different types of data flow processes.

Click the minus sign next to MITI-Finance-AP.dbo (Database) in Accounting (Model) to collapse.

Click the minus sign next to Invoice (Table) in MITI-Finance-AP.dbo (Database) to collapse.

Expand those back and you then see the exact column that is a source in the lineage trace.

Click in an empty space in the diagram to de-select Invoice, then click the minus sign (Collapse Selected Node Completely) to collapse all, which will now apply to all objects.

We have a very clean presentation.

Click in an empty space in the diagram, then click the plus sign (Expand the Selected Node to the desired level) to expand all, which will now apply to all objects.

Pick Column Level:

Select a column, then click Highlight to outline the paths through that object.

Click Staging DW.MITI-FinanceStaging-DW.dbo.Customer.CustomerName:

Just as with the Diagram (not the Classic Diagram that we are looking at now), the Process (bottom) Panel shows the full processes that "make" the column and that "use" the column.

And you see the transformation at the bottom of the page.

See Process Panel examples.

Data Impact

Many times, one may ask these forward lineage or impact analysis type of questions:

  • If I make a change to this field, what reports will be impacted?

  • How is this identity information merged with the personnel system information on these other reports?

A data flow impact report traces the manner in which data flows from source to destination.

Steps

  1. Pick CLASSIC DIAGRAM from Display As.

  2. Click Data Impact in the Type pull-down in the upper right.

You may now use the Visualization Common Features and the Display Options.

Example

Navigate to the object page for the file PAYTRANS.csv. When searching for it, the search string must be enclosed in quotation marks, e.g. "PAYTRANS.csv", because the period (.) has special meaning in the search syntax, and the semantic search must be disabled.

Then click the Lineage tab and select Classic Diagram in Display As. Click Type DATA FLOW and Direction IMPACT.

Full Data Lineage

This option provides the combination of both:

  • Data Lineage (trace from an object upstream to objects that provide data flow to that object)

  • Data Impact (trace from an object downstream to objects that are impacted via data flow by that object)

The result is based upon all the lineage flows that trace through the selected object (feature or classifier).

Steps

  1. Pick CLASSIC DIAGRAM from Display As.

  2. Click Data Flow in the Type pull-down in the upper left.

  3. Click ANY in the Direction pull-down.

You may now use the Visualization Common Features and the Display Options.

Example

Search for "Customer" and pick the table OnPrem DW > dbo > Customer.

Go to the Lineage tab.

The Lineage tab has double arrows next to it, indicating that there are both impact and lineage traces for this object.

You have all the lineage traces going through that object. The object from which the lineage is determined is marked with a red pin.

Classic Diagram Visualization Common Features

There are a number of common features and tools available when visualizing a lineage trace, data model, etc.

Classic Diagram Show Overview

You may click this Show overview icon to show or hide an Overview panel of the model diagram. Click in the overview to quickly move to a portion of the full diagram.

Classic Diagram Zoom In/Out and Fit to content

Click the Zoom in or Zoom out icons to adjust the zoom level of the diagram. You may also click the Fit to content icon to view the entire diagram at the best zoom that will fit.

Classic Diagram Collapse / Expand

Click Expand / Collapse to expand or collapse the entire diagram (ensure that you do not have an object selected, otherwise the action will only apply to that object).

You may also click on the plus sign for an object to expand and the minus sign to collapse just that object.

Show actions for the selected object

Show the actions available in the context menu for the selected object.

  • Open to go to the object page for the object

  • Open Lineage to change the point of origin (red pin in diagram) to present a new lineage display.

  • Focus on Path to show lineage between the selected object and the point of origin (red pin in diagram) to present a new lineage display.

  • Highlight path to highlight the lineage flow from the selected objects.

  • Summarize this Object to remove the selected objects as summarized into lineage lines and present a new lineage display

  • Show Debug Information to provide troubleshooting information for reporting an issue.

Classic Diagram Trace in General

Select the Analysis Diagram tab on the left to obtain this presentation. You will see a graphical presentation of the lineage (data impact or data source).

Open the object page

You may right-click and select Open to navigate to the object page.

Print

You may download a PNG or SVG image of the diagram.

Quick find

In the upper right, there is a search text box that provides a quick list of object names that contain the text you type. You may click on any of the results to select that object in the diagram and move the focus there.

Interpreting the graphical lineage

In general, the lineage tools within MetaKarta function identically whether one is analyzing data flow lineage, semantic lineage or both. However, the presentation is different, as follows:

In addition, MetaKarta has four levels of presentation:

  • Configuration Model Connections Overview -- which is a diagram representing the various Models contained within a configuration and how they are related (or stitched) to each other based upon connection definitions manually assigned to MetaKarta.

  • Model Connections Overview -- which is a diagram representing the various Models contained within the directory of an external repository and how they are related (or stitched) to each other based upon connection definitions already provided in the external metadata repository.

  • Model Lineage Overview -- which is a diagram representing an overview of the lineage within a given Model.

  • Lineage Trace analysis at the configuration or Model level -- which is a fully detailed trace of semantic and/or data flow lineage for detailed analysis.

Details Panel

Click to select an object and view its Overview page properties in the Details Panel on the right. You may show and hide this panel as needed.

Classic Diagram Display Options

The following Display Options are available:

  • MAXIMUM NODE WIDTH -- set the size of the object boxes.

  • Highlight Control Links -- Include control lineage links in the Highlight operation

Classic Diagram Options Maximum Node Width

In many cases, names of objects may be too long to fit into the objects in the diagram. You may specify several different node width maximums to make the diagram more readable. Click on Display Options.

Pick the node width.

Highlight Path

Click Highlight Path to highlight the lineage path of the selected object. Double-click or long-click to enable auto highlight on any selected object.

Lineage Presentation Examples
Stored Procedures

Database models may include stored procedures that move data either within a database or from / to external databases (which also requires connections to be stitched).

In this case, we have a stored procedure reading from the PaymentStatus and writing a note to the CustomerName if PaymentAmount is a negative amount.

We can see this when tracing the lineage from CustomerName in Staging DW.MITI-Finance-Staging-DW.dbo.Customer:

Click the Show Lineage Details icon to the right of the name of the stored procedure:

You may also go to the Overview tab to see the details about the stored procedure itself.

Script Based DI Models

For this scenario, please see the pySpark Databricks Notebook example.

Operations and Summary Lineage

Along with the pictorial graph of the lineage, you may also analyze the transformation or operations acting on the columns and tables in the lineage. This information is generally presented in the Processes Panel at the bottom of the lineage page.

For examples, please see the Processes Panel section.

Semantic Flow

Semantic Definition Lookup

In this use case, one has found a data element (a column in a table in a database for example, or a field in a report) and wants to understand what it means. By defining the semantic links properly, MetaKarta can trace back through the physical data flow (as long as there is no transformation which would change the meaning) to an element that is mapped to a term in the glossary and thus find a useful definition.

The caveat that the above only works "as long as there is no transformation which would change the meaning" implies that some subset of the fields in your reports will not provide a semantic definition. The trace will simply stop at the transformation and never get to a model (again likely the data warehouse) that has semantic lineage.
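The pass-through rule described above can be illustrated with a small sketch. It is not MetaKarta's implementation: the link table, the term mapping, the element names, and the `lookup_definition` function are all hypothetical, and each upstream link is assumed to carry a flag saying whether it is a pass-through (no transformation).

```python
# Hypothetical upstream links: element -> list of (source element, is_pass_through).
UPSTREAM = {
    "Report.Amount": [("DW.AccountAmount", True)],
    "Report.Net Amount": [("DW.AccountAmount", False)],  # calculated in the report
    "DW.AccountAmount": [("Staging.AcctAmt", True)],
}

# Hypothetical semantic mappings: element -> glossary term.
TERM_FOR = {"DW.AccountAmount": "Account Amount"}

def lookup_definition(element):
    """Walk upstream over pass-through links only; return the first mapped term, else None."""
    stack, seen = [element], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in TERM_FOR:
            return TERM_FOR[current]
        for source, pass_through in UPSTREAM.get(current, []):
            if pass_through:  # a transformation would change the meaning, so the trace stops there
                stack.append(source)
    return None
```

In this sketch, `lookup_definition("Report.Amount")` finds the term through the pass-through link, while `lookup_definition("Report.Net Amount")` returns `None` because the only upstream link is a transformation.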

So, in addition to this method of "trace through the dataflow as long as there is no transformation which would change the meaning", there is another which is search based or name matching based. In this case, if there is a field in a report named "Net Account Amount" and it does not have a good data flow trace without transformation, one could still create a term in the glossary named "Net Account Amount". When requesting a data element definition lookup in that case, MetaKarta will perform a search for that term and report its definition, even without a clean lineage trace. In most cases, it will be necessary to fill in the blanks by adding terms to the glossary.

Of course, it is quite possible that no term directly matches the report field by name. In this case, one may define a direct object relationship, such as an Is Defined by relationship, from a term in the glossary to the field in the report. The advantage of this approach over name matching is that one may control precisely what the preferred definition will be. It also provides a definition even when there is no data flow trace free of transformations. Hence, it is the preferred method for fields for which there is no equivalent in the warehouse or lake (i.e., calculated in the report) and for which no term, or more than one term, matches by name.

All these types of semantic definitions can be turned on/off in the customized presentation UI semantic usage widget, meaning the users can select what kind of semantic definition they want to see on the Overview page when you have customized it to show the widget, but not what is used for Documentation (Name and Business Definition).

To summarize, there are several methods used to provide an answer to a definition lookup.

The preference for which result is used is based upon a ranking system that is in descending order in the list above. Thus, a DOCUMENTED result gets preference over CLASSIFIED, etc., for the Name and Business Definition.

Example

Navigate to AccountAmountAvailable, which is a column in the GLAccount table in the Dimensional DW in the demo.

The Documentation including Name and Business Definition for the view column is already populated. It was determined based upon a term definition.

Click the pencil icon next to the Definition.

Here you have the Edit Documentation dialog, showing that:

  • There is no Local Documentation defined, thus nothing provided to name and define this object that is applied directly to the object

  • There is one semantic mapping (Mapped Documentation) defined directly to this object from a glossary term which is used to provide a name and definition for the object

  • In addition, pass-through lineage and semantic mappings lead to at least one term which has the same meaning as this object (Inferred Documentation) and has an alternative definition and name.

Click the OPEN SEMANTIC FLOW link to the right of the Inferred Documentation.

Then expand the Enterprise Glossary and Finance Glossary entries to see the terms inside.

Here we see that the definitions are explained well.

  • Again, the first name and definition provided before is based upon the term named Account Amount Available, which is directly semantically mapped to the object in question.

  • Then tracing back up the semantic mapping to a more generalized term named Amount Funded, the alternate definition is provided.

  • Finally, there is a domain type term named Unified Dollar Amount which describes how such an object should be represented.

As you click objects in the diagram, the Process (bottom) Panel shows the processes (semantic mappings) which lead to (make) and are derived from (use) the selected object. E.g., click the Account Amount Available term:

Semantic Usage

In this scenario, one may wish to see the usage of the semantic element (e.g., glossary term) in the architecture.

In this scenario, from a glossary term or conceptual/logical model element, one may wish to simply discover what data elements are semantically mapped in the data flow architecture and thus would be impacted by a change to the term or model element.

A semantic usage lineage trace is nearly the reverse of semantic definition lookup. In general it is requested from a term's or logical model element's object page. The usage trace itself proceeds down each semantic link and then traces the data flow where there are no transformations (pass-through lineage) to all objects which may be reached in this manner.
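As a rough illustration of the usage trace just described, the sketch below starts from the data elements mapped to a term and then follows only pass-through data flow links downstream. The tables, the element names, and the `semantic_usage` function are hypothetical, and the sketch assumes a single level of semantic links for simplicity.

```python
# Hypothetical semantic links: term -> directly mapped data elements.
MAPPED = {"PII": ["Data Lake.ID"]}

# Hypothetical downstream links: element -> list of (target element, is_pass_through).
DOWNSTREAM = {
    "Data Lake.ID": [("Staging.ID", True)],
    "Staging.ID": [("DW.CustomerKey", False), ("DW.ID", True)],
}

def semantic_usage(term):
    """Return every element reached from the term via semantic links and pass-through flow."""
    stack = list(MAPPED.get(term, []))
    used = set()
    while stack:
        element = stack.pop()
        if element in used:
            continue
        used.add(element)
        for target, pass_through in DOWNSTREAM.get(element, []):
            if pass_through:  # transformed targets are not considered usages of the term
                stack.append(target)
    return sorted(used)
```

Here DW.CustomerKey is excluded from the result because it is produced by a transformation, whereas DW.ID is included because it is reached through pass-through links only.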

Example

Navigate to the term Personally Identifiable Information (PII) in the Finance Glossary. Then click the Lineage tab. Click TREE for the Display As on the right.

In the right-hand panel, with SHOW set to All in the header, you may see all the terms in the Finance Glossary which are PII, i.e., which are semantically mapped to this term.

Now, switch to Display As DIAGRAM.

We see here that a field named ID in the Data Lake has been classified as a US Social Security Number data class, and thus is shown to be PII.

Semantic Relationship Types

Semantic and data flow lineage traces report a number of elements in the semantic definition lookup and usage reports.

For the inferred results, priority is given to certain types of objects, in this order from highest to lowest:

  • Term

  • Data model (e.g., an erwin or ER/Studio model) object

  • Other objects.

Finally, given multiple of the same results:

  • E.g., three inferred terms - priority is given to a term which is exactly adjacent (directly mapped/classified)

  • E.g., several objects in the data flow with pass-through lineage, priority is given to the object which is directly adjacent in the data flow.
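The priority rules above amount to sorting candidates first by object type and then by closeness to the object in question. The sketch below is one way to express that ordering; the candidate structure, the field names, and the `pick_inferred` function are hypothetical, and treating "directly adjacent" as the smallest step count is an assumption.

```python
# Type priority from highest to lowest; any other kind of object ranks last.
TYPE_RANK = {"term": 0, "data_model": 1}

def pick_inferred(candidates):
    """Pick the preferred candidate.

    Each candidate is a dict with 'name', 'kind' and 'distance', where distance is the
    number of steps between the candidate and the object being documented.
    """
    return min(candidates, key=lambda c: (TYPE_RANK.get(c["kind"], 2), c["distance"]))

candidates = [
    {"name": "Amount Funded", "kind": "term", "distance": 2},
    {"name": "Account Amount Available", "kind": "term", "distance": 1},
    {"name": "GLAccount.Amount", "kind": "data_model", "distance": 1},
    {"name": "Staging.AcctAmt", "kind": "column", "distance": 1},
]
preferred = pick_inferred(candidates)
```

With these sample candidates, the directly mapped term is chosen over the more distant term, and both terms outrank the data model object and the plain column.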

Handling Large Diagrams

As you may imagine, data flow diagrams can become quite large. If you trace starting at some commonly used fact or dimension in the warehouse, and it has thousands of columns, it is quite likely to produce a diagram with tens or hundreds of thousands of objects and an equally large number of links or lines. The Diagram presentation is designed to handle exceptionally large diagrams, in fact much larger than could be considered reasonable and useful, as tens of thousands of objects and links really defy effective analysis, let alone understanding.

The Classic Diagram presentation is an older methodology and is not optimized for larger diagrams like the Diagram presentation is, and thus the Classic Diagram should only be used for smaller diagrams.

Each time you view an object and go to the Lineage tab, the lineage options that are NOT remembered as part of User Preferences are reset to their default settings. See table.

When you bookmark a URL for an object while viewing the Lineage tab, only the lineage options marked as remembered in the URL are stored with the bookmark. See table.

However, for URLs, the Direction is remembered.

Option Default Remembered in User Preferences Remembered in URL (Bookmark)
Type DATA FLOW Yes
Direction ANY Yes Yes
Control Flow NONE Yes
Columns NO
Depth FULL Yes Yes
Filters NONE Yes
Display As DIAGRAM Yes Yes

Example

When working with potentially large diagrams, you will note there are two phases in the presentation process:

  • Retrieving (or Loading) the lineage graph from the database

  • Drawing the lineage diagram

Retrieving (or Loading) the lineage graph from the database

You will see a Loading mask indicator while the process completes.

Downloading and Loading the Graphics Engine

Downloading and loading the graphics engine occurs only once per user session.

Laying out the Diagram

If you wish to see the (business intelligence tool) reports that are related through data flow and/or semantic flow to the current object, such as a data column/field or a glossary term, you may go to the Related Reports tab. In this case, MetaKarta uses the data flow impact trace (from a data column/field) and/or the semantic usage trace (from a glossary term), identifies those objects that are fields on business reports, and returns the list of related business reports which are impacted by that object.
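Conceptually, the related reports answer is an impact trace whose results are filtered down to report fields and then grouped by report. The sketch below illustrates that idea only; the edge list, the field-to-report mapping, the names, and the `related_reports` function are hypothetical.

```python
# Hypothetical impact edges (source -> targets), invented for illustration.
IMPACT = {
    "DW.Customer.Name": ["Report A.Customer", "DW.View.Name"],
    "DW.View.Name": ["Report B.Client"],
}

# Hypothetical mapping from report fields to the report that contains them.
REPORT_OF = {"Report A.Customer": "Report A", "Report B.Client": "Report B"}

def related_reports(start):
    """Trace downstream from start and return the reports whose fields are reached."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(IMPACT.get(node, []))
    # Keep only objects that are report fields, then collapse them to their reports.
    return sorted({REPORT_OF[n] for n in seen if n in REPORT_OF})
```

Intermediate objects such as DW.View.Name are traversed but do not appear in the answer, since only report fields contribute to the list.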

Please see steps and examples at Related Reports on the Object Page.

Model Dependency for Multi-Models

BI, DI, and data store models may contain many individual models in a multi-model structure. In many cases it is useful to see the dependencies (stitching) among the models in a multi-model represented graphically.

Steps

  1. Sign in as a user who has at least the View Metadata or Data Management capability object role assignment to the configuration and all its contained models.

  2. Navigate to the object page of one of the models in the multi-model.

  3. Go to the Lineage tab and select Model Connections in the TYPE pull-down.

Example

Sign in as the Administrator user and navigate to the object page of the Finance Universe. Go to the Lineage tab and select Model Connections in the TYPE pull-down.