Data Products Overview

Overview

MetaKarta lets you define, use, and maintain data products, which are one of the primary elements of the larger data mesh architecture approach. A data mesh is generally seen as "a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments within or across organizations." It provides a pattern for how organizations can organize around one or more data domains with a focus on delivering enterprise data products. In this way, it supports both data producers and data consumers, and it provides federated governance through lightweight centralized policy.

The data mesh architecture, first proposed by Zhamak Dehghani, is a modern data architecture that moves away from a monolithic data lake or data warehouse to a distributed, domain-specific architecture that:

  • Enables autonomy of data ownership
  • Provides agility with decentralized domain-aware data management
  • Allows for centrally governing and monitoring data across domains.

Any data mesh implementation aims to deliver on the promise of scale while still providing the quality and integrity guarantees needed to make data usable. To that end, it should:

  • Ensure domain-oriented decentralized data ownership and architecture
  • Provide data as a product
  • Deliver self-serve data infrastructure as a platform
  • Enable federated computational governance.

Data Products in the Data Mesh

Depending on the nature of the data within a domain (known as domain data) and its consumption models, data can be served as events, batch files, JDBC relational tables, graphs, etc., while maintaining the same semantics (i.e., the same meaning and usage). For data to be usable, it must have metadata, including:

  • Semantic and syntax declaration
  • Data computational documentation
  • Quality metrics
  • Traits used by computational governance to implement the expected behavior (e.g., access control policies).

A data product also includes code:

  • for data pipelines responsible for consuming, transforming, and serving upstream data (data received from the domain’s operational system or an upstream data product)
  • for APIs that provide access to data, semantic and syntax schema, observability metrics, and other metadata
  • for enforcing traits such as access control policies, compliance, provenance, etc.
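As a rough illustration of how the metadata and trait-enforcement code described above might fit together, here is a minimal Python sketch. It is not MetaKarta's API: the class names `AccessPolicy` and `DataProductMetadata`, the function `can_read`, and all sample values are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical trait: the set of roles allowed to read a data product."""
    allowed_roles: frozenset[str]

    def permits(self, role: str) -> bool:
        return role in self.allowed_roles


@dataclass
class DataProductMetadata:
    """Hypothetical metadata record that travels with a data product."""
    name: str
    schema: dict[str, str]             # syntax declaration: column -> type
    semantics: dict[str, str]          # semantic declaration: column -> meaning
    documentation: str                 # how the data is computed
    quality_metrics: dict[str, float]  # e.g. completeness as a ratio
    access_policy: AccessPolicy        # trait used by computational governance


def can_read(metadata: DataProductMetadata, role: str) -> bool:
    """Enforce the access-control trait declared in the metadata."""
    return metadata.access_policy.permits(role)


# Sample values are invented for the example.
orders = DataProductMetadata(
    name="Orders",
    schema={"order_id": "INTEGER", "amount": "DECIMAL(10,2)"},
    semantics={"order_id": "Unique order key", "amount": "Order total in USD"},
    documentation="Loaded nightly from the order management system.",
    quality_metrics={"completeness": 0.98},
    access_policy=AccessPolicy(frozenset({"analyst", "steward"})),
)

print(can_read(orders, "analyst"))  # role listed in the policy
print(can_read(orders, "guest"))    # role not listed in the policy
```

The point of the sketch is that the policy is declared as data alongside the schema and quality metrics, so the enforcing code only has to read the declaration rather than hard-code rules per product.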

Data Products Model

You must first create a data products model as a custom model in order to store and manage the data products, data domains, ports, usage requests, etc.

The model is organized as follows:

  • A data products model may contain one or more data domains, and these data domains have a hierarchical structure within a particular data products model.
  • Each data domain may then contain
    • One or more data products. Each data product may contain
      • One or more ports, which are data sources. Each port may be associated with any object in the repository and will therefore be represented by that object's metadata (e.g., table and column specifications, file format, etc.)
      • One or more usage requests
    • One or more data contracts, each of which may be associated with at most one data product
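The containment rules above can be pictured as a small set of nested types. The following Python sketch is only a mental model, not how MetaKarta stores the custom model: every class, field, and sample value is hypothetical, and it assumes that data products and data contracts both belong to a data domain.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Port:
    name: str
    repository_object: str  # hypothetical identifier of the linked repository object


@dataclass
class UsageRequest:
    requester: str
    purpose: str


@dataclass
class DataProduct:
    name: str
    ports: list[Port] = field(default_factory=list)
    usage_requests: list[UsageRequest] = field(default_factory=list)


@dataclass
class DataContract:
    name: str
    data_product: Optional[DataProduct] = None  # at most one data product


@dataclass
class DataDomain:
    name: str
    subdomains: list[DataDomain] = field(default_factory=list)  # hierarchy
    data_products: list[DataProduct] = field(default_factory=list)
    data_contracts: list[DataContract] = field(default_factory=list)


@dataclass
class DataProductsModel:
    name: str
    domains: list[DataDomain] = field(default_factory=list)


def iter_products(domain: DataDomain) -> Iterator[DataProduct]:
    """Yield the domain's own data products, then those of its subdomains."""
    yield from domain.data_products
    for sub in domain.subdomains:
        yield from iter_products(sub)


# Sample content, invented for the example.
orders = DataProduct(
    "Orders",
    ports=[Port("orders_table", "warehouse/sales/orders")],
    usage_requests=[UsageRequest("analyst@example.com", "monthly reporting")],
)
returns = DataProduct("Returns")
contract = DataContract("Orders SLA", data_product=orders)
sales = DataDomain(
    "Sales",
    subdomains=[DataDomain("After-sales", data_products=[returns])],
    data_products=[orders],
    data_contracts=[contract],
)
model = DataProductsModel("Enterprise data products", domains=[sales])

print([p.name for p in iter_products(sales)])
```

Modeling the contract's link as an optional single reference, rather than a list, mirrors the rule that a data contract is associated with no more than one data product.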