Semantic Table Overview

Semantic tables provide business-friendly views of raw database tables and views. They host semantic columns, metrics, and hierarchies. A model can employ a semantic table as a fact or dimension.

A semantic table can optionally define joins to represent a snowflake dimension (e.g., nation and region) or a dependent fact (e.g., lineitem and order).

Semantic Table defines:

Source database table
Unique Keys for the table to be used in Joins
Joins to relate the table with other tables (e.g., dependent multi-fact or snowflake dimension).
Semantic Columns to select database columns that the semantic table can use and define calculation expressions that metrics and levels utilize.
Levels as columns in a database that represent a different granularity of detail within a dimension semantic table.
Hierarchies to organize data from the highest level of summarization (the "all" or root level) down to the most granular level (the "leaf" level) within a dimension semantic table
Metrics to represent an aggregated semantic column expression, such as the sum of sales or average order quantity.

Dimension Semantic Tables

Dimension semantic tables provide descriptive details (e.g., Customer, Product, Date) for filtering and grouping metric data.

MSL supports the following types of dimension semantic tables:

Normal: Sources from one database table to help implement a classic star schema with single-level joins. All data for a normal dimension is normalized into a single table or view. There are two types of normal dimensions:
Standard: Can have any type of hierarchy.
Time: Must have a time hierarchy. The semantic table is of the TIME dimension type when it has a metric with the time_type parameter set to a non-default value.
Snowflake: A logical dimension that is composed of multiple underlying physical datasets.
Outrigger: Secondary dimensions referenced from other dimensions (e.g., Customer dimension with an embedded Nation lookup).
Role-Playing Dimensions: Same dimension table used in different contexts (e.g., Date as Order Date, Ship Date).
Degenerate: Uses columns in a fact source database table and shares the semantic table with metrics.
Junk Dimensions: Low-cardinality flags and indicators grouped together to simplify the schema.
Slowly Changing Dimensions (SCDs): Handling historical changes in attributes (Type 1 overwrite, Type 2 versioning, Type 3 alternate columns).
Many-to-many: Also called multi-valued. This occurs when a fact dataset row references more than one row in a dimension dataset. In MSL, this is modeled by defining a dimensional bridge or junction table to resolve the many-to-many relationship.

Entity Relationships

---
config:
    class:
        hideEmptyMembersBox: true
---
classDiagram
`MSL Object` <|-- `Semantic Table`
`MSL Object` <|-- `BI Object`
`MSL Object` <|-- Level

`BI Object` <|-- `Generic Semantic Column`
`BI Object` <|-- Metric

`BI Object` <|-- Hierarchy

`Generic Semantic Column` <|-- `Semantic Column`
`Generic Semantic Column` <|-- `Calculated Semantic Column`

class `MSL Object` {
    <<interface>>
    String name [1..1]
    String description
    String objectID
}
class `BI Object`{
    <<interface>>
    Boolean hide
    String folder
}

classDiagram
`Semantic Table` *-- `Generic Semantic Column`: columns
`Semantic Table` *-- `Metric` : metrics
`Semantic Table` *-- `Level` : levels
`Semantic Table` *-- `Hierarchy`: hierarchies
`Semantic Table` *-- `Unique Key`
`Semantic Table` *-- Join
`Join` *-- RolePlay

`Level` o-- `Generic Semantic Column`
`Hierarchy` o-- `Level`: levels

`Metric` o-- `Generic Semantic Column` :expression

class `Semantic Table`  {
    String source [1..1]
    Array~Unique Key~ unique_keys [1..1]
    Array~Join~ joins
    Array~Generic Semantic Column~ columns
    Array~Metric~ metrics
    Array~Level~ levels
    Array~Hierarchy~ hierarchies
    String synonyms
}
class `Unique Key`  {
    Array~String~ columns [1..1]        
}
class Join  {
    Array~String~from_fk_columns [1..1]
    String to_semantic_table [1..1]
    String to_unique_key
    Array~Join~ joins
    RolePlay role_play
}
class RolePlay  {
    String prefix
    String suffix
    Array~String~ attributes
    Array~String~ hierarchies
}
class `Level`   { 
    Array~String~ key_columns [1..1]    
    Array~String~ alternate_key_columns
    String value_column
    String order_by_column
    Array~String~ dependent_columns
    String time_type
}
class `Hierarchy`   {
    Array~String~ levels
}
class `Generic Semantic Column` { 
    String bi_data_type
    String bi_category
    String default_aggregation
}

`Generic Semantic Column` <|-- `Semantic Column`
`Generic Semantic Column` <|-- `Calculated Semantic Column`

class `Semantic Column` { 
    String source_column   [1..1]
}

class `Calculated Semantic Column`  { 
    Array~Expression~ expression  [1..1]
}

class `Metric`  {
    Array~Expression~ expression [1..1]
    %% Push Non-Additive Metrics to the Client (e.g., BI) Semantic Layer
    Boolean nonAdditive
}

Semantic Table Definition

source

Type: string
Required: Yes

Identifies a source database table (or view) for this semantic table. When a dimension semantic table involves multiple tables (e.g., snowflake), this attribute identifies the grain table (e.g., connected to the fact table).

TBD: Should we enable users to specify a SQL query to implement custom scenarios?

unique_keys

Type: Array<Unique Key>
Required: Yes

Defines one or more sets of columns that uniquely identify rows in the semantic table. The first key, or one explicitly named "default", is used as the default unique key for joins. Joins can optionally reference non-default keys.

- name: Customer
  source: customer
  unique_keys:
    - name: customer    # The first key or one called default is the default unique key used for joins. 
                        # Joins can optionally use non-default keys. 
      columns:          # Keys supports either "multiple" columns or "single" column (shorter) property
        - c_custkey
    - name: second key
      columns:
        - c_custname

- name: Orders
  source: orders
  unique_keys:
    - column: o_orderkey  # Fact tables do not need to define keys unless the target database is Snowflake

- name: Nation
  source: nation
  unique_keys:
    - name: default       # default key
      columns:
        - n_nationkey

- name: Region
  source: region
  unique_keys:  # The key name is 'default' when the name property is missing.
    - column:
        - r_regionkey

Note: Fact tables do not need to define keys unless the target database is Snowflake.

joins

Type: Array< Join>
Required: No

Join relationships this table or model has. See Join for details.

synonyms

Type: string
Required: No

The comma-separated list of synonyms. For example, the active_user_count can have synonyms: active users, monthly active users, MAU

Synonyms are used for informational purposes only and act as alternative, business-friendly names for objects. like semantic tables. They provide a layer of abstraction that helps Large Language Models (LLMs) and business users interpret data, without affecting the underlying database objects. Semantic Column

Semantic Columns enables you to either select a database source_column or define a calculation expression.

A basic column uses the source_column property to select a database column in the source table.

A calculated column determines its row values using the expression parameter that can involve database columns and semantic columns. They are helpful for row-level computations, such as combining text, calculating new values, or creating categories. The expression parameter enables you to define formulas for different database dialects.

You use semantic columns to define levels and metrics. Unless you decide to hide them (e.g., ID or key columns), they are available in the BI semantic layer.

Join

A Join represents a MANY_TO_ONE relationship between FROM (many) and TO (one) semantic tables using the from_unique_key and to_fk_columns properties.

The TO semantic table (e.g., dimension in the relationship to a fact) can define multiple unique_keys referenced by different joins. When the to_unique_key is missing, the join uses the default unique key.

You can optionally name a join using the name property. When it is missing, the join assumes the name of the to_semantic_table.

TBD: Why do we need to name joins, and what rules should these names follow (e.g., uniqueness)?

The from_columns point to the foreign key column(s) of the FROM table.

You can define joins in the model and semantic tables.

You can use nested joins to depict precise relationships between tables.

Example illustrating Model, Fact, and Dimension joins:

- name: TPC-H Lineitem-Order Customer-Nation Catalog 
  definition: Example of all kinds of joins (only - not dimension and fact details).

  dimensions:
    - name: Nation
      source: nation
      unique_keys:  # the default unique key
          column: n_nationkey

    - name: Region
      source: region
      unique_keys:  # the default unique key
        - column: r_regionkey

    - name: Customer Nation Region
      source: customer
      unique_keys:    # the default unique key
        - column: c_custkey

# 1.  Dimension Semantic Table - snowflake joins
      joins:
        - from_fk_column: c_nationkey
          to_semantic_table: Nation
          joins:
            - from_fk_column: n_regionkey
              to_semantic_table: Region

  facts:
    - name: Orders
      source: orders
      unique_keys:  # the default unique key
        - column: o_orderkey

    - name: Lineitem Orders
      source: lineitem
      unique_keys:
          columns:
            - l_orderkey        
            - l_linenumber

# 2. Fact Semantic Table - dependent multi-fact joins
      joins:
        - from_fk_column: l_orderkey
          to_semantic_table: Orders

  models:
    - name: Model

      fact_semantic_table: Lineitem Order

# 3. Model - fact and dimension joins
      joins:
        - from_fk_column: o_orderkey # FK
          to_semantic_table: Customer Nation Region

        - from_fk_column: l_suppkey # FK
          to_semantic_table: Supplier

Array~String~from_fk_columns [1..1]
String to_semantic_table [1..1]
String to_unique_key
Array~Join~ joins

to_semantic_table

Type: String
Required: Yes

The TO semantic table that uses the source table with the to_unique_key.

from_fk_columns

Type: Array
Required: Yes

List of database columns in the source table that make up the foreign key.

to_unique_key

Type: String
Required: No

The name of the unique key in the TO semantic table. When the property is missing, the DEFAULT key is used.

joins

Type: Array
Required: No

Optional nested Joins.

role_play

Type: RolePlay
Required: No

For role-playing dimensions only. Defines the role-playing template for the TO dimension in the semantic table join.

RolePlay

The role-playing template is the prefix and/or suffix added to every name attribute in the role-playing dimension, including the names of hierarchies.

You can subset the list of hierarchies in a role-playing dimension semantic tables using the hierarchies property. When it is missing, all hierarchies are included.

The following example defines "Order Date" and "Ship Date UPS" role-playing dimension semantic tables.

  - name: Sales - Role-playing Date dimension
    fact_semantic_table: Sales

    semantic_tables:
      - name: Date
        source: dim_date  
        unique_key:
          - column: d_datekey

    joins:
      - to_semantic_table: Date
        from_column: o_orderdatekey
        role_play:
          - prefix: Order

      - to_semantic_table: Date
        from_column: o_shipdatekey
        role_play:
          - prefix: Ship
            suffix: UPS
            hierarchies:
              - Retail 445 Week Hierarchy

prefix

Type: string
Required: Yes (when the suffix is missing) TBD: The prefix text .

suffix

Type: string
Required: Yes (when the prefix is missing)

TBD: The suffix text.

columns

columns - Type: Array - Required: No

An optional list of semantic columns to include in the role-playing dimension. When the property is missing, all semantic columns are included.

hierarchies

Type: Array
Required: No

Optional list of hierarchies to include into the role playing dimension. When the property is missing, all hierarchies are included. Including a hierarchy includes all semantic columns its levels depend on.

Semantic Column

A semantic column is defined from the source database column and can be used as a 'base fact' (when default_aggregation is set) or a 'dimension attribute' (otherwise).

source_column

Type: String
Required: Yes

The name of the database column in the source table specified in the parent semantic table.

Calculated Semantic Column

A calculated semantic column is defined using an expression and can be used as 'base fact' (when default_aggregation is set) or 'dimension attribute' (otherwise).

Expression in Calculated Semantic Column

Type: array
Required: Yes

Row-level dialect-specific expression defining this semantic column from database columns or other semantic columns using $(sem_col_name)$ macro. For example, when x=t.c1+t.c2 and y=$(x)$/100, then y=(t.c1+t.c2)/100).

See Expression for details.

Generic Semantic Column

The base abstract definition, which Semantic Column and Calculated Semantic Column implement.

serves as a layer bonding logical Dimension's Attribute and Base Fact to their physical

database column using expression that can reference these columns or other "semantic columns".

Expression in Generic Semantic Column

Type: array
Required: Yes (when source_column is missing)

A dialect-specific expression defining a semantic column from database columns and other semantic columns.

See Expression for details.

default_aggregation

Type: string
Required: No
Default: NONE

Default aggregation function name to apply to the base fact semantic column. MSL defines generic (not tool-specific) enumeration of aggregation functions and their mapings to tool-specific functions. You can specify an enumeration value or a string for unsupported tool-specific function.

Common Aggregation enumeration:

Common	Description	Oracle	SAP HANA	Snowflake	Databricks (Spark SQL)	Power BI (DAX / visual)	Tableau
SUM	Sum of values	SUM(x)	SUM(x)	SUM(x)	SUM(x)	SUM(x)	SUM(x)
AVERAGE	Arithmetic mean	AVG(x)	AVG(x)	AVG(x)	AVG(x)	AVERAGE(x)	AVG(x)
MIN	Smallest value	MIN(x)	MIN(x)	MIN(x)	MIN(x)	MIN(x)	MIN(x)
MAX	Largest value	MAX(x)	MAX(x)	MAX(x)	MAX(x)	MAX(x)	MAX(x)
COUNT_ROWS	Count of all rows	COUNT(*)	COUNT(*)	COUNT(*)	COUNT(*)	COUNTROWS(table) (visual: Count)	Number of Records
COUNT_NON_NULL	Count of non-nulls in column	COUNT(x)	COUNT(x)	COUNT(x)	COUNT(x)	(visual) Count (Not blank) — COUNT(x)/COUNTA(x)	CNT(x)
COUNT_DISTINCT	Count of distinct values	COUNT(DISTINCT x)	COUNT(DISTINCT x)	COUNT(DISTINCT x)	COUNT(DISTINCT x)	DISTINCTCOUNT(x)	CNTD(x)
MEDIAN	50th percentile (median)	MEDIAN(x)	MEDIAN(x)	MEDIAN(x)	percentile_approx(x, 0.5)	MEDIAN(x)	MEDIAN(x)
STDDEV_SAMP	Std dev, sample	STDDEV_SAMP(x)	STDDEV_SAMP(x)	STDDEV_SAMP(x)	STDDEV_SAMP(x)	STDEV.S(x)	STDEV(x)
STDDEV_POP	Std dev, population	STDDEV_POP(x)	STDDEV_POP(x)	STDDEV_POP(x)	STDDEV_POP(x)	STDEV.P(x)	STDEVP(x)
VAR_SAMP	Variance, sample	VAR_SAMP(x)	VAR_SAMP(x)	VAR_SAMP(x)	VAR_SAMP(x)	VAR.S(x)	VAR(x)
VAR_POP	Variance, population	VAR_POP(x)	VAR_POP(x)	VAR_POP(x)	VAR_POP(x)	VAR.P(x)	VARP(x)
PERCENTILE_CONT	Continuous percentile p∈[0,1]	PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)	same	same	percentile(x, p) or percentile_approx(x, p)	PERCENTILEX.INC(table, expr, p)	PERCENTILE(x, p) (table calc)
PERCENTILE_DISC	Discrete percentile p∈[0,1]	PERCENTILE_DISC(p) WITHIN GROUP (ORDER BY x)	same	same	— (use approx)	PERCENTILEX.EXC(...)	— (use PERCENTILE calc)
APPROX_COUNT_DISTINCT	Fast approx distinct count	APPROX_COUNT_DISTINCT(x)	—	APPROX_COUNT_DISTINCT(x)	approx_count_distinct(x)	—	—
APPROX_PERCENTILE	Fast approx percentile	APPROX_PERCENTILE(x, p)	—	APPROX_PERCENTILE(x, p)	percentile_approx(x, p[, accuracy])	—	—
ANY_VALUE	Arbitrary representative value	— (use MIN/MAX)	—	ANY_VALUE(x)	— (use first/last or MIN)	SELECTEDVALUE(x) (or logic)	ATTR(x)

Example:

- name: Units
  aggregation:SUM

bi_data_type

Type: string
Required: Yes

Generic (not tool-specific) data type classification that defines the kind of data a column contains, such as text, whole numbers, decimals, or dates. categorization In Power BI Desktop, the 'Data type'.

Common Data Type enumeration with descriptions, and BI tool examples:

Data type	Description	Power BI	Tableau
INTEGER	A whole number with no decimals. Maps to the largest integer data type in the tool.	Whole Number	Number (whole)
DECIMAL	A number with a fractional component.	Decimal Number Fixed Decimal Number	Number (decimal)
BOOLEAN	A value that can be either true or false.	True/False (Boolean)	Boolean
TEXT	A string of alphanumeric characters.	Text	String
DATE	A calendar date (e.g., 2025-10-17).	Date	Date
DATETIME	A value containing both date and time.	Date/Time	Date & Time
TIME	A value representing the time of day.	Time	Date & Time (at time granularity)
BINARY	Binary large object (image, document).	Binary	Not natively supported

bi_category

Type: string
Required: No

Generic (not tool-specific) categorization on top of the bi_data_type to influence the visualization behavior of BI tools. MSL maps them to tool-specific categories. Some tools support user defined categories. You can specify them as a string.

Common Category with descriptions, and BI tool examples:

CATEGORY	Description	Tableau	Power BI
AIRPORT	Airport codes/locations (IATA/ICAO).	Airport	—
AREA_CODE_US	U.S. telephone area code; geo-ish but not a location polygon.	Area Code (US)	—
CBSA_MSA_US	U.S. Core/Base Statistical Areas & Metro Statistical Areas.	CBSA/MSA (US)	—
CITY	City/locality.	City	City
CONGRESSIONAL_DISTRICT_US	U.S. House districts.	Congressional District (US)	—
COUNTRY_REGION	Nation-level.	Country/Region	Country/Region
COUNTY	County/shire/prefecture equivalents.	County	County
NUTS_EUROPE	EU NUTS levels (NUTS1/2/3).	NUTS Europe	—
STATE_PROVINCE	First-order admin division (state, province, region, oblast, etc.).	State/Province	State or Province
POSTAL_CODE	ZIP/postal/CEP/etc.	ZIP Code/Postcode	Postal Code
CONTINENT	Continental region (e.g., Europe, Asia).	—	Continent
LATITUDE	Numeric latitude.	Latitude	Latitude
LONGITUDE	Numeric longitude.	Longitude	Longitude
ADDRESS	Full street address. Not a Tableau geographic role by default.	—	Address
PLACE_GENERIC	Power BI’s generic location bucket (ambiguous granularity).	—	Place
WEB_URL	Non-geographic. Used for clickable links.	—	Web URL
IMAGE_URL	Non-geographic. Used for image rendering.	—	Image URL
BARCODE	Non-geographic. Used for bar/QR codes, product IDs, etc.	—	Barcode

Metric

A metric is a numeric value representing an aggregated database column expression, such as the sum of sales or average order quantity. Metrics always result from an aggregate calculation applied to one or more columns of fact database tables.

MSL supports the following types of metrics:

Additive: Metrics whose values can be summarized for any dimension attribute of the model and then combined consistently.
Non-additive: Metrics whose values cannot be summed across any dimensional groupings using basic addition, since this would typically produce an inaccurate result. A distinct count of an attribute value is a typical example of a non-additive metric.
Semi-additive: Metrics whose values can be summarized for some dimensions in a model, but not all. Ratios such as averages are also considered semi-additive metrics. MSL implements them in the BI semantic layer.

MSL supports basic Fact or calculated Metric (see isFact property). A fact metric originates from a specific column and may have a default aggregation. A calculated metric defines dialect-specific expressions. They can combine, evaluate, or manipulate other metrics defined in the table. For example, you can do simple math operations to combine metrics or simple comparison operations to return a given metric value when certain conditions are met. Or you can use database-specific functions to perform time-series analysis.

TBD - FUTURE: The separation of calculated metrics from other types enables you to easily create boilerplate calculations that can be used across multiple metrics (e.g., macros).

Metrics are implemented in the database. Non-additive metrics are pushed to BI clients.

Details

Metrics are quantifiable measures of business performance calculated by aggregating facts or other columns from the same table (using functions like SUM, AVG, and COUNT) across multiple rows. They transform raw data into meaningful business indicators, often combining multiple calculations in complex formulas. Examples include Total Revenue or Profit Margin Percentage. Metrics represent the KPIs in reports and dashboards that drive business decision-making.

Facts are row-level attributes in the data model that represent specific business events or transactions. While facts can be defined using aggregates from more detailed levels of data (such as SUM(t.x) where t represents data at a more detailed level), they are always presented as attributes at the individual row level of the logical table. Facts capture 'how much' or 'how many' at the most granular level, such as individual sales amounts, quantities purchased, or costs. It's important to note that facts typically function as 'helper' concepts within the semantic table to help construct dimensions and metrics.

expression

Type: Array
Required: Yes

Native expressions define metrics, including their aggregation functions, windows, and ratios. Expressions should reference base facts from this Fact semantic tables and other metrics as reusable components.

TBD: Define a calculated measure by specifying an analytic view expression. The expression may reference other measures in the analytic view, but may not reference fact columns. Calculated measures do not have an aggregate clause because they're computed over the aggregated base measures.

Example:

metrics:
  - name: total_revenue
    expression:
      - dialect: Snowflake
        text: SUM(o_totalprice)
      - dialect: Oracle
        text: SUM(o_totalprice)

nonAdditive

Type: Boolean
Required: No
Default: false

BI tools aggregate all measures by default. To avoid re-aggregation of non-additives (like exact distinct count), set this attribute to true. It requests the application to push the metric to the client side (e.g., BI or SQL). The value is false by default.

Example: TBD

metrics:
  - name: distinct_customers
    nonAdditive: true
    expression:
      - dialect: Snowflake
        text: COUNT(DISTINCT c_custkey)
      - dialect: Power BI
        text: COUNT(DISTINCT c_custkey)

Expression

Expressions define database-specific calculations and formulas.

dialect

Type: string
Required: Yes

The name of the relational database or business analytics tool (dialect) for which this expression is written.

Supported values:

DATABRICKS
SNOWFLAKE
ORACLE
HANA
POWER BI
TABLEAU

Example:

expression:
  - dialect: ORACLE
    text: o_totalprice / 365

text

Type: string
Required: Yes

The actual SQL expression or formula.

Example:

expression:
  - dialect: SNOWFLAKE
    text: SUM(o_totalprice * o_quantity)

Unique Key

columns

Type: array< string>.
Required: Yes Ordered list of database columns in the key