Skip to content

Metadata Governance Policies

Kelp's governance policy system lets you define and enforce metadata quality rules against your local YAML definitions. Policies check that models and columns meet standards like having descriptions, required tags, naming conventions, and constraint definitions.

Policy checks:

  • Run automatically on every init() call when policy_config.enabled: true
  • Run on demand with kelp check-policies
  • Operate on your local YAML metadata only — no Unity Catalog connection needed

Setup

1. Create a Policies Directory

Add a directory for your policy YAML files alongside your other metadata:

my_project/
├── kelp_project.yml
└── kelp_metadata/
    ├── models/
    ├── functions/
    └── policies/              ← Add this
        └── data_standards.yml

2. Configure the Project

Point Kelp to your policies directory and enable checks in kelp_project.yml:

kelp_project:
  models_path: "./kelp_metadata/models"
  policies_path: "./kelp_metadata/policies"

  policy_config:
    enabled: true
    fast_exit: false

policy_config supports:

  • enabled: global on/off switch for policy checks
  • fast_exit: stop evaluation at the first violating policy per model

All governance rules are defined in the policy YAML files themselves.

Policy YAML Structure

Policy files use the kelp_policies key and contain a list of named policy definitions:

kelp_policies:
  - name: global_standards
    applies_to: "*"
    model:
      require_description: true
      require_any_tag: true
      severity: warn
    column:
      require_description: true
      severity: error

Policy Fields

Field Type Description
name string Unique policy identifier
applies_to string (glob) Path pattern for models this policy applies to
model ModelPolicyRule Rules applied at the model level
column ColumnPolicyRule Rules applied at the column level

applies_to Patterns

The applies_to field is matched against each model's source file path. In practice, use patterns rooted at models/... so they match paths like .../kelp_metadata/models/bronze/bronze_customers.yml. Standard glob wildcards apply:

applies_to: "models/bronze/*"    # All models in the bronze/ subdirectory
applies_to: "models/silver/*"    # All models in the silver/ subdirectory
applies_to: "models/gold/*"      # All models in the gold/ subdirectory
applies_to: "*"           # All models (catch-all)

Kelp evaluates policies in order and applies all matching policies to each model. If fast_exit is enabled, evaluation stops at the first matching policy that produces one or more violations for that model.

Model-Level Rules

Rules under model: apply to the model itself:

model:
  require_description: true            # Model must have a non-empty description
  require_any_tag: true                # At least one tag must exist
  require_tags:                        # Specific tag keys that must be present
    - owner
    - domain
  require_constraints:                 # Constraint types that must be defined
    - primary_key
  naming_pattern: "^(bronze|silver|gold)_.*"  # Regex pattern for model names
  not: false                           # Optional: invert checks in this rule block
  severity: warn                       # warn or error

Model Rule Reference

Field Type Default Description
require_description bool false Model must have a non-empty description
require_any_tag bool false At least one tag must exist on the model
require_tags list[str] [] Specific tag keys that must be present
require_constraints list[str] [] Constraint types that must be defined (primary_key, foreign_key)
naming_pattern string null Regex pattern that model names must match
has_columns list[str] [] Column names that must be present in the model
has_table_property dict {} Table properties that must exist (partial match, extra keys allowed)
has_quality_check bool false Model must have quality checks configured
not bool false Invert checks in this rule block (forbid instead of require)
severity warn/error warn Severity when this rule is violated

Column-Level Rules

Rules under column: apply to every column in matching models:

column:
  require_description: true            # Every column must have a description
  require_any_tag: true                # Every column must have at least one tag
  require_tags:                        # Specific tag keys required on every column
    - data_classification
  naming_pattern: "^[a-z][a-z0-9_]*$" # Snake case enforcement
  naming_patterns_by_type:            # Per data type naming rules
    - data_type: BOOLEAN
      pattern: "^(is_|has_|can_).*"
    - data_type: TIMESTAMP
      pattern: ".*_at$"
  not: false                          # Optional: invert checks in this rule block
  severity: error                      # warn or error

Column Rule Reference

Field Type Default Description
require_description bool false Every column must have a non-empty description
require_any_tag bool false Every column must have at least one tag
require_tags list[str] [] Specific tag keys required on every column
naming_pattern string null Regex pattern that all column names must match
naming_patterns_by_type list [] Naming patterns scoped to a specific SQL data type
not bool false Invert checks in this rule block (forbid instead of require)
severity warn/error warn Severity when this rule is violated

Data Type Naming Conventions

Use naming_patterns_by_type to enforce naming conventions for a specific SQL data type. This is evaluated only for columns that have a data_type set:

column:
  naming_patterns_by_type:
    - data_type: BOOLEAN
      pattern: "^(is_|has_|can_).*"   # Boolean columns must start with is_, has_, or can_
    - data_type: TIMESTAMP
      pattern: ".*_at$"               # Timestamps must end in _at
    - data_type: DATE
      pattern: ".*_date$"             # Dates must end in _date
    - data_type: STRING
      pattern: "^[a-z][a-z0-9_]*$"   # Strings must be snake_case
  severity: warn

Advanced Model-Level Rules

Column Requirements

Enforce that models have specific columns present:

model:
  has_columns:                    # Model must include these columns
    - id
    - created_at
    - updated_at
  severity: warn

For forbidden columns, use not: true with has_columns:

model:
  not: true
  has_columns:
    - temp
    - debug
  severity: warn

Table Property Requirements

Enforce that models have specific table properties set:

model:
  has_table_property:             # Table must have these properties
    retention_days: "90"
    owner: "data_team"
    encryption_enabled: "true"
  severity: error

The has_table_property check uses partial matching: extra properties are allowed. This means if your table has {owner: "data_team", cost_center: "cc123"} and the policy requires {owner: "data_team"}, the check passes because the required property is present with the correct value.

Quality Check Requirements

Enforce that models have quality checks configured:

model:
  has_quality_check: true         # Model must have quality checks (SDP or DQX)
  severity: error

This checks that the model has a quality section defined (either SDP or DQX). It does not validate the content of the checks, just their presence.

Severity Levels

Severity Behavior
warn Violation is logged as a warning; loading continues normally
error Violation is logged as an error; a RuntimeError is raised after all checks complete

model.severity and column.severity are independent. A policy can warn on model violations while failing hard on column violations:

model:
  require_description: true
  severity: warn           # Just warn for missing model descriptions
column:
  require_description: true
  severity: error          # Fail hard for missing column descriptions

applies_to Path Patterns

The applies_to field uses glob patterns to match model file paths. Patterns are matched against the origin_file_path of models (e.g., models/bronze/customers.yml).

Important guidelines: - Patterns must start with models/ — this ensures they match the models directory - Use forward slashes / (not backslashes) - Glob wildcards (*, ?, **) are supported

Valid patterns:

applies_to: "models/bronze/*"       # All models in the bronze/ directory
applies_to: "models/*/events*"      # All events models in any layer
applies_to: "models/silver/**/fact_*"  # All fact models anywhere in silver/
applies_to: "models/**"             # All models recursively

Invalid patterns (won't match):

applies_to: "bronze/*"               # Missing "models/" prefix
applies_to: "*/models/*"             # "models" not at the start
applies_to: "models"                 # No glob to match files

Kelp evaluates policies in order and applies all matching policies to each model. Use policy_config.fast_exit: true (or --fast-exit) when you prefer faster checks over complete violation reporting.

Running Policy Checks

Automatic (on init)

When policy_config.enabled: true, checks run every time init() is called:

from kelp import init

ctx = init()  # Policy checks run here if enabled

CLI

Run checks on demand with kelp check-policies:

# Run all policy checks
uv run kelp check-policies

# Run against a specific target
uv run kelp check-policies --target prod

# Only show violations at or above a severity level
uv run kelp check-policies --severity error

# Exit with code 1 when any warning is found (useful for strict CI gates)
uv run kelp check-policies --fail-on warn

# Stop at first violating policy per model (speed optimization)
uv run kelp check-policies --fast-exit

Example Output

[POLICY WARN] Model 'kelp_catalog.kelp_bronze.bronze_events' is missing a description.
[POLICY WARN] Model 'kelp_catalog.kelp_bronze.bronze_events' is missing required tag 'owner'.
[POLICY ERROR] Column 'raw_payload' in model 'kelp_catalog.kelp_bronze.bronze_events' is missing a description.

✘ Policy check failed: 3 violation(s) found (1 error, 2 warnings).

Multiple Policies Per File

A single file can hold multiple policies. Kelp merges all policy files in policies_path:

kelp_policies:
  - name: bronze_standards
    applies_to: "models/bronze/*"
    model:
      require_description: true
      require_tags:
        - owner
        - domain
      severity: warn
    column:
      require_description: true
      severity: warn

  - name: silver_standards
    applies_to: "*/models/silver/*"
    model:
      require_description: true
      require_tags:
        - owner
        - domain
      require_constraints:
        - primary_key
      severity: error
    column:
      require_description: true
      require_tags:
        - data_classification
      severity: error

  - name: gold_standards
    applies_to: "models/gold/*"
    model:
      require_description: true
      require_any_tag: true
      require_constraints:
        - primary_key
      naming_pattern: "^gold_.*"
      severity: error
    column:
      require_description: true
      require_tags:
        - data_classification
        - owner
      naming_patterns_by_type:
        - data_type: BOOLEAN
          pattern: "^(is_|has_|can_).*"
        - data_type: TIMESTAMP
          pattern: ".*_at$"
      severity: error

  - name: global_fallback
    applies_to: "*"
    model:
      require_description: true
      severity: warn

Full Example

Project Layout

my_project/
├── kelp_project.yml
└── kelp_metadata/
    ├── models/
    │   ├── bronze/
    │   │   └── bronze_customers.yml
    │   ├── silver/
    │   │   └── silver_customers_cleaned.yml
    │   └── gold/
    │       └── gold_customer_summary.yml
    └── policies/
        └── data_standards.yml

kelp_project.yml

kelp_project:
  models_path: "./kelp_metadata/models"
  policies_path: "./kelp_metadata/policies"

  models:
    +catalog: ${ catalog }
    bronze:
      +schema: bronze
    silver:
      +schema: silver
    gold:
      +schema: gold

  policy_config:
    enabled: true
    fast_exit: false

vars:
  catalog: my_catalog

kelp_metadata/policies/data_standards.yml

kelp_policies:
  - name: bronze_layer
    applies_to: "models/bronze/*"
    model:
      require_description: true
      require_tags:
        - owner
        - domain
      severity: warn
    column:
      require_description: true
      severity: warn

  - name: silver_layer
    applies_to: "models/silver/*"
    model:
      require_description: true
      require_tags:
        - owner
        - domain
      require_constraints:
        - primary_key
      severity: error
    column:
      require_description: true
      require_tags:
        - data_classification
      severity: error

  - name: gold_layer
    applies_to: "models/gold/*"
    model:
      require_description: true
      require_any_tag: true
      require_constraints:
        - primary_key
      naming_pattern: "^gold_.*"
      severity: error
    column:
      require_description: true
      require_tags:
        - data_classification
      naming_patterns_by_type:
        - data_type: BOOLEAN
          pattern: "^(is_|has_|can_).*"
        - data_type: TIMESTAMP
          pattern: ".*_at$"
      severity: error

Advanced Example: Comprehensive Policy with New Rules

This example demonstrates the new column, property, and quality check rules:

kelp_policies:
  - name: silver_with_advanced_rules
    applies_to: "models/silver/*"
    model:
      require_description: true
      require_tags:
        - owner
        - domain
      # Require that all silver tables have identity and timestamp columns
      has_columns:
        - id
        - created_at
        - updated_at
      # Forbid temporary or debug columns
      not: true
      has_columns:
        - temp
        - debug
        - test_data
      # Enforce table properties for data governance
      has_table_property:
        owner: "data_platform"
        retention_days: "90"
        pii_sensitive: "false"
      # All silver tables must have quality checks
      has_quality_check: true
      require_constraints:
        - primary_key
      severity: error
    column:
      require_description: true
      require_tags:
        - data_classification
      naming_pattern: "^[a-z][a-z0-9_]*$"
      naming_patterns_by_type:
        - data_type: BOOLEAN
          pattern: "^(is_|has_|can_).*"
        - data_type: TIMESTAMP
          pattern: ".*_at$"
      severity: error

When a model fails these checks, Kelp reports violations like:

[POLICY ERROR] Model 'catalog.silver.transactions' is missing required column 'id'.
[POLICY ERROR] Model 'catalog.silver.transactions' must not have column 'temp'.
[POLICY ERROR] Model 'catalog.silver.transactions' is missing required table property 'owner'.
[POLICY ERROR] Model 'catalog.silver.transactions' must have quality checks configured.

See Also