Operator Module · Tier 2

Cortex Analyst
Semantic Models

Design, author, and deploy a production-ready semantic model for Cortex Analyst. Built for data engineers and analytics engineers already running Snowflake — not a getting-started tutorial.

Learning objectives

01
Understand why a semantic model exists and what it does that a schema alone cannot.
02
Author a YAML semantic model for a real table including measures, dimensions, and filters.
03
Deploy to Cortex Analyst and test accuracy against natural language queries.
04
Diagnose and remediate accuracy failures using verified queries and synonyms.
05
Implement governance controls — RBAC, logging, and cost containment.
06
Measure accuracy and iterate using a structured testing methodology.

Level

Operator
Hands-on

Duration

90 minutes
plus lab

Prerequisites

Snowflake account
Cortex enabled

Why this matters

Cortex Analyst accuracy collapses without a well-authored semantic model. Teams that skip the semantic layer and point Cortex at raw tables get demo-quality results — not production-grade natural language querying. This module closes that gap.

00

Module Contents

What this module covers

This module is structured as a complete build — context, concept, construction, deployment, validation, and governance. Every chapter produces a tangible artifact. By the end you will have a deployed, tested, production-ready semantic model.

01

The semantic model concept
Why schemas alone fail natural language. The three layers of business context that Cortex Analyst requires to produce reliable SQL.

p.03
02

Anatomy of a YAML semantic model
Tables, dimensions, measures, synonyms, verified queries, filters. What each block does and how they interact.

p.04
03

The five-step build
Hands-on lab. Prepare the workspace, identify entities, author the YAML, deploy, and test.

p.05
04

Diagnosing accuracy failures
Why natural language queries fail. A structured troubleshooting matrix for the six most common failure modes.

p.06
05

Governance and cost controls
RBAC, audit logging, credit limits, and token budgets. Operator-grade controls before going to production.

p.07
06

Verified queries in depth
Why verified queries are the single most effective accuracy lever. How to author them, how many to write, and how to maintain them.

p.08
07

Hands-on lab worksheet
A guided exercise to build, deploy, and test a semantic model against a sample dataset. Templates and checkpoints included.

p.09
08

Knowledge check
Five assessment questions covering concepts, construction, governance, and troubleshooting. Answer key on the closing page.

p.10
09

Glossary and reference
Key terminology, syntax cheat sheet, and a checklist for production readiness.

p.11
10

Module completion and next steps
Module summary, what you learned, and the next module in the Operator series.

p.12

01

Chapter 01

The semantic model concept

A Snowflake schema tells the database engine what columns exist and how they are typed. It does not tell anything — not a human, not a language model — what those columns mean in business terms. A column named rev_m_ttm is a database fact. Whether it represents "trailing twelve month revenue in millions" is a business fact that lives only in someone's head.

Cortex Analyst converts natural language into SQL. To do that reliably, it needs a translation layer between the two — one that maps what users actually ask ("What was our revenue last quarter?") to what the schema actually contains. That translation layer is the semantic model.

Schema vs. semantic model

Schema only

Tables, columns, types, and keys. Tells the engine what rows are valid. Tells a user nothing about what a column means, how it's measured, or what synonyms exist for it.

Schema + semantic model

Adds measures, dimensions, synonyms, business definitions, and verified query examples. Tells Cortex Analyst what a column means in business terms and how users actually refer to it.

Three layers of context Cortex Analyst needs

Structural context — which tables and columns exist. This is your schema, already in Snowflake. Cortex reads it automatically.

Semantic context — what those tables and columns mean. What is a measure versus a dimension. What calculations define derived values. What filters are meaningful. This lives in your YAML file.

Linguistic context — how humans actually talk about the data. Synonyms for column names, example questions paired with verified SQL, business-specific terms. This is the single highest-impact layer for accuracy.

The accuracy principle

Cortex Analyst accuracy is a direct function of the quality and completeness of these three layers. A minimal semantic model will produce approximately 60 percent accuracy on production queries. A well-authored one with synonyms and verified queries routinely clears 92 percent. This is the difference between a demo and a system people actually use.

02

Chapter 02

Anatomy of a YAML semantic model

A Cortex Analyst semantic model is a YAML file with a specific structure. Below is a minimal working example against a fictional orders table. Each block has a job to do — understanding what each block contributes is the core skill of this module.

# semantic_model.yaml — orders domain name: orders_model description: Order and revenue analytics for commercial customers tables: - name: orders description: One row per customer order, all channels base_table: database: ANALYTICS_PROD schema: SALES table: FACT_ORDERS dimensions: - name: region expr: region_code synonyms: [territory, market, geography, area] description: Sales region code, 2-letter ISO - name: product_category expr: product_cat synonyms: [category, product line, product type] time_dimensions: - name: order_date expr: order_dt synonyms: [date ordered, purchase date, when bought] measures: - name: total_revenue expr: SUM(order_total_usd) synonyms: [revenue, sales, total sales, top line] description: Sum of order total in USD, pre-discount - name: order_count expr: COUNT(DISTINCT order_id) synonyms: [orders, number of orders, transactions] filters: - name: last_quarter expr: order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) verified_queries: - name: revenue_by_region_ltq question: What was revenue by region last quarter? sql: SELECT region_code, SUM(order_total_usd) FROM FACT_ORDERS WHERE order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) GROUP BY region_code

What each block does

Dimensions are how users slice data — region, product, customer. Anything that answers "by what?" is a dimension.

Measures are what gets aggregated — revenue, count, average. Anything that answers "how much?" or "how many?" is a measure. Define them with explicit SQL expressions so there is no ambiguity.

Synonyms are the single highest-leverage element in the whole file. They map business language to your columns. Write them from the perspective of the user, not the database — "top line" maps to revenue, "market" maps to region.

Verified queries are worked examples that pair a natural language question with the correct SQL. They teach Cortex Analyst how your specific business asks specific questions. Ten well-chosen verified queries will outperform a hundred generic synonyms.

03

Chapter 03

The five-step build

Step 01

Prepare your workspace

Create a dedicated schema for semantic models and a stage to hold the YAML files. Use a dev role with read access to the target tables — never build directly against production.

CREATE SCHEMA IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS;
CREATE STAGE IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE;
GRANT USAGE ON SCHEMA ANALYST_DEV.SEMANTIC_MODELS TO ROLE ANALYST_BUILDER;

Step 02

Identify tables, measures, and dimensions

Pick one fact table and its supporting dimensions. List the measures (what gets aggregated) and dimensions (how it gets sliced). Resist modeling everything at once — start narrow, expand after you achieve accuracy.

-- Fact: FACT_ORDERS
-- Measures: total_revenue, order_count, avg_order_value
-- Dimensions: region, product_category, customer_segment
-- Time dimension: order_date

Step 03

Author the YAML semantic model

Write the YAML using the anatomy from Chapter 02 as your reference. Cortex Code CLI can scaffold a starter file for you — but every synonym and every verified query should be reviewed by a human who knows the business.

cortex analyst generate-model \
  --table ANALYTICS_PROD.SALES.FACT_ORDERS \
  --output ./semantic_model.yaml

Step 04

Deploy and test with natural language

Upload the YAML to your stage and register it with Cortex Analyst. Test with 15 to 20 real questions written by real users — not rephrased to match the schema. This is the only honest accuracy signal you get.

PUT file://semantic_model.yaml
  @ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE AUTO_COMPRESS=FALSE;
-- Test: "What was revenue by region last quarter?"

Step 05

Measure accuracy and iterate

Track accuracy against a fixed test battery. For each failure, add a verified query and relevant synonyms, then re-test. A production-ready model sits above 90 percent accuracy on the user question set, not just yours.

Operator Tip

Verified queries are not optional. A semantic model without 10+ verified queries covering your highest-frequency questions will fail unpredictably. Treat them as part of the build, not a post-launch addition.

04

Chapter 04

Diagnosing accuracy failures

When Cortex Analyst produces wrong SQL, the failure almost always traces to one of six root causes. A structured diagnostic matrix saves hours of trial-and-error iteration.

Symptom	Root cause and fix
Query returns empty result but question was valid	Dimension value mismatch. User said "northeast" but column values are "NE". Add a synonym mapping or a computed column with readable values.
Wrong column selected as the measure	Ambiguous synonym. "Sales" could mean revenue or unit count. Clarify the synonym in the YAML and add a verified query disambiguating the user's most common phrasing.
Correct columns but wrong aggregation	Measure expression missing or ambiguous. Every measure must have an explicit expr like SUM, COUNT, or AVG. Never let Cortex infer the aggregation.
Time filter not applied when user said "last quarter"	Time dimension not declared, or no filter definition for the phrase. Add a named filter for common time ranges and reference it in verified queries.
Correct SQL but returns zero rows	RBAC on the underlying table. Cortex Analyst inherits row access policies. Confirm the calling role has access to the rows — this is working as designed.
Wrong table selected across related fact tables	Table description too generic. Rewrite the description field to explicitly state what distinguishes this fact table from others that might seem similar.

The iteration loop

Run the test battery. For each failure, identify the root cause from the matrix above. Add the specific fix — synonym, verified query, filter, or description. Re-run the battery. Track accuracy week over week. Stop iterating when you clear 90 percent and the remaining failures are edge cases acceptable to your business stakeholders.

05

Chapter 05

Governance and cost controls

A semantic model that works is not the same as a semantic model ready for production. Before you flip it on for end users, put four governance controls in place. Every one of these is non-optional in a regulated environment.

Four production controls

RBAC inheritance confirmed. Cortex Analyst runs under the calling user's role. Verify that row access policies, column masking, and data exclusions are enforced by running test queries as a restricted role.
Audit logging active. Every Cortex Analyst query should land in ACCOUNT_USAGE.QUERY_HISTORY with the invoking role. Confirm before going live and set an alert on anomalous usage.
Per-user credit limits set. Use the CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER parameter to cap token spend. Default to 20 credits per user per day and raise only for verified power users.
Warehouse sizing validated. Cortex Analyst executes SQL on your virtual warehouses. Run the test battery against the target warehouse size and confirm response times hit your SLA before opening access.

Cost benchmark — what to expect

~0.01

Credits per query

~2s

Median latency

92%+

Target accuracy

These are reference benchmarks for a well-authored semantic model on a small warehouse. Credit consumption varies with question complexity, output length, and model selection. Latency excludes warehouse cold-start. Accuracy depends on verified query coverage — models with fewer than 10 verified queries routinely fall below 75 percent.

Operator Tip

Audit the semantic model itself as a versioned artifact. Store the YAML in Git. Every production change goes through pull request review. Treating semantic models as code — not configuration — is what separates teams that scale from teams that don't.

06

Chapter 06

Verified queries in depth

If you only do one thing well in this module, do verified queries well. Synonyms move accuracy from 60 to 80 percent. Verified queries are what move it from 80 to 92 plus. Treat them as a first-class engineering artifact, not a post-launch addition.

What a verified query actually does

A verified query teaches Cortex Analyst the SQL pattern for a class of question. When a user asks something semantically similar — same intent, different wording — the model has a high-confidence template to work from. One well-chosen verified query covers a range of phrasings. Five verified queries can cover the bulk of a stakeholder's daily reporting needs.

A worked example

# A complete verified query — note the explicit SQL verified_queries: - name: top_products_last_quarter question: What were our top 5 products by revenue last quarter? use_as_onboarding_question: true sql: | SELECT product_name, SUM(order_total_usd) AS revenue FROM ANALYTICS_PROD.SALES.FACT_ORDERS o JOIN ANALYTICS_PROD.SALES.DIM_PRODUCT p ON o.product_id = p.product_id WHERE o.order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) AND o.order_dt < DATE_TRUNC(quarter, CURRENT_DATE()) GROUP BY product_name ORDER BY revenue DESC LIMIT 5

How many verified queries do you need?

10

Minimum for production

25

Stable accuracy zone

50+

Mature semantic model

Below 10 verified queries, accuracy fluctuates with phrasing changes. Between 10 and 25, the model handles the most common reporting needs reliably. Above 50, you have a production-grade asset that can absorb new question types without retraining.

Operator Tip

Mine your existing BI workload for verified query candidates. Pull the 25 most-run dashboard queries from QUERY_HISTORY, identify the natural language equivalent each one answers, and use those pairs as your verified query seed set. This is the fastest way to a high-accuracy semantic model.

07

Chapter 07

Hands-on lab worksheet

A guided exercise to build, deploy, and test your first semantic model. Work through each task in order. Record your answers and outputs in the spaces provided. Estimated time: 60 to 90 minutes against a sandbox Snowflake account.

Task 01

Identify your fact table and three dimensions

10 min

Pick a fact table you know well and list the dimensions you would slice it by. Think about how your team actually asks questions — region, time, product, customer segment.

Task 02

Define three measures with explicit aggregations

10 min

Write the SQL expression for each measure. SUM, COUNT, AVG, or a derived calculation. Be explicit — never leave the aggregation up to inference.

Task 03

Add five synonyms per dimension and per measure

15 min

Write synonyms from the user's perspective, not the database's. "Top line" for revenue. "Market" for region. Test them against three real questions a stakeholder has asked recently.

Task 04

Author three verified queries for high-frequency questions

20 min

Take three of the most common reporting questions in your domain. Write each as a natural language question paired with its known-correct SQL. These three are your accuracy floor.

Task 05

Deploy and test against a 10-question battery

25 min

Upload the YAML to your stage. Test 10 questions written by a real user, not by you. Record accuracy.

08

Chapter 08

Knowledge check

Five questions covering the core concepts of this module. Answer key appears on the closing page. A passing score is four of five correct.

Question 01

Which element of a semantic model has the highest leverage on accuracy?

A
Table descriptions
B
Column types
C
Synonyms and verified queries
D
Warehouse sizing

Question 02

Cortex Analyst inherits which security model when executing a query?

A
The semantic model author's role
B
The calling user's role, including row access and masking
C
A dedicated Cortex service role
D
The ACCOUNTADMIN role

Question 03

A test query returns the wrong column as the measure. The most likely cause is:

A
Warehouse undersized
B
Ambiguous synonym mapping to multiple measures
C
Missing column in the base table
D
Incorrect YAML indentation

Question 04

What is the minimum recommended number of verified queries for a production semantic model?

A
Zero — synonyms are sufficient
B
3 to 5
C
10
D
100 or more

Question 05

Which of the following is NOT a valid governance control for a Cortex Analyst deployment?

A
Per-user daily credit limits
B
Column-level masking policies
C
Disabling the calling user's role inheritance
D
Audit logging via QUERY_HISTORY

09

Chapter 09

Glossary and reference

Key terminology

Semantic model

A YAML file mapping database objects to business-meaningful names, measures, and synonyms. The bridge between natural language and SQL.

Measure

An aggregatable value with an explicit SQL expression. Revenue, count, margin, average. Answers "how much?"

Dimension

A sliceable attribute. Region, product, date, customer segment. Answers "by what?"

Verified query

A natural language question paired with its known-correct SQL. The single most effective tool for raising accuracy.

Synonym

An alternate business term mapped to a model element. "Top line" for revenue. "Market" for region.

Filter

A named, reusable WHERE clause expression. Examples: last_quarter, current_fy, active_customers.

Time dimension

A specialized dimension representing dates or timestamps. Enables relative time queries like "last quarter" or "year over year."

YAML cheat sheet

Dimension

- name: region
  expr: region_code
  synonyms: [territory, market]
  description: Sales region

Measure

- name: revenue
  expr: SUM(order_total)
  synonyms: [sales, top line]

Time dimension

- name: order_date
  expr: order_dt
  synonyms: [date, when ordered]

Verified query

- name: rev_by_region
  question: Revenue by region?
  sql: SELECT region, SUM(...)

Production readiness checklist

Semantic model authored with at least 10 verified queries covering top reporting questions
Synonyms validated against real user phrasing — not the team that built the model
RBAC inheritance tested with a restricted role to confirm row and column policies enforce
Per-user daily credit limit set, default 20 credits per user
QUERY_HISTORY audit logging confirmed active and alerting configured
Accuracy test battery achieves 90%+ against a stakeholder-authored question set
YAML stored in Git with PR review process for production changes

10

Chapter 10

Module completion

What you learned

Concept

Why a schema alone fails natural language and the three layers of context — structural, semantic, linguistic — that Cortex Analyst requires for production-grade accuracy.

Construction

How to author a YAML semantic model with explicit measures, dimensions, synonyms, and verified queries. The five-step build pattern from workspace prep through accuracy iteration.

Diagnostics

The six common failure modes and the structured matrix for mapping each symptom to its root cause and specific fix. The iteration loop for stable accuracy.

Governance

Four production controls — RBAC, audit logging, credit limits, warehouse sizing — required before opening Cortex Analyst access to end users.

Cloud On Demand · Operator Module Completion

Cortex Analyst
Semantic Models

Awarded to the practitioner who completed module OPR-004

Recipient signature

Issued by Cloud On Demand · OPR-004 · v1.0

Next in series

OPR-005 · SnowWork Agent Design

→

Cortex AnalystSemantic Models

Why this matters

The accuracy principle

The iteration loop

Concept

Construction

Diagnostics

Governance

Cortex Analyst
Semantic Models