CLOUD ON DEMAND I am not your thought leader™
Module OPR-004
Version 1.0 · Apr 2026
cloud-ondemand.com
Operator Module · Tier 2

Cortex Analyst
Semantic Models

Design, author, and deploy a production-ready semantic model for Cortex Analyst. Built for data engineers and analytics engineers already running Snowflake — not a getting-started tutorial.

Learning objectives
Level
Operator
Hands-on
Duration
90 minutes
plus lab
Prerequisites
Snowflake account
Cortex enabled

Why this matters

Cortex Analyst accuracy collapses without a well-authored semantic model. Teams that skip the semantic layer and point Cortex at raw tables get demo-quality results — not production-grade natural language querying. This module closes that gap.

00
Module Contents
What this module covers

This module is structured as a complete build — context, concept, construction, deployment, validation, and governance. Every chapter produces a tangible artifact. By the end you will have a deployed, tested, production-ready semantic model.

01
Chapter 01
The semantic model concept

A Snowflake schema tells the database engine what columns exist and how they are typed. It does not tell anything — not a human, not a language model — what those columns mean in business terms. A column named rev_m_ttm is a database fact. Whether it represents "trailing twelve month revenue in millions" is a business fact that lives only in someone's head.

Cortex Analyst converts natural language into SQL. To do that reliably, it needs a translation layer between the two — one that maps what users actually ask ("What was our revenue last quarter?") to what the schema actually contains. That translation layer is the semantic model.

Schema vs. semantic model
Schema only

Tables, columns, types, and keys. Tells the engine what rows are valid. Tells a user nothing about what a column means, how it's measured, or what synonyms exist for it.

Schema + semantic model

Adds measures, dimensions, synonyms, business definitions, and verified query examples. Tells Cortex Analyst what a column means in business terms and how users actually refer to it.

Three layers of context Cortex Analyst needs

Structural context — which tables and columns exist. This is your schema, already in Snowflake. Cortex reads it automatically.

Semantic context — what those tables and columns mean. What is a measure versus a dimension. What calculations define derived values. What filters are meaningful. This lives in your YAML file.

Linguistic context — how humans actually talk about the data. Synonyms for column names, example questions paired with verified SQL, business-specific terms. This is the single highest-impact layer for accuracy.

The accuracy principle

Cortex Analyst accuracy is a direct function of the quality and completeness of these three layers. A minimal semantic model will produce approximately 60 percent accuracy on production queries. A well-authored one with synonyms and verified queries routinely clears 92 percent. This is the difference between a demo and a system people actually use.

02
Chapter 02
Anatomy of a YAML semantic model

A Cortex Analyst semantic model is a YAML file with a specific structure. Below is a minimal working example against a fictional orders table. Each block has a job to do — understanding what each block contributes is the core skill of this module.

# semantic_model.yaml — orders domain name: orders_model description: Order and revenue analytics for commercial customers tables: - name: orders description: One row per customer order, all channels base_table: database: ANALYTICS_PROD schema: SALES table: FACT_ORDERS dimensions: - name: region expr: region_code synonyms: [territory, market, geography, area] description: Sales region code, 2-letter ISO - name: product_category expr: product_cat synonyms: [category, product line, product type] time_dimensions: - name: order_date expr: order_dt synonyms: [date ordered, purchase date, when bought] measures: - name: total_revenue expr: SUM(order_total_usd) synonyms: [revenue, sales, total sales, top line] description: Sum of order total in USD, pre-discount - name: order_count expr: COUNT(DISTINCT order_id) synonyms: [orders, number of orders, transactions] filters: - name: last_quarter expr: order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) verified_queries: - name: revenue_by_region_ltq question: What was revenue by region last quarter? sql: SELECT region_code, SUM(order_total_usd) FROM FACT_ORDERS WHERE order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) GROUP BY region_code
What each block does

Dimensions are how users slice data — region, product, customer. Anything that answers "by what?" is a dimension.

Measures are what gets aggregated — revenue, count, average. Anything that answers "how much?" or "how many?" is a measure. Define them with explicit SQL expressions so there is no ambiguity.

Synonyms are the single highest-leverage element in the whole file. They map business language to your columns. Write them from the perspective of the user, not the database — "top line" maps to revenue, "market" maps to region.

Verified queries are worked examples that pair a natural language question with the correct SQL. They teach Cortex Analyst how your specific business asks specific questions. Ten well-chosen verified queries will outperform a hundred generic synonyms.

03
Chapter 03
The five-step build
Step 01
Prepare your workspace
Create a dedicated schema for semantic models and a stage to hold the YAML files. Use a dev role with read access to the target tables — never build directly against production.
CREATE SCHEMA IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS; CREATE STAGE IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE; GRANT USAGE ON SCHEMA ANALYST_DEV.SEMANTIC_MODELS TO ROLE ANALYST_BUILDER;
Step 02
Identify tables, measures, and dimensions
Pick one fact table and its supporting dimensions. List the measures (what gets aggregated) and dimensions (how it gets sliced). Resist modeling everything at once — start narrow, expand after you achieve accuracy.
-- Fact: FACT_ORDERS -- Measures: total_revenue, order_count, avg_order_value -- Dimensions: region, product_category, customer_segment -- Time dimension: order_date
Step 03
Author the YAML semantic model
Write the YAML using the anatomy from Chapter 02 as your reference. Cortex Code CLI can scaffold a starter file for you — but every synonym and every verified query should be reviewed by a human who knows the business.
cortex analyst generate-model \ --table ANALYTICS_PROD.SALES.FACT_ORDERS \ --output ./semantic_model.yaml
Step 04
Deploy and test with natural language
Upload the YAML to your stage and register it with Cortex Analyst. Test with 15 to 20 real questions written by real users — not rephrased to match the schema. This is the only honest accuracy signal you get.
PUT file://semantic_model.yaml @ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE AUTO_COMPRESS=FALSE; -- Test: "What was revenue by region last quarter?"
Step 05
Measure accuracy and iterate
Track accuracy against a fixed test battery. For each failure, add a verified query and relevant synonyms, then re-test. A production-ready model sits above 90 percent accuracy on the user question set, not just yours.
Operator Tip
Verified queries are not optional. A semantic model without 10+ verified queries covering your highest-frequency questions will fail unpredictably. Treat them as part of the build, not a post-launch addition.
04
Chapter 04
Diagnosing accuracy failures

When Cortex Analyst produces wrong SQL, the failure almost always traces to one of six root causes. A structured diagnostic matrix saves hours of trial-and-error iteration.

SymptomRoot cause and fix
Query returns empty result but question was valid Dimension value mismatch. User said "northeast" but column values are "NE". Add a synonym mapping or a computed column with readable values.
Wrong column selected as the measure Ambiguous synonym. "Sales" could mean revenue or unit count. Clarify the synonym in the YAML and add a verified query disambiguating the user's most common phrasing.
Correct columns but wrong aggregation Measure expression missing or ambiguous. Every measure must have an explicit expr like SUM, COUNT, or AVG. Never let Cortex infer the aggregation.
Time filter not applied when user said "last quarter" Time dimension not declared, or no filter definition for the phrase. Add a named filter for common time ranges and reference it in verified queries.
Correct SQL but returns zero rows RBAC on the underlying table. Cortex Analyst inherits row access policies. Confirm the calling role has access to the rows — this is working as designed.
Wrong table selected across related fact tables Table description too generic. Rewrite the description field to explicitly state what distinguishes this fact table from others that might seem similar.

The iteration loop

Run the test battery. For each failure, identify the root cause from the matrix above. Add the specific fix — synonym, verified query, filter, or description. Re-run the battery. Track accuracy week over week. Stop iterating when you clear 90 percent and the remaining failures are edge cases acceptable to your business stakeholders.

05
Chapter 05
Governance and cost controls

A semantic model that works is not the same as a semantic model ready for production. Before you flip it on for end users, put four governance controls in place. Every one of these is non-optional in a regulated environment.

Four production controls
Cost benchmark — what to expect
~0.01
Credits per query
~2s
Median latency
92%+
Target accuracy

These are reference benchmarks for a well-authored semantic model on a small warehouse. Credit consumption varies with question complexity, output length, and model selection. Latency excludes warehouse cold-start. Accuracy depends on verified query coverage — models with fewer than 10 verified queries routinely fall below 75 percent.

Operator Tip
Audit the semantic model itself as a versioned artifact. Store the YAML in Git. Every production change goes through pull request review. Treating semantic models as code — not configuration — is what separates teams that scale from teams that don't.
06
Chapter 06
Verified queries in depth

If you only do one thing well in this module, do verified queries well. Synonyms move accuracy from 60 to 80 percent. Verified queries are what move it from 80 to 92 plus. Treat them as a first-class engineering artifact, not a post-launch addition.

What a verified query actually does

A verified query teaches Cortex Analyst the SQL pattern for a class of question. When a user asks something semantically similar — same intent, different wording — the model has a high-confidence template to work from. One well-chosen verified query covers a range of phrasings. Five verified queries can cover the bulk of a stakeholder's daily reporting needs.

A worked example
# A complete verified query — note the explicit SQL verified_queries: - name: top_products_last_quarter question: What were our top 5 products by revenue last quarter? use_as_onboarding_question: true sql: | SELECT product_name, SUM(order_total_usd) AS revenue FROM ANALYTICS_PROD.SALES.FACT_ORDERS o JOIN ANALYTICS_PROD.SALES.DIM_PRODUCT p ON o.product_id = p.product_id WHERE o.order_dt >= DATEADD(quarter, -1, CURRENT_DATE()) AND o.order_dt < DATE_TRUNC(quarter, CURRENT_DATE()) GROUP BY product_name ORDER BY revenue DESC LIMIT 5
How many verified queries do you need?
10
Minimum for production
25
Stable accuracy zone
50+
Mature semantic model

Below 10 verified queries, accuracy fluctuates with phrasing changes. Between 10 and 25, the model handles the most common reporting needs reliably. Above 50, you have a production-grade asset that can absorb new question types without retraining.

Operator Tip
Mine your existing BI workload for verified query candidates. Pull the 25 most-run dashboard queries from QUERY_HISTORY, identify the natural language equivalent each one answers, and use those pairs as your verified query seed set. This is the fastest way to a high-accuracy semantic model.
07
Chapter 07
Hands-on lab worksheet

A guided exercise to build, deploy, and test your first semantic model. Work through each task in order. Record your answers and outputs in the spaces provided. Estimated time: 60 to 90 minutes against a sandbox Snowflake account.

Task 01
Identify your fact table and three dimensions
10 min
Pick a fact table you know well and list the dimensions you would slice it by. Think about how your team actually asks questions — region, time, product, customer segment.
Task 02
Define three measures with explicit aggregations
10 min
Write the SQL expression for each measure. SUM, COUNT, AVG, or a derived calculation. Be explicit — never leave the aggregation up to inference.
Task 03
Add five synonyms per dimension and per measure
15 min
Write synonyms from the user's perspective, not the database's. "Top line" for revenue. "Market" for region. Test them against three real questions a stakeholder has asked recently.
Task 04
Author three verified queries for high-frequency questions
20 min
Take three of the most common reporting questions in your domain. Write each as a natural language question paired with its known-correct SQL. These three are your accuracy floor.
Task 05
Deploy and test against a 10-question battery
25 min
Upload the YAML to your stage. Test 10 questions written by a real user, not by you. Record accuracy.
08
Chapter 08
Knowledge check

Five questions covering the core concepts of this module. Answer key appears on the closing page. A passing score is four of five correct.

Question 01
Which element of a semantic model has the highest leverage on accuracy?
Question 02
Cortex Analyst inherits which security model when executing a query?
Question 03
A test query returns the wrong column as the measure. The most likely cause is:
Question 04
What is the minimum recommended number of verified queries for a production semantic model?
Question 05
Which of the following is NOT a valid governance control for a Cortex Analyst deployment?
09
Chapter 09
Glossary and reference
Key terminology
Semantic model
A YAML file mapping database objects to business-meaningful names, measures, and synonyms. The bridge between natural language and SQL.
Measure
An aggregatable value with an explicit SQL expression. Revenue, count, margin, average. Answers "how much?"
Dimension
A sliceable attribute. Region, product, date, customer segment. Answers "by what?"
Verified query
A natural language question paired with its known-correct SQL. The single most effective tool for raising accuracy.
Synonym
An alternate business term mapped to a model element. "Top line" for revenue. "Market" for region.
Filter
A named, reusable WHERE clause expression. Examples: last_quarter, current_fy, active_customers.
Time dimension
A specialized dimension representing dates or timestamps. Enables relative time queries like "last quarter" or "year over year."
YAML cheat sheet
Dimension
- name: region expr: region_code synonyms: [territory, market] description: Sales region
Measure
- name: revenue expr: SUM(order_total) synonyms: [sales, top line]
Time dimension
- name: order_date expr: order_dt synonyms: [date, when ordered]
Verified query
- name: rev_by_region question: Revenue by region? sql: SELECT region, SUM(...)
Production readiness checklist
10
Chapter 10
Module completion
What you learned

Concept

Why a schema alone fails natural language and the three layers of context — structural, semantic, linguistic — that Cortex Analyst requires for production-grade accuracy.

Construction

How to author a YAML semantic model with explicit measures, dimensions, synonyms, and verified queries. The five-step build pattern from workspace prep through accuracy iteration.

Diagnostics

The six common failure modes and the structured matrix for mapping each symptom to its root cause and specific fix. The iteration loop for stable accuracy.

Governance

Four production controls — RBAC, audit logging, credit limits, warehouse sizing — required before opening Cortex Analyst access to end users.

Cloud On Demand · Operator Module Completion
Cortex Analyst
Semantic Models
Awarded to the practitioner who completed module OPR-004
Recipient signature
Issued by Cloud On Demand · OPR-004 · v1.0
Next in series
OPR-005 · SnowWork Agent Design