Operator Module · Tier 2
Cortex Analyst
Semantic Models
Design, author, and deploy a production-ready semantic model for Cortex Analyst. Built for data engineers and analytics engineers already running Snowflake — not a getting-started tutorial.
Learning objectives
01
Understand why a semantic model exists and what it does that a schema alone cannot.
02
Author a YAML semantic model for a real table including measures, dimensions, and filters.
03
Deploy to Cortex Analyst and test accuracy against natural language queries.
04
Diagnose and remediate accuracy failures using verified queries and synonyms.
05
Implement governance controls — RBAC, logging, and cost containment.
06
Measure accuracy and iterate using a structured testing methodology.
Duration
90 minutes
plus lab
Prerequisites
Snowflake account
Cortex enabled
Why this matters
Cortex Analyst accuracy collapses without a well-authored semantic model. Teams that skip the semantic layer and point Cortex at raw tables get demo-quality results — not production-grade natural language querying. This module closes that gap.
00
Module Contents
What this module covers
This module is structured as a complete build — context, concept, construction, deployment, validation, and governance. Every chapter produces a tangible artifact. By the end you will have a deployed, tested, production-ready semantic model.
-
01
The semantic model concept
Why schemas alone fail natural language. The three layers of business context that Cortex Analyst requires to produce reliable SQL.
p.03
-
02
Anatomy of a YAML semantic model
Tables, dimensions, measures, synonyms, verified queries, filters. What each block does and how they interact.
p.04
-
03
The five-step build
Hands-on lab. Prepare the workspace, identify entities, author the YAML, deploy, and test.
p.05
-
04
Diagnosing accuracy failures
Why natural language queries fail. A structured troubleshooting matrix for the six most common failure modes.
p.06
-
05
Governance and cost controls
RBAC, audit logging, credit limits, and token budgets. Operator-grade controls before going to production.
p.07
-
06
Verified queries in depth
Why verified queries are the single most effective accuracy lever. How to author them, how many to write, and how to maintain them.
p.08
-
07
Hands-on lab worksheet
A guided exercise to build, deploy, and test a semantic model against a sample dataset. Templates and checkpoints included.
p.09
-
08
Knowledge check
Five assessment questions covering concepts, construction, governance, and troubleshooting. Answer key on the closing page.
p.10
-
09
Glossary and reference
Key terminology, syntax cheat sheet, and a checklist for production readiness.
p.11
-
10
Module completion and next steps
Module summary, what you learned, and the next module in the Operator series.
p.12
01
Chapter 01
The semantic model concept
A Snowflake schema tells the database engine what columns exist and how they are typed. It does not tell anything — not a human, not a language model — what those columns mean in business terms. A column named rev_m_ttm is a database fact. Whether it represents "trailing twelve month revenue in millions" is a business fact that lives only in someone's head.
Cortex Analyst converts natural language into SQL. To do that reliably, it needs a translation layer between the two — one that maps what users actually ask ("What was our revenue last quarter?") to what the schema actually contains. That translation layer is the semantic model.
Schema vs. semantic model
Schema only
Tables, columns, types, and keys. Tells the engine what rows are valid. Tells a user nothing about what a column means, how it's measured, or what synonyms exist for it.
Schema + semantic model
Adds measures, dimensions, synonyms, business definitions, and verified query examples. Tells Cortex Analyst what a column means in business terms and how users actually refer to it.
Three layers of context Cortex Analyst needs
Structural context — which tables and columns exist. This is your schema, already in Snowflake. Cortex reads it automatically.
Semantic context — what those tables and columns mean. What is a measure versus a dimension. What calculations define derived values. What filters are meaningful. This lives in your YAML file.
Linguistic context — how humans actually talk about the data. Synonyms for column names, example questions paired with verified SQL, business-specific terms. This is the single highest-impact layer for accuracy.
The accuracy principle
Cortex Analyst accuracy is a direct function of the quality and completeness of these three layers. A minimal semantic model will produce approximately 60 percent accuracy on production queries. A well-authored one with synonyms and verified queries routinely clears 92 percent. This is the difference between a demo and a system people actually use.
02
Chapter 02
Anatomy of a YAML semantic model
A Cortex Analyst semantic model is a YAML file with a specific structure. Below is a minimal working example against a fictional orders table. Each block has a job to do — understanding what each block contributes is the core skill of this module.
# semantic_model.yaml — orders domain
name: orders_model
description: Order and revenue analytics for commercial customers
tables:
- name: orders
description: One row per customer order, all channels
base_table:
database: ANALYTICS_PROD
schema: SALES
table: FACT_ORDERS
dimensions:
- name: region
expr: region_code
synonyms: [territory, market, geography, area]
description: Sales region code, 2-letter ISO
- name: product_category
expr: product_cat
synonyms: [category, product line, product type]
time_dimensions:
- name: order_date
expr: order_dt
synonyms: [date ordered, purchase date, when bought]
measures:
- name: total_revenue
expr: SUM(order_total_usd)
synonyms: [revenue, sales, total sales, top line]
description: Sum of order total in USD, pre-discount
- name: order_count
expr: COUNT(DISTINCT order_id)
synonyms: [orders, number of orders, transactions]
filters:
- name: last_quarter
expr: order_dt >= DATEADD(quarter, -1, CURRENT_DATE())
verified_queries:
- name: revenue_by_region_ltq
question: What was revenue by region last quarter?
sql: SELECT region_code, SUM(order_total_usd) FROM FACT_ORDERS
WHERE order_dt >= DATEADD(quarter, -1, CURRENT_DATE())
GROUP BY region_code
What each block does
Dimensions are how users slice data — region, product, customer. Anything that answers "by what?" is a dimension.
Measures are what gets aggregated — revenue, count, average. Anything that answers "how much?" or "how many?" is a measure. Define them with explicit SQL expressions so there is no ambiguity.
Synonyms are the single highest-leverage element in the whole file. They map business language to your columns. Write them from the perspective of the user, not the database — "top line" maps to revenue, "market" maps to region.
Verified queries are worked examples that pair a natural language question with the correct SQL. They teach Cortex Analyst how your specific business asks specific questions. Ten well-chosen verified queries will outperform a hundred generic synonyms.
03
Chapter 03
The five-step build
Step 01
Prepare your workspace
Create a dedicated schema for semantic models and a stage to hold the YAML files. Use a dev role with read access to the target tables — never build directly against production.
CREATE SCHEMA IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS;
CREATE STAGE IF NOT EXISTS ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE;
GRANT USAGE ON SCHEMA ANALYST_DEV.SEMANTIC_MODELS TO ROLE ANALYST_BUILDER;
Step 02
Identify tables, measures, and dimensions
Pick one fact table and its supporting dimensions. List the measures (what gets aggregated) and dimensions (how it gets sliced). Resist modeling everything at once — start narrow, expand after you achieve accuracy.
-- Fact: FACT_ORDERS
-- Measures: total_revenue, order_count, avg_order_value
-- Dimensions: region, product_category, customer_segment
-- Time dimension: order_date
Step 03
Author the YAML semantic model
Write the YAML using the anatomy from Chapter 02 as your reference. Cortex Code CLI can scaffold a starter file for you — but every synonym and every verified query should be reviewed by a human who knows the business.
cortex analyst generate-model \
--table ANALYTICS_PROD.SALES.FACT_ORDERS \
--output ./semantic_model.yaml
Step 04
Deploy and test with natural language
Upload the YAML to your stage and register it with Cortex Analyst. Test with 15 to 20 real questions written by real users — not rephrased to match the schema. This is the only honest accuracy signal you get.
PUT file://semantic_model.yaml
@ANALYST_DEV.SEMANTIC_MODELS.YAML_STAGE AUTO_COMPRESS=FALSE;
-- Test: "What was revenue by region last quarter?"
Step 05
Measure accuracy and iterate
Track accuracy against a fixed test battery. For each failure, add a verified query and relevant synonyms, then re-test. A production-ready model sits above 90 percent accuracy on the user question set, not just yours.
Operator Tip
Verified queries are not optional. A semantic model without 10+ verified queries covering your highest-frequency questions will fail unpredictably. Treat them as part of the build, not a post-launch addition.
04
Chapter 04
Diagnosing accuracy failures
When Cortex Analyst produces wrong SQL, the failure almost always traces to one of six root causes. A structured diagnostic matrix saves hours of trial-and-error iteration.
| Symptom | Root cause and fix |
| Query returns empty result but question was valid |
Dimension value mismatch. User said "northeast" but column values are "NE". Add a synonym mapping or a computed column with readable values. |
| Wrong column selected as the measure |
Ambiguous synonym. "Sales" could mean revenue or unit count. Clarify the synonym in the YAML and add a verified query disambiguating the user's most common phrasing. |
| Correct columns but wrong aggregation |
Measure expression missing or ambiguous. Every measure must have an explicit expr like SUM, COUNT, or AVG. Never let Cortex infer the aggregation. |
| Time filter not applied when user said "last quarter" |
Time dimension not declared, or no filter definition for the phrase. Add a named filter for common time ranges and reference it in verified queries. |
| Correct SQL but returns zero rows |
RBAC on the underlying table. Cortex Analyst inherits row access policies. Confirm the calling role has access to the rows — this is working as designed. |
| Wrong table selected across related fact tables |
Table description too generic. Rewrite the description field to explicitly state what distinguishes this fact table from others that might seem similar. |
The iteration loop
Run the test battery. For each failure, identify the root cause from the matrix above. Add the specific fix — synonym, verified query, filter, or description. Re-run the battery. Track accuracy week over week. Stop iterating when you clear 90 percent and the remaining failures are edge cases acceptable to your business stakeholders.
05
Chapter 05
Governance and cost controls
A semantic model that works is not the same as a semantic model ready for production. Before you flip it on for end users, put four governance controls in place. Every one of these is non-optional in a regulated environment.
Four production controls
RBAC inheritance confirmed. Cortex Analyst runs under the calling user's role. Verify that row access policies, column masking, and data exclusions are enforced by running test queries as a restricted role.
Audit logging active. Every Cortex Analyst query should land in ACCOUNT_USAGE.QUERY_HISTORY with the invoking role. Confirm before going live and set an alert on anomalous usage.
Per-user credit limits set. Use the CORTEX_CODE_SNOWSIGHT_DAILY_EST_CREDIT_LIMIT_PER_USER parameter to cap token spend. Default to 20 credits per user per day and raise only for verified power users.
Warehouse sizing validated. Cortex Analyst executes SQL on your virtual warehouses. Run the test battery against the target warehouse size and confirm response times hit your SLA before opening access.
Cost benchmark — what to expect
These are reference benchmarks for a well-authored semantic model on a small warehouse. Credit consumption varies with question complexity, output length, and model selection. Latency excludes warehouse cold-start. Accuracy depends on verified query coverage — models with fewer than 10 verified queries routinely fall below 75 percent.
Operator Tip
Audit the semantic model itself as a versioned artifact. Store the YAML in Git. Every production change goes through pull request review. Treating semantic models as code — not configuration — is what separates teams that scale from teams that don't.
06
Chapter 06
Verified queries in depth
If you only do one thing well in this module, do verified queries well. Synonyms move accuracy from 60 to 80 percent. Verified queries are what move it from 80 to 92 plus. Treat them as a first-class engineering artifact, not a post-launch addition.
What a verified query actually does
A verified query teaches Cortex Analyst the SQL pattern for a class of question. When a user asks something semantically similar — same intent, different wording — the model has a high-confidence template to work from. One well-chosen verified query covers a range of phrasings. Five verified queries can cover the bulk of a stakeholder's daily reporting needs.
A worked example
# A complete verified query — note the explicit SQL
verified_queries:
- name: top_products_last_quarter
question: What were our top 5 products by revenue last quarter?
use_as_onboarding_question: true
sql: |
SELECT
product_name,
SUM(order_total_usd) AS revenue
FROM ANALYTICS_PROD.SALES.FACT_ORDERS o
JOIN ANALYTICS_PROD.SALES.DIM_PRODUCT p
ON o.product_id = p.product_id
WHERE o.order_dt >= DATEADD(quarter, -1, CURRENT_DATE())
AND o.order_dt < DATE_TRUNC(quarter, CURRENT_DATE())
GROUP BY product_name
ORDER BY revenue DESC
LIMIT 5
How many verified queries do you need?
Below 10 verified queries, accuracy fluctuates with phrasing changes. Between 10 and 25, the model handles the most common reporting needs reliably. Above 50, you have a production-grade asset that can absorb new question types without retraining.
Operator Tip
Mine your existing BI workload for verified query candidates. Pull the 25 most-run dashboard queries from QUERY_HISTORY, identify the natural language equivalent each one answers, and use those pairs as your verified query seed set. This is the fastest way to a high-accuracy semantic model.
07
Chapter 07
Hands-on lab worksheet
A guided exercise to build, deploy, and test your first semantic model. Work through each task in order. Record your answers and outputs in the spaces provided. Estimated time: 60 to 90 minutes against a sandbox Snowflake account.
Task 01
Identify your fact table and three dimensions
10 min
Pick a fact table you know well and list the dimensions you would slice it by. Think about how your team actually asks questions — region, time, product, customer segment.
Task 02
Define three measures with explicit aggregations
10 min
Write the SQL expression for each measure. SUM, COUNT, AVG, or a derived calculation. Be explicit — never leave the aggregation up to inference.
Task 03
Add five synonyms per dimension and per measure
15 min
Write synonyms from the user's perspective, not the database's. "Top line" for revenue. "Market" for region. Test them against three real questions a stakeholder has asked recently.
Task 04
Author three verified queries for high-frequency questions
20 min
Take three of the most common reporting questions in your domain. Write each as a natural language question paired with its known-correct SQL. These three are your accuracy floor.
Task 05
Deploy and test against a 10-question battery
25 min
Upload the YAML to your stage. Test 10 questions written by a real user, not by you. Record accuracy.
08
Chapter 08
Knowledge check
Five questions covering the core concepts of this module. Answer key appears on the closing page. A passing score is four of five correct.
Question 01
Which element of a semantic model has the highest leverage on accuracy?
Question 02
Cortex Analyst inherits which security model when executing a query?
A
The semantic model author's role
B
The calling user's role, including row access and masking
C
A dedicated Cortex service role
D
The ACCOUNTADMIN role
Question 03
A test query returns the wrong column as the measure. The most likely cause is:
A
Warehouse undersized
B
Ambiguous synonym mapping to multiple measures
C
Missing column in the base table
D
Incorrect YAML indentation
Question 04
What is the minimum recommended number of verified queries for a production semantic model?
Question 05
Which of the following is NOT a valid governance control for a Cortex Analyst deployment?
A
Per-user daily credit limits
B
Column-level masking policies
C
Disabling the calling user's role inheritance
D
Audit logging via QUERY_HISTORY
09
Chapter 09
Glossary and reference
Key terminology
Semantic model
A YAML file mapping database objects to business-meaningful names, measures, and synonyms. The bridge between natural language and SQL.
Measure
An aggregatable value with an explicit SQL expression. Revenue, count, margin, average. Answers "how much?"
Dimension
A sliceable attribute. Region, product, date, customer segment. Answers "by what?"
Verified query
A natural language question paired with its known-correct SQL. The single most effective tool for raising accuracy.
Synonym
An alternate business term mapped to a model element. "Top line" for revenue. "Market" for region.
Filter
A named, reusable WHERE clause expression. Examples: last_quarter, current_fy, active_customers.
Time dimension
A specialized dimension representing dates or timestamps. Enables relative time queries like "last quarter" or "year over year."
YAML cheat sheet
Dimension
- name: region
expr: region_code
synonyms: [territory, market]
description: Sales region
Measure
- name: revenue
expr: SUM(order_total)
synonyms: [sales, top line]
Time dimension
- name: order_date
expr: order_dt
synonyms: [date, when ordered]
Verified query
- name: rev_by_region
question: Revenue by region?
sql: SELECT region, SUM(...)
Production readiness checklist
Semantic model authored with at least 10 verified queries covering top reporting questions
Synonyms validated against real user phrasing — not the team that built the model
RBAC inheritance tested with a restricted role to confirm row and column policies enforce
Per-user daily credit limit set, default 20 credits per user
QUERY_HISTORY audit logging confirmed active and alerting configured
Accuracy test battery achieves 90%+ against a stakeholder-authored question set
YAML stored in Git with PR review process for production changes
10
Chapter 10
Module completion
What you learned
Concept
Why a schema alone fails natural language and the three layers of context — structural, semantic, linguistic — that Cortex Analyst requires for production-grade accuracy.
Construction
How to author a YAML semantic model with explicit measures, dimensions, synonyms, and verified queries. The five-step build pattern from workspace prep through accuracy iteration.
Diagnostics
The six common failure modes and the structured matrix for mapping each symptom to its root cause and specific fix. The iteration loop for stable accuracy.
Governance
Four production controls — RBAC, audit logging, credit limits, warehouse sizing — required before opening Cortex Analyst access to end users.
Cloud On Demand · Operator Module Completion
Cortex Analyst
Semantic Models
Awarded to the practitioner who completed module OPR-004
Recipient signature
Issued by Cloud On Demand · OPR-004 · v1.0
Next in series
OPR-005 · SnowWork Agent Design
→