Data model
When Pulsy answers a performance question, it isn’t querying raw tables. It goes
through a semantic layer: one Snowflake Cortex Agent (AMAZON_ADS_AGENT,
running claude-sonnet-4-6) that routes the question to one of six domain
analysts, each backed by a curated semantic view. Understanding those six is
how you predict what Pulsy can and can’t answer.
The schemas behind it
Section titled “The schemas behind it”Everything lives in the Snowflake database AMAZON_MARKETING:
| Schema | What it holds |
|---|---|
PULSEAD | The main source layer — 218 tables of ingested Amazon Ads data. |
PULSY | Gold/reference layer — 65 tables (TaCoS, hourly, Brand Analytics). Reference only; a schema, not the product. |
SEMANTIC_LAYER | The agent-facing layer: the six views + the Cortex Agent + helper views. |
VALIDATION | QA systems (VADER, AQUA) that gate semantic-view quality. |
The six analysts
Section titled “The six analysts”One agent fronts six Cortex Analyst tools (plus a data_to_chart visualizer) and
routes by intent. Each analyst maps to one semantic view:
| Analyst | View | Covers | Key gotcha |
|---|---|---|---|
| SponsoredAds | SPONSORED_ADS_PERFORMANCE | SP/SB/SD performance in one place — ROAS, ACOS, CPC, CTR, sales, CVR. | Requires a PROFILE_ID filter; SP/SB/SD are combined by UNION ALL, never joined. |
| SearchTerm | SEARCH_TERM_V4 | Customer search terms vs. your keywords, plus marketplace Brand Analytics search frequency. | Brand Analytics has no PROFILE_ID and a weekly/monthly grain — must not be joined to daily SP/SB data. |
| Product | PRODUCT_INSIGHT | ASIN-level performance, cross-sell (advertised vs. purchased ASIN), BSR. | Product metadata is deduped to one row/ASIN to prevent fan-out. |
| Budget | BUDGET_MANAGEMENT | Burn rate, missed sales/clicks, Amazon’s recommended budgets. | Filter BUDGET_SCOPE_TYPE to a single value (campaign or portfolio) or you double-count. |
| DSP | DSP_PERFORMANCE | Programmatic DSP creative/line-item performance. | Keyed on ADVERTISER_ID, not PROFILE_ID; cost is in millicents (÷100,000 → USD); KPIs differ (DPVR, ATC). |
| NTB | BRAND_NTB | New-to-brand acquisition across SB/SD/DSP. | DSP NTB ≠ SB/SD NTB (Amazon-wide vs. brand-specific) — don’t compare directly. SP has no NTB data. |
The pattern to carry: Sponsored Ads and DSP are different worlds. Sponsored Ads is profile-scoped, dollar-denominated, and combined across SP/SB/SD; DSP is advertiser-scoped, millicent-denominated, and has its own KPIs. A question that crosses them needs two analysts, not one.
How current the data is
Section titled “How current the data is”The semantic layer is refreshed nightly at 08:00 KST (an AWS EventBridge
schedule, cron(0 8 * * ? *), Asia/Seoul). That governs the semantic assets —
the views and their metadata. The freshness of the underlying PULSEAD source
tables is governed separately by Amazon Ads ingestion, not by this refresh. So
“the data is a day old” is the safe default assumption for semantic-layer answers.
Where the definitions live (SSOT)
Section titled “Where the definitions live (SSOT)”Two repos own this layer, and they’re two generations of the same capability:
data_semanticviewis the source of truth for what exists — the deployed view DDL (02_develop/semantic_views/deploy/*.sql) is “the only version executed in Snowflake,” plus the analyst routing and the VADER/AQUA validation.snowflake-semantic-agentis the newer automated refresh runtime that regenerates and publishes the semantic assets (the control / knowledge / execution planes). It shares theSEMANTIC_LAYERschema.
Don’t describe it as a clean “runtime consumes definitions” split — both can
produce SEMANTIC_LAYER assets; the manual layer is being superseded by the
automated one.