Skip to content

Data model

When Pulsy answers a performance question, it isn’t querying raw tables. It goes through a semantic layer: one Snowflake Cortex Agent (AMAZON_ADS_AGENT, running claude-sonnet-4-6) that routes the question to one of six domain analysts, each backed by a curated semantic view. Understanding those six is how you predict what Pulsy can and can’t answer.

Everything lives in the Snowflake database AMAZON_MARKETING:

SchemaWhat it holds
PULSEADThe main source layer — 218 tables of ingested Amazon Ads data.
PULSYGold/reference layer — 65 tables (TaCoS, hourly, Brand Analytics). Reference only; a schema, not the product.
SEMANTIC_LAYERThe agent-facing layer: the six views + the Cortex Agent + helper views.
VALIDATIONQA systems (VADER, AQUA) that gate semantic-view quality.

One agent fronts six Cortex Analyst tools (plus a data_to_chart visualizer) and routes by intent. Each analyst maps to one semantic view:

AnalystViewCoversKey gotcha
SponsoredAdsSPONSORED_ADS_PERFORMANCESP/SB/SD performance in one place — ROAS, ACOS, CPC, CTR, sales, CVR.Requires a PROFILE_ID filter; SP/SB/SD are combined by UNION ALL, never joined.
SearchTermSEARCH_TERM_V4Customer search terms vs. your keywords, plus marketplace Brand Analytics search frequency.Brand Analytics has no PROFILE_ID and a weekly/monthly grain — must not be joined to daily SP/SB data.
ProductPRODUCT_INSIGHTASIN-level performance, cross-sell (advertised vs. purchased ASIN), BSR.Product metadata is deduped to one row/ASIN to prevent fan-out.
BudgetBUDGET_MANAGEMENTBurn rate, missed sales/clicks, Amazon’s recommended budgets.Filter BUDGET_SCOPE_TYPE to a single value (campaign or portfolio) or you double-count.
DSPDSP_PERFORMANCEProgrammatic DSP creative/line-item performance.Keyed on ADVERTISER_ID, not PROFILE_ID; cost is in millicents (÷100,000 → USD); KPIs differ (DPVR, ATC).
NTBBRAND_NTBNew-to-brand acquisition across SB/SD/DSP.DSP NTB ≠ SB/SD NTB (Amazon-wide vs. brand-specific) — don’t compare directly. SP has no NTB data.

The pattern to carry: Sponsored Ads and DSP are different worlds. Sponsored Ads is profile-scoped, dollar-denominated, and combined across SP/SB/SD; DSP is advertiser-scoped, millicent-denominated, and has its own KPIs. A question that crosses them needs two analysts, not one.

The semantic layer is refreshed nightly at 08:00 KST (an AWS EventBridge schedule, cron(0 8 * * ? *), Asia/Seoul). That governs the semantic assets — the views and their metadata. The freshness of the underlying PULSEAD source tables is governed separately by Amazon Ads ingestion, not by this refresh. So “the data is a day old” is the safe default assumption for semantic-layer answers.

Two repos own this layer, and they’re two generations of the same capability:

  • data_semanticview is the source of truth for what exists — the deployed view DDL (02_develop/semantic_views/deploy/*.sql) is “the only version executed in Snowflake,” plus the analyst routing and the VADER/AQUA validation.
  • snowflake-semantic-agent is the newer automated refresh runtime that regenerates and publishes the semantic assets (the control / knowledge / execution planes). It shares the SEMANTIC_LAYER schema.

Don’t describe it as a clean “runtime consumes definitions” split — both can produce SEMANTIC_LAYER assets; the manual layer is being superseded by the automated one.