Content
{
"scorecard": [
{
"layer": "identity_resolution",
"status": "ready",
"current_state": "A robust 7-tier matching system in int_identity__resolution stitches PostHog persons to canonical user_ids across PostHog and the Kilocode backend. The three-step mapping chain (int_user_id__to_person_id \u2192 int_person_id__to_distinct_id \u2192 int_user_id__to_distinct_id) provides cross-session linking, and int_posthog__person_collapsed deduplicates multiple persons per user using MIN_BY for first-touch preservation.",
"target_state": "A complete identity layer would additionally incorporate a device graph using stg_backend__stytch_fingerprints to link anonymous mobile sessions to authenticated desktop signups, closing the cross-device attribution gap for mobile-first discovery channels.",
"evidence": "int_identity__resolution implements strong_id through unresolved_kilocode tiers with EXISTS and HAVING count(distinct)=1 patterns. canonical_users is consumed by 15+ downstream models. The mapping layer confirms identity stitching, cross-session linking, merged ID handling, and deduplication are all present. Only device graph is missing, and stg_backend__stytch_fingerprints exists in staging but is unused."
},
{
"layer": "signal_touchpoint",
"status": "partial",
"current_state": "The project has strong event staging (stg_posthog__events_filtered with dedup and bot detection), conversion event classification (int_conversion_events__posthog), milestone timestamp extraction (int_user__event_firsts), and a long-format touchpoint history (int_user_id__attribution_history). Synthetic retention signals are generated via int_conversion_events__synthetic_daily with power-curve conversion values.",
"target_state": "A complete signal layer would add dbt-native session detection using inactivity windows on stg_posthog__events_filtered, producing session-grain touchpoints with entry-level attribution that enables session-level deduplication and more accurate multi-touch analysis.",
"evidence": "The mapping rates event staging, touchpoint extraction, data quality filters, signal type classification, and temporal ordering as present. Session detection is the sole missing capability\u2014posthog_sessions is a raw passthrough with no dbt logic. int_user_id__attribution_history derives touchpoints from PostHog person property versions rather than individual event timestamps, limiting granularity."
},
{
"layer": "channel_classification",
"status": "partial",
"current_state": "Channel classification is well-implemented for paid channels via a consistent click ID hierarchy (gclid, fbclid, msclkid, rdt_cid, li_fat_id) with UTM fallback and an 'Organic/Direct' catch-all, applied uniformly across canonical_users and all five paid_budget conversion models. int_paid_budget__ad_spend_by_platform unifies spend across 6 ad platforms with product and campaign type classification.",
"target_state": "A complete classification layer would add referrer domain mapping (using $initial_referring_domain already captured in stg_posthog__person_identity) to distinguish organic search, organic social, and referral traffic from true direct visits, and optionally incorporate user-agent heuristics from stg_backend__http_user_agent for in-app browser detection.",
"evidence": "The mapping confirms UTM parsing, click ID detection, classification hierarchy, and fallback handling are present. Referrer classification and user-agent heuristics are missing. The gap analysis highlights that all non-paid traffic collapses into a single 'Organic/Direct' bucket, inflating direct and hiding organic/referral contributions."
},
{
"layer": "attribution_models",
"status": "partial",
"current_state": "The system implements first-touch attribution exclusively, applied consistently across all paid_budget conversion models with 90-day platform-specific staleness windows in int_user__attribution_resolved. Grain flexibility exists at user, organization, and subscription levels. int_user_id__attribution_history stores the raw multi-touch data in long format but no downstream model distributes credit across touchpoints.",
"target_state": "A complete attribution layer would add last-touch and multi-touch models (linear, time-decay, position-based) reading from int_user_id__attribution_history, a model comparison mart for side-by-side channel credit analysis, and integration of self-reported customer_source data from stg_backend__kilocode_users for survey-vs-behavioral comparison.",
"evidence": "The mapping confirms first-touch and grain flexibility are present. Last-touch, multi-touch, model comparison, and survey integration are all missing. int_user__attribution_resolved implements a first-touch-with-staleness-fallback pattern, not a true last-touch model. stg_backend__kilocode_users has regex-classified customer_source that is not consumed by any attribution model."
},
{
"layer": "coverage_health",
"status": "missing",
"current_state": "No models measure the reliability or completeness of the attribution system. paid_budget__monthly_performance and paid_budget__weekly_performance compute CAC and ROI but do not track what percentage of conversions have attributed touchpoints, click ID capture rates, or identity resolution quality over time.",
"target_state": "A complete coverage layer would include daily/weekly metrics for behavioural coverage rate, direct-only rate, per-platform click ID capture rates, identity resolution tier distribution trends, and XmR process control charts to detect statistically significant shifts in attribution quality.",
"evidence": "The mapping found zero models in this layer. The gap analysis notes that stg_posthog__events_filtered includes bot detection (a data quality filter) but no dedicated model tracks attribution pipeline health. Admin models (active_queries, columns) monitor Snowflake infrastructure, not attribution quality."
}
],
"roadmap": [
{
"action": "Build a referrer domain classification model (int_channel__referrer_classification) that maps $initial_referring_domain from stg_posthog__person_identity to channels (Organic Search, Organic Social, Referral, Direct) using a CASE WHEN hierarchy backed by a maintainable seed file, and integrate it into canonical_users' channel derivation as a tier between UTM and Direct",
"priority": "high",
"effort": "medium",
"impact": "high",
"layer": "channel_classification",
"prerequisites": []
},
{
"action": "Build a mart_attribution__coverage_daily model that calculates daily attribution health metrics: total conversions, conversions with at least one click ID or UTM, direct-only rate, per-platform click ID capture rates, and identity resolution tier distribution from int_identity__resolution, with a weekly rollup model that computes moving ranges and natural process limits for drift detection",
"priority": "high",
"effort": "medium",
"impact": "high",
"layer": "coverage_health",
"prerequisites": []
},
{
"action": "Build an int_attribution__last_touch model that reads int_user_id__attribution_history and selects the most recent touchpoint per user using ROW_NUMBER ordered by posthog_created_at DESC, applies the same click ID > UTM > referrer classification hierarchy, and create a parallel paid_budget__conversions_last_touch model for side-by-side reporting",
"priority": "high",
"effort": "medium",
"impact": "high",
"layer": "attribution_models",
"prerequisites": []
},
{
"action": "Build an int_sessions__detected model that sessionizes stg_posthog__events_filtered using LAG-based 30-minute inactivity gap detection, assigns deterministic session_ids, and enriches each session with entry-level attribution (first event's UTM, referrer, and click ID via FIRST_VALUE)",
"priority": "medium",
"effort": "medium",
"impact": "medium",
"layer": "signal_touchpoint",
"prerequisites": []
},
{
"action": "Build an int_attribution__survey_responses model that standardizes stg_backend__kilocode_users.customer_source into the same channel taxonomy as behavioral attribution, and a mart_attribution__survey_vs_behavioral model that produces a confusion-matrix comparison of self-reported vs first-touch channels at weekly grain",
"priority": "medium",
"effort": "low",
"impact": "medium",
"layer": "attribution_models",
"prerequisites": []
},
{
"action": "Build a mart_attribution__model_comparison model that joins first-touch conversions (from paid_budget__conversions) with last-touch conversions (from the new last-touch model) at (date, channel, product) grain, with a divergence_score highlighting channels where model choice materially impacts credited conversions",
"priority": "medium",
"effort": "low",
"impact": "medium",
"layer": "attribution_models",
"prerequisites": [
"Build an int_attribution__last_touch model that reads int_user_id__attribution_history and selects the most recent touchpoint per user using ROW_NUMBER ordered by posthog_created_at DESC, applies the same click ID > UTM > referrer classification hierarchy, and create a parallel paid_budget__conversions_last_touch model for side-by-side reporting"
]
},
{
"action": "Build an int_attribution__multi_touch model that reads int_user_id__attribution_history (or session-level touchpoints), calculates fractional credit under linear, time-decay, and position-based models at (user_id, touchpoint_id, model_type, credit) grain, and extend the model comparison mart to include multi-touch credit columns",
"priority": "medium",
"effort": "high",
"impact": "medium",
"layer": "attribution_models",
"prerequisites": [
"Build an int_attribution__last_touch model that reads int_user_id__attribution_history and selects the most recent touchpoint per user using ROW_NUMBER ordered by posthog_created_at DESC, applies the same click ID > UTM > referrer classification hierarchy, and create a parallel paid_budget__conversions_last_touch model for side-by-side reporting"
]
},
{
"action": "Build an int_identity__device_graph model that reads stg_backend__stytch_fingerprints, links fingerprint_id to user_id and distinct_id, bridges anonymous mobile sessions to authenticated desktop users (when a fingerprint maps to exactly one authenticated user), and integrate as a supplementary tier in int_identity__resolution",
"priority": "low",
"effort": "high",
"impact": "medium",
"layer": "identity_resolution",
"prerequisites": []
},
{
"action": "Build an int_channel__user_agent_enrichment model that parses stg_backend__http_user_agent to detect in-app browsers (FBAN/FBAV for Meta, LinkedIn), device type, and additional bot signatures, and use in-app browser flags as a supplementary signal in referrer classification for cases where click IDs and referrer are both absent",
"priority": "low",
"effort": "medium",
"impact": "low",
"layer": "channel_classification",
"prerequisites": [
"Build a referrer domain classification model (int_channel__referrer_classification) that maps $initial_referring_domain from stg_posthog__person_identity to channels (Organic Search, Organic Social, Referral, Direct) using a CASE WHEN hierarchy backed by a maintainable seed file, and integrate it into canonical_users' channel derivation as a tier between UTM and Direct"
]
}
],
"summary": {
"ready_count": 1,
"partial_count": 3,
"missing_count": 1,
"overall_assessment": "The project has a strong foundation: identity resolution is production-ready with a sophisticated 7-tier matching system, and the paid channel attribution pipeline (click ID capture \u2192 first-touch attribution \u2192 platform-specific conversion sync \u2192 CAC/ROI reporting) is well-built and consistently applied across all product lines. The most impactful next steps are building referrer domain classification to break apart the 'Organic/Direct' bucket, adding an attribution coverage and health dashboard to detect tracking degradation, and implementing a last-touch attribution model to enable model comparison \u2014 all three are high-priority, medium-effort actions that could be completed in 2-3 focused sprints. After that foundation is in place, multi-touch models, survey integration, and session detection would bring the project to full attribution maturity over an additional 2-3 sprints. The raw data and intermediate models needed for these enhancements largely already exist (int_user_id__attribution_history, stg_posthog__person_identity's referrer fields, stg_backend__kilocode_users' customer_source), making the path forward one of building new models on existing data rather than instrumenting new data collection."
}
}