GUNABHIRAM · BILLA · DATA · ENGINEER · NYC · 2026
WITH impact AS (
  SELECT project_id,
    'production' AS status,
    measurable_value
  FROM portfolio.systems
  WHERE built_with_care = TRUE
)
SELECT * FROM impact;
NYC · LIVE · --:--:-- EST ● Available

The pipeline
ran at 3 a.m.
nobody noticed. That's the goal.

$ I'm Gunabhiram Billa — a Data & Analytics Engineer in New York. I build the unglamorous infrastructure underneath your dashboards: idempotent pipelines, multi-tenant warehouses, and reconciliation layers that catch mismatches before finance does. 50+ production systems shipped. 60+ multi-tenant dashboards live. No demos that pass once and break Tuesday.

50+
Production Projects
60+
Customer Dashboards
~80%
Reporting Time Saved
8
Technical Articles
Gunabhiram Billa
// Engineer_Profile.jpg
SNOWFLAKE · PYTHON · SQL · ANALYTICS
Snowflake · Python Automation · Dimensional Modeling · Healthcare Analytics · Idempotent Pipelines · KPI Frameworks · Reconciliation Systems · Operational Clarity · Multi-Tenant Architecture
// 01 · who

A builder of quiet
infrastructure.

Role: Data & Analytics Engineer
Based: New York City
Status: Open to hybrid / remote
Stack: Snowflake · Python · SQL
Domain: Healthcare operational analytics
Shipped: 50+ projects · 60+ dashboards
Direction: Senior data system ownership
Currently: Designing layered Snowflake architectures for multi-tenant healthcare analytics, and writing about the patterns along the way.

I work in the unglamorous middle layer — between the source systems that emit data and the dashboards that consume it. Semantic layers, KPI frameworks, idempotent pipelines. The kind of code finance, sales, and CS teams quietly depend on without ever needing to open a ticket.

Across 50+ production projects, I've designed multi-tenant warehouses serving 60+ customer-facing dashboards, automated reporting that turned two-day Mondays into 8 a.m. emails, and reconciliation layers that catch mismatches before finance closes the books.

I care about idempotency, dimensional discipline, and cross-system reconciliation — the foundations that make the dashboards on top actually trustworthy. If a pipeline ran at 3 a.m. and nobody noticed, that's the goal.

// how I work
  • I write more SQL comments than anyone I've worked with.
  • I document like the next engineer is a stranger — because eventually they will be.
  • I'd rather ship a smaller correct thing than a bigger fragile one.
  • I treat reconciliation like a security control, not a nice-to-have.
// 02 · stack

What I reach for.

A working toolkit, sharpened across data warehousing, automation, and cross-system integration in healthcare operational analytics. No padding with languages I touched once in a tutorial. Tap a category to explore.

// 01

SQL Mastery

  • Complex joins & CTE pipelines
  • Window functions & analytics
  • Query plan analysis
  • Performance & scan reduction
  • Procedural SQL & scripting
// 02

Python Automation

  • Snowflake connector workflows
  • Pandas & openpyxl reporting
  • Word/PDF document generation
  • API ingestion & pagination
  • Scheduled job orchestration
// 03

Pipeline Engineering

  • Idempotent incremental loads
  • Watermark & control tables
  • Merge upsert patterns
  • Late-arriving data handling
  • Recovery & retry logic
// 01

Snowflake

  • Warehouse design & sizing
  • Role/grant/secure-share patterns
  • Query optimization & clustering
  • Credit metering & cost tuning
  • Streams, tasks, & materialized views
// 02

BI & Visualization

  • Tableau dashboards
  • Amazon QuickSight (embedded)
  • Excel automation & charting
  • Multi-tenant dashboard frameworks
  • Semantic layer design
// 03

Integrations & Cloud

  • AWS — S3, API logs, IAM
  • Zoho CRM / Desk integrations
  • QuickBooks reconciliation
  • REST API ingestion
  • Git / GitHub workflows
// 01

Dimensional Modeling

  • Star & snowflake schemas
  • SCD Type 1 & Type 2
  • Conformed dimensions
  • Bridge & junk tables
  • Grain definition discipline
// 02

Multi-Tenant Architecture

  • Tenant-aware fact tables
  • Row-level isolation patterns
  • Cross-tenant benchmarking
  • Shared dimension governance
  • Customer-specific overrides
// 03

Layered Transformations

  • Staging → intermediate → mart
  • Snapshot vs CDC vs full-load
  • Behavioral fact patterns
  • Time-aware joins
  • Semantic exposure layers
// 01

Operational Analytics

  • Funnel & drop-off analysis
  • Provider productivity metrics
  • Visit lifecycle tracking
  • Engagement & adoption KPIs
  • Customer health scoring
// 02

Financial Reconciliation

  • CRM ↔ billing alignment
  • FTE thresholds & classification
  • Daily reconciliation snapshots
  • Exception reporting views
  • Audit trail design
// 03

Cross-Functional Delivery

  • Stakeholder requirement capture
  • ROI framework design
  • Renewal & upsell analytics
  • Executive reporting cadence
  • Documentation & handoff
// 03 · work · 13 entries

Case files.

Real systems I've shipped — the architecture that held, the bugs I traced through three layers, the trade-offs that mattered. Selected from a portfolio of 50+ production projects. Click any row to expand.

/01
Automated KPI Reporting Pipeline
Snowflake · Python · SQL · Word/PDF
Fully automated extraction → templated Excel charts → Word reports → PDF distribution. ~80% reduction in manual reporting time.
The Challenge

Customer Success was spending two full days every week assembling a recurring KPI report — pulling SQL exports, copying values into Excel templates, refreshing charts manually, then re-typing summaries into a Word doc and exporting to PDF. The same humans were doing the same clicks every Monday.

The Approach

Designed a parameterized Python pipeline that reads customer config from a control table, runs templated Snowflake queries, populates an Excel workbook (preserving formatting and pivot logic), and injects the results plus auto-generated narrative into a Word template via python-docx. Final step exports to PDF for leadership distribution.

Technical Details
  • Parameterized SQL with config-driven date windows and tenant filters
  • openpyxl for chart-preserving Excel population
  • python-docx with bookmarked placeholder injection
  • Reusable across 30+ customers with no code changes
  • Scheduled via cron with email distribution on success
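
The shape of the config-driven step, sketched with hypothetical table and column names (the production control table carries more fields):

-- Illustrative control table: one row per customer report.
CREATE TABLE IF NOT EXISTS report_config (
  customer_id    VARCHAR,
  tenant_filter  VARCHAR,   -- injected into the templated query
  window_days    INTEGER,   -- e.g. 7 for the weekly cadence
  excel_template VARCHAR,   -- chart-preserving workbook path
  word_template  VARCHAR    -- bookmarked Word template path
);

-- Templated KPI query; the Python layer binds the %(...)s placeholders
-- per config row via the Snowflake connector's pyformat style.
SELECT kpi_name, kpi_value
FROM mart.customer_kpis
WHERE tenant_id = %(tenant_filter)s
  AND kpi_date >= DATEADD(day, -%(window_days)s, CURRENT_DATE);
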
~80%
Time saved
30+
Customers
Weekly
Cadence
/02
Customer ROI Metrics Engine
Snowflake · SQL · Excel
Standardized ROI framework modeling intake, scheduling, reminders, and engagement. Backed renewal & upsell with measurable product value.
The Challenge

The Sales and Customer Success teams couldn't quantify product value at renewal conversations. Each rep was inventing their own ROI math, leading to inconsistent numbers across customers — and no defensible framework when leadership asked "what's the upsell story?"

The Approach

Built a unified Snowflake-backed ROI model that quantifies four pillars: intake completion lift, scheduling adoption, reminder effectiveness (no-show reduction), and engagement uplift. Each pillar maps to specific time and revenue assumptions configurable per customer segment.

Technical Details
  • Per-customer baseline measurement (pre-product window)
  • Configurable assumption layer for time/cost values
  • Versioned metric definitions in a governance table
  • Excel deliverable with executive summary and detail tabs
  • Used by Sales, CS, and Product strategy teams
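
One pillar of the math, as a minimal sketch; table and column names are hypothetical stand-ins for the real assumption layer:

-- Reminder effectiveness pillar: no-show reduction priced per segment.
SELECT c.customer_id,
       b.no_show_rate - p.no_show_rate                    AS rate_reduction,
       (b.no_show_rate - p.no_show_rate)
         * p.scheduled_visits * a.cost_per_no_show         AS value_recovered
FROM customers c
JOIN baseline_no_shows b ON b.customer_id = c.customer_id  -- pre-product window
JOIN current_no_shows  p ON p.customer_id = c.customer_id  -- post-adoption window
JOIN roi_assumptions   a ON a.segment = c.segment;         -- configurable layer
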
4
ROI pillars
100%
Renewal coverage
All
Customers
/03
Multi-Tenant Data Warehouse Architecture
Snowflake · SQL · SCD II
Fact/dimension models, time-aware joins, layered staging→intermediate→mart. Improved query performance and tenant-aware governance.
The Challenge

Multiple customers, shared infrastructure, divergent needs — and a warehouse that had grown organically into a tangle of one-off views. Cross-tenant benchmarking was unsafe, performance was inconsistent, and onboarding a new customer required custom code rather than configuration.

The Approach

Redesigned as a layered architecture: staging mirrors source systems, intermediate handles business logic and conformed dimensions, marts expose tenant-isolated semantic models. SCD Type 2 captures historical attribution. Tenant ID propagates through every fact for safe row-level isolation.

Technical Details
  • Three-layer architecture with clear contracts between layers
  • SCD Type 2 with effective_from / effective_to / is_current
  • Conformed dimensions shared across tenants
  • Row-access policies for tenant isolation
  • Foundation for 60+ multi-tenant dashboards
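
The SCD Type 2 mechanics at the center, reduced to a two-step sketch (names are illustrative):

-- Step 1: close the current row when a tracked attribute changes.
MERGE INTO dim_provider d
USING stg_provider s
  ON d.provider_key = s.provider_key AND d.is_current = TRUE
WHEN MATCHED AND d.specialty <> s.specialty THEN UPDATE SET
  effective_to = CURRENT_TIMESTAMP,
  is_current   = FALSE;

-- Step 2: open a new current row for changed or brand-new providers.
INSERT INTO dim_provider
  (provider_key, specialty, effective_from, effective_to, is_current)
SELECT s.provider_key, s.specialty, CURRENT_TIMESTAMP, NULL, TRUE
FROM stg_provider s
LEFT JOIN dim_provider d
  ON d.provider_key = s.provider_key AND d.is_current = TRUE
WHERE d.provider_key IS NULL;  -- no open current row after step 1 = needs one
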
60+
Dashboards on top
3
Layers
Type 2
Historical SCD
/04
Idempotent Incremental ETL Framework
Snowflake · SQL · Python
Watermark tracking, merge upserts, late-arriving data validation, control tables for recovery. Daily pipelines hardened for safe reruns.
The Challenge

Daily pipelines occasionally failed mid-run or produced duplicates after partial recovery. Engineers had to manually clean state before reruns, and late-arriving source records silently corrupted aggregates. Trust in the data was eroding.

The Approach

Designed an incremental framework around three principles: deterministic watermark tracking, merge-based upserts (never plain inserts), and explicit validation gates. Every run logs to a control table; every failure can be safely retried without manual cleanup.

Technical Details
  • Watermark column tracked per source-target pair
  • MERGE statements with deterministic match keys
  • Validation queries for duplicates and gaps
  • Late-arriving record detection & backfill logic
  • Control tables track state for surgical recovery
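
The rerun-safe core, as a sketch; names are hypothetical, but the three moves (read watermark, merge, advance watermark) are the framework:

-- 1. Read the last committed watermark for this source→target pair.
SET wm = (SELECT last_loaded_at FROM etl_control
          WHERE source_name = 'visits_raw' AND target_name = 'fct_visits');

-- 2. Merge, never plain-insert: the deterministic key makes a rerun
--    of the same window a no-op instead of a duplicate.
MERGE INTO fct_visits t
USING (SELECT * FROM visits_raw WHERE updated_at > $wm) s
  ON t.visit_id = s.visit_id
WHEN MATCHED THEN UPDATE SET status = s.status, updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (visit_id, status, updated_at)
  VALUES (s.visit_id, s.status, s.updated_at);

-- 3. Advance the watermark only after the merge commits.
UPDATE etl_control
   SET last_loaded_at = (SELECT MAX(updated_at) FROM visits_raw)
 WHERE source_name = 'visits_raw' AND target_name = 'fct_visits';
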
100%
Idempotent
Daily
Cadence
~min
Recovery time
/05
Cross-System Data Quality & Reconciliation
Snowflake · SQL
Automated mismatch detection across CRM, billing, and ops systems with daily reconciliation snapshots. Improved trust in financial reporting.
The Challenge

CRM said one count of active contracts. Billing said another. The operational system said a third. Finance was reconciling these by hand each month-end, and discrepancies often weren't surfaced until they had compounded across periods.

The Approach

Built a reconciliation layer that joins all three systems on canonical entity keys and applies deterministic mismatch rules. Every mismatch is logged with a category, severity, and recommended action. Daily snapshots track drift over time, with exception reports surfaced to operations.

Technical Details
  • Canonical entity resolution across three systems
  • Deterministic mismatch classification rules
  • Daily snapshot fact for drift monitoring
  • Severity scoring and routing logic
  • Structured exception views for operations
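
A minimal version of the mismatch rule, assuming the three systems are already resolved to one canonical customer key (all names illustrative):

-- Compare active-contract counts across CRM, billing, and ops.
SELECT COALESCE(c.customer_id, b.customer_id, o.customer_id) AS customer_id,
       c.active_contracts AS crm_count,
       b.active_contracts AS billing_count,
       o.active_contracts AS ops_count,
       CASE
         WHEN c.customer_id IS NULL OR b.customer_id IS NULL
                                    OR o.customer_id IS NULL
           THEN 'MISSING_ENTITY'      -- absent from at least one system
         WHEN c.active_contracts <> b.active_contracts
           THEN 'CRM_BILLING_DRIFT'   -- finance-facing: high severity
         WHEN b.active_contracts <> o.active_contracts
           THEN 'BILLING_OPS_DRIFT'
         ELSE 'MATCHED'
       END AS mismatch_category
FROM crm_counts c
FULL OUTER JOIN billing_counts b ON b.customer_id = c.customer_id
FULL OUTER JOIN ops_counts     o
  ON o.customer_id = COALESCE(c.customer_id, b.customer_id);
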
3
Systems
Daily
Snapshots
↑
Finance trust
/06
Snowflake Cost & Query Optimization
Snowflake · SQL
Workload segmentation (ETL / BI / ad hoc), right-sizing, auto-suspend policies, full-scan reduction. Long-term compute cost predictability.
The Challenge

Snowflake credit consumption was growing faster than usage justified. A single oversized warehouse was running every workload, ad-hoc queries were doing full table scans, and no one had visibility into which queries were the actual cost drivers.

The Approach

Mined query history and warehouse metering to attribute cost by workload. Split into purpose-sized warehouses (small for BI, medium for ETL, X-small for ad hoc), enforced auto-suspend, and rewrote the top expensive queries to use clustering and pruning. Built an internal cost dashboard for ongoing visibility.

Technical Details
  • Query history mining & cost attribution
  • Workload segmentation across multiple warehouses
  • Auto-suspend tuning per workload pattern
  • Clustering keys and pruning for top queries
  • Ongoing cost monitoring dashboard
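
The attribution starts from Snowflake's own metadata. A simplified version of the mining query (the production one also joins WAREHOUSE_METERING_HISTORY for actual credits):

-- Rank workloads by elapsed hours and scan volume over 30 days.
SELECT warehouse_name,
       query_type,
       COUNT(*)                              AS runs,
       SUM(total_elapsed_time) / 3600000     AS hours_elapsed,   -- ms → hours
       SUM(bytes_scanned) / POWER(1024, 4)   AS tb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP)
GROUP BY warehouse_name, query_type
ORDER BY hours_elapsed DESC;
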
↓
Credit burn
↑
Predictability
3
Workload tiers
/07
UrgentIQ Visit Status & Event Analytics
Python · SQL · Snowflake · API
Event-driven pipeline, status timeline normalization, transition-duration analysis, configurable schema mapping. Eliminated manual lifecycle analysis.
The Challenge

Visit lifecycle data lived in a noisy event stream from the UrgentIQ API. Some statuses were skipped, some were emitted out of order, and timestamps were UTC while reporting expected EST. Operations couldn't answer basic questions like "how long are visits stuck in checked-in status?"

The Approach

Built an incremental ingestion pipeline that normalizes each visit's event stream into an ordered status timeline, gap-fills missing transitions with deterministic rules, standardizes timestamps to a single timezone, and computes transition durations. Output is a clean lifecycle fact ready for funnel and bottleneck analysis.

Technical Details
  • API pagination & incremental ingestion
  • Status timeline gap-filling logic
  • UTC → EST timestamp normalization
  • Transition duration computation per status pair
  • Deduplication & configurable schema mapping
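
Once the timeline is deduplicated and ordered, transition durations fall out of a window function; a sketch with hypothetical names:

-- Minutes spent in each status, per visit.
WITH timeline AS (
  SELECT DISTINCT visit_id, status, status_at   -- drop exact-duplicate events
  FROM visit_status_events
)
SELECT visit_id,
       status,
       status_at,
       DATEDIFF(minute, status_at,
                LEAD(status_at) OVER (PARTITION BY visit_id
                                      ORDER BY status_at)) AS minutes_in_status
FROM timeline;
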
0
Manual analysis
↑
Visibility
Reusable
Framework
/08
Provider Count & Billing Reconciliation
SQL · Excel · QuickBooks
Aligned operational provider counts with CRM contracts and billing records via FTE-threshold logic. Reduced underbilling risk.
The Challenge

Customers were billed based on provider counts, but "provider" meant different things in different systems. Some part-time providers were counted as full FTEs, some full-time providers were missed entirely, and contract terms specified thresholds nobody was actively checking.

The Approach

Designed a reconciliation workflow that joins operational provider activity with CRM contract terms and QuickBooks billing records. Applied FTE thresholds (active hours, encounter volume) to classify each provider into a billing category. Standardized monthly reconciliation checkpoints.

Technical Details
  • Three-system join: ops, CRM, billing
  • FTE threshold logic with multiple inputs
  • Provider classification into billing tiers
  • Monthly reconciliation checkpoint cadence
  • Exception reports for review by finance
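
The classification step, reduced to a sketch; thresholds and names are hypothetical stand-ins for the contract terms:

-- Assign each provider a billing tier from two activity signals.
SELECT p.provider_id,
       p.customer_id,
       CASE
         WHEN p.active_hours_month >= t.full_fte_hours
           OR p.encounters_month   >= t.full_fte_encounters
           THEN 'FULL_FTE'
         WHEN p.active_hours_month >= t.partial_fte_hours
           THEN 'PARTIAL_FTE'
         ELSE 'BELOW_THRESHOLD'   -- active, but not billable under contract
       END AS billing_tier
FROM provider_activity p
JOIN contract_thresholds t ON t.customer_id = p.customer_id;
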
↓
Underbill risk
Monthly
Cadence
↑
Accuracy
/09
SMS Optimization & Number Rationalization
SQL · Excel
Standardized templates at ~150-char threshold, disabled 300+ inactive numbers. Lower monthly SMS spend, improved consistency.
The Challenge

SMS costs were inflated by two avoidable issues: messages tipping over the 160-character single-segment threshold (doubling per-message cost) and a fleet of 300+ phone numbers that were billed monthly but no longer in use. Nobody had audited either in years.

The Approach

Audited message content across templates, identified those tipping over the segment boundary, and rewrote them under a 150-character target with a margin for personalization variables. Cross-referenced active phone numbers against actual outbound traffic and decommissioned the inactive ones.

Technical Details
  • Segment count analysis across all templates
  • ~150-character optimization threshold
  • Number-to-traffic cross reference
  • 300+ inactive numbers identified & disabled
  • Standardized template review process
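
The segment math is simple enough to audit in one query, assuming GSM-7 encoding (UCS-2 shrinks the limits to 70/67); names are illustrative:

-- One segment fits 160 chars; concatenated messages split at 153
-- because 7 chars per segment go to concatenation headers.
SELECT template_id,
       LENGTH(body)                         AS chars,
       CASE WHEN LENGTH(body) <= 160 THEN 1
            ELSE CEIL(LENGTH(body) / 153.0)
       END                                  AS segments
FROM sms_templates
ORDER BY segments DESC, chars DESC;
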
300+
Numbers cut
↓
Monthly cost
~150
Char target
/10
EMR API Usage & Cost Analysis
SQL · API Logs
Six-month analysis surfacing high-volume endpoints and redundant call patterns. Supported engineering API cost governance.
The Challenge

EMR vendor API costs were rising without a clear understanding of what was driving usage. Were certain endpoints inefficient? Were features double-fetching data? Engineering needed evidence before making architectural changes.

The Approach

Mined six months of API logs to surface usage patterns: top endpoints by volume, redundant call chains, customer-level cost attribution, and timing correlations with downstream features. Output was a structured report engineering used to prioritize caching and batching work.

Technical Details
  • Six months of log analysis
  • Endpoint volume and cost ranking
  • Redundant call chain detection
  • Customer-level cost attribution
  • Optimization recommendations to engineering
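
The redundancy detection, sketched with hypothetical log columns: a repeat call to the same endpoint for the same entity inside a short window is a caching candidate.

-- Rank endpoints by volume and flag likely re-fetches.
SELECT endpoint,
       COUNT(*)                            AS calls,
       COUNT_IF(seconds_since_prev < 60)   AS repeats_under_60s
FROM (
  SELECT endpoint,
         entity_id,
         DATEDIFF(second,
                  LAG(called_at) OVER (PARTITION BY endpoint, entity_id
                                       ORDER BY called_at),
                  called_at)               AS seconds_since_prev
  FROM emr_api_log
)
GROUP BY endpoint
ORDER BY calls DESC;
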
6mo
Log window
↓
Wasted calls
↑
Governance
/11
dbt-Based Analytics Engineering Framework
dbt · Snowflake · Jinja · YAML
Migrated legacy SQL views to dbt with tests, docs, and CI/CD. Replaced tribal knowledge with verifiable contracts.
The Challenge

Hundreds of business-critical views had grown over years with no tests, no documentation, and no consistent style. Logic was duplicated across views, dependencies were implicit, and a change in one place could quietly break a dashboard halfway across the warehouse.

The Approach

Migrated transformations to dbt with a layered staging → intermediate → mart structure. Added schema tests for uniqueness, referential integrity, and accepted values. Generated docs auto-publish on every merge. CI runs the full test suite before any change reaches production.

Technical Details
  • Layered model architecture with explicit DAG
  • Schema tests, custom tests, and freshness checks
  • Macros for reused logic (tenant filters, SCD merges)
  • Auto-generated docs with column-level lineage
  • GitHub Actions CI for slim builds on PRs
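
What a staging model looks like in this structure, as a minimal illustrative sketch (model and source names are hypothetical):

-- models/staging/stg_visits.sql: rename and type, no business logic here.
{{ config(materialized='view') }}

SELECT visit_id,
       tenant_id,
       visit_status   AS status,
       created_at_utc AS created_at
FROM {{ source('ehr', 'raw_visits') }}

Uniqueness and not-null tests live next to it in YAML, so the contract travels with the model instead of in someone's head.
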
200+
Models migrated
100%
Tested coverage
Auto
Docs & lineage
/12
Near-Real-Time Event Ingestion (Kafka → Snowflake)
Kafka · Snowpipe · Python · Streams
Sub-minute event ingestion with quality monitoring and out-of-order handling. Powered live operational dashboards with no batch lag.
The Challenge

Operational teams needed visibility into events as they happened — not 24 hours later. The existing batch pipeline ran nightly, which meant by the time an issue showed up on the dashboard, the day was over. Switching to streaming meant rethinking watermarks, ordering, and failure recovery.

The Approach

Built a Kafka-to-Snowflake pipeline using Snowpipe Streaming for sub-minute landing, then Snowflake Streams + Tasks for incremental transformation. Out-of-order events handled via watermark tolerance windows. Quality metrics (lag, dropped events, schema drift) emitted to a monitoring dashboard.

Technical Details
  • Snowpipe Streaming with row-level latency under 60s
  • Streams + Tasks for incremental aggregation
  • Watermark tolerance for out-of-order events
  • Schema evolution handling without pipeline restart
  • Lag, drop, and drift metrics surfaced to ops
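
The Streams + Tasks half of the pipeline, reduced to a sketch; names are illustrative, and the watermark-tolerance logic is simplified down to a deterministic merge key:

-- A stream captures rows landed by Snowpipe Streaming; a task drains it.
CREATE OR REPLACE STREAM events_stream ON TABLE raw_events;

CREATE OR REPLACE TASK aggregate_events
  WAREHOUSE = etl_xs
  SCHEDULE  = '1 minute'
  WHEN SYSTEM$STREAM_HAS_DATA('EVENTS_STREAM')   -- skip empty runs
AS
MERGE INTO fct_events t
USING (
  SELECT event_id, event_type, event_at
  FROM events_stream
) s
ON t.event_id = s.event_id   -- merge key absorbs replays and out-of-order rows
WHEN NOT MATCHED THEN INSERT (event_id, event_type, event_at)
  VALUES (s.event_id, s.event_type, s.event_at);

ALTER TASK aggregate_events RESUME;
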
<1m
End-to-end lag
24/7
Quality monitor
0
Manual recovery
/13
RAG Data Quality Framework for Clinical Knowledge Retrieval
Python · Vector DB · Snowflake · Embeddings
Source-data validation layer that catches stale, ambiguous, or contradictory content before it reaches an LLM. Reduced hallucination risk in retrieval-augmented systems.
The Challenge

RAG systems are only as trustworthy as the documents they retrieve from. Stale clinical guidelines, contradictory sources, and ambiguous procedure documentation all silently corrupt LLM answers downstream — with no easy way to detect the failure after the fact.

The Approach

Designed a quality framework that runs on the source corpus before embedding: freshness checks (last-updated thresholds), deduplication (semantic similarity clustering), contradiction detection (cross-document NLI scoring), and provenance tagging. Documents that fail are quarantined; the rest enter the vector index with quality metadata attached.

Technical Details
  • Document freshness scoring with configurable thresholds
  • Semantic dedup via embedding similarity clusters
  • Cross-document contradiction detection
  • Provenance metadata propagated to retrieval layer
  • Quarantine queue for human review
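
The freshness gate is the simplest of the four checks; a sketch with hypothetical names:

-- Quarantine documents past their staleness threshold, with a reason.
INSERT INTO doc_quarantine (doc_id, reason, flagged_at)
SELECT d.doc_id,
       'STALE: ' || DATEDIFF(day, d.last_updated, CURRENT_DATE)
         || ' days old, threshold ' || t.max_age_days,
       CURRENT_TIMESTAMP
FROM source_documents d
JOIN freshness_thresholds t ON t.doc_type = d.doc_type
WHERE DATEDIFF(day, d.last_updated, CURRENT_DATE) > t.max_age_days;
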
↓
Hallucination risk
↑
Source trust
Auto
Quality gates
// 04 · principles · 5 rules

How I think.

Opinions earned shipping data systems. Each one has a scar behind it — a 3 a.m. page, a quarter-end mismatch, a dashboard that lied for two months before anyone noticed. I write them down so I don't forget.

01 · grain
// design

If you can't state the grain in one sentence, the model isn't ready.

Most analytics bugs aren't typos. They're cardinality surprises hiding in a vague join. Pin the grain first, write SQL second. Twenty minutes of sketching saves four hours of debugging.
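
The cheapest grain test I know, against a hypothetical daily visit fact:

-- Stated grain: one row per visit per day. This query must return zero rows.
SELECT visit_id, visit_date, COUNT(*) AS rows_at_grain
FROM fct_visit_daily
GROUP BY visit_id, visit_date
HAVING COUNT(*) > 1;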

02 · idempotency
// pipelines

A pipeline that needs human cleanup to retry is not a pipeline.

It's a script with a person attached. Watermarks, merge upserts, control tables — make rerunning safe by construction. The first time a 3 a.m. failure becomes a 3:02 a.m. retry, you'll know it's working.

03 · isolation
// multi-tenant

There's no "mostly correct" in tenant isolation.

The row policy either holds for every query or it doesn't. Test isolation like a security control — write queries that try to violate it and confirm they can't. The worst-case bug is silent.
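
In Snowflake terms, that means pairing a row access policy with a test that tries to break it (names hypothetical):

-- Policy: a session's role sees only the tenant rows mapped to it.
CREATE OR REPLACE ROW ACCESS POLICY tenant_isolation AS (tenant_id VARCHAR)
RETURNS BOOLEAN ->
  EXISTS (SELECT 1 FROM tenant_role_map m
          WHERE m.tenant_id = tenant_id
            AND m.role_name = CURRENT_ROLE());

ALTER TABLE fct_visits ADD ROW ACCESS POLICY tenant_isolation ON (tenant_id);

-- The adversarial test: as a single-tenant role, this must return zero.
SELECT COUNT(*) AS leaked_rows
FROM fct_visits
WHERE tenant_id <> 'EXPECTED_TENANT';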

04 · reconciliation
// governance

Reconciliation isn't a project. It's a control plane you operate.

Run it daily, review weekly, audit monthly. Drift compounds; cadence catches it early. By month-end, three systems disagreeing has cost real money. Make every mismatch visible within 24 hours.

05 · documentation
// craft

The next engineer is a stranger.

Write SQL, comments, and docs like the person debugging this at 3 a.m. has zero context. Eventually they will. That stranger might be future you. The empathy is the same.

// 05 · writing · 8 essays

Field notes.

Technical writing on data modeling, architecture decisions, automation patterns, and governance. Most of these are post-mortems on something that broke first.

// 06 · contact · door's open

Still here?
Let's build something durable.