Jun 2024–Feb 2025
Discovery → MVP (pilot)
GenAI use case acceleration platform

Problem context
GenAI advanced fast, but the organization lacked a governed way to validate use cases
By 2024, leadership wanted to apply GenAI to productivity and process automation, but experimentation was spread across disconnected tools and informal processes. Business teams generated many ideas, yet validation depended on scarce specialist bandwidth, creating queues and slowing iteration. Citizen Developers could prototype quickly, but outputs were hard to standardize, reproduce, or compare. As a result, Data Scientists received uneven inputs and limited evidence, which slowed decisions on what should advance to production evaluation.
Key roles in the process
Business Users (BU) create requests or run guided templates to validate value with minimal setup.
Citizen Developers (CD) build prototypes, compare models, save evidence versions, and submit candidate packages for review.
Data Scientists (DS) review candidates with supporting evidence and lead production evaluation.
The organization needed a governed workflow that raised learning velocity without raising compliance risk, worked across skill levels, and produced decision-ready artifacts downstream teams could trust.

Success criteria
Raise learning velocity without compromising governance
Success meant enabling more teams to validate GenAI use cases quickly while keeping work compliant, auditable, and reproducible enough for downstream Data Science review.
Success signals
Adoption: expanding participation beyond specialist teams with minimal onboarding.
Speed: reducing idea-to-prototype time versus ticket-driven workflows.
Evidence quality: producing reviewable, reproducible candidate versions suitable for DS intake.
Reuse: increasing reuse via templates and shared patterns to reduce duplicate work.
Governance: maintaining access controls and auditability throughout the pilot.
How we measured it
Key decisions
One platform, role-based surfaces, shared lifecycle
Role-based modes built on shared primitives
I split the experience by user capability rather than forcing one interface to serve everyone. In the Library, Business Users create requests, often from a template, using safe defaults and plain-language guidance. In the Workspace, Citizen Developers use structured tools to iterate, compare models, and capture evidence as saved versions. To avoid shipping two products, I kept both surfaces on one lifecycle: Requests and templates → Prototypes → Saved versions (candidate selected) → Review. Progressive disclosure let users move between surfaces without relearning the system.
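A minimal sketch of that shared lifecycle as a small state machine, to show how two surfaces can sit on one set of stages. The stage names mirror the lifecycle above; the `ALLOWED` table, the `advance` helper, and the two backward moves (continuing to iterate after saving, and review returning work with feedback) are illustrative assumptions, not the platform's actual implementation.

```python
from enum import Enum


class Stage(Enum):
    REQUEST = "request"            # created by a Business User, often from a template
    PROTOTYPE = "prototype"        # iterated by a Citizen Developer in the Workspace
    SAVED_VERSION = "saved_version"
    REVIEW = "review"              # Data Science intake


# Hypothetical transition table. The forward path follows the lifecycle in the text;
# the backward moves are assumptions about how iteration and feedback loop back.
ALLOWED = {
    Stage.REQUEST: {Stage.PROTOTYPE},
    Stage.PROTOTYPE: {Stage.SAVED_VERSION},
    Stage.SAVED_VERSION: {Stage.PROTOTYPE, Stage.REVIEW},
    Stage.REVIEW: {Stage.PROTOTYPE},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.REQUEST
for step in (Stage.PROTOTYPE, Stage.SAVED_VERSION, Stage.REVIEW):
    stage = advance(stage, step)
```

Because both surfaces would read and write the same stages, a request that starts in the Library can be picked up in the Workspace without any translation step.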

Templates as the unit of reuse
I treated templates as the core reusable asset, capturing intent, constraints, variables, examples, and usage guidance in a standard format. Business Users met templates through plain-language descriptions and filed requests when new capability was needed. Citizen Developers authored and iterated templates in a lightweight structured editor. To cut authoring overhead, we provided strong defaults, a fast request-to-template intake flow, and a publish-and-vet step that held quality and governance without slowing iteration.
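To make the "standard format" concrete, here is one way a template record could look. This is a sketch under assumptions: the `PromptTemplate` class, its field names, and the `$variable` placeholder syntax are hypothetical and chosen for illustration, not taken from the platform.

```python
from dataclasses import dataclass
from string import Template as StrTemplate


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template record; field names are illustrative only."""
    name: str
    intent: str                          # plain-language description shown to Business Users
    body: str                            # prompt text with $variable placeholders
    variables: tuple[str, ...]           # names the requester must fill in
    constraints: tuple[str, ...] = ()    # e.g. rules about what inputs are allowed
    examples: tuple[str, ...] = ()
    guidance: str = ""

    def render(self, **values: str) -> str:
        """Fill the placeholders, failing loudly if a declared variable is missing."""
        missing = [v for v in self.variables if v not in values]
        if missing:
            raise ValueError(f"missing variables: {missing}")
        return StrTemplate(self.body).substitute(values)


summary = PromptTemplate(
    name="meeting-summary",
    intent="Summarize meeting notes into action items.",
    body="Summarize the notes below for $audience in $max_bullets bullets:\n$notes",
    variables=("audience", "max_bullets", "notes"),
    constraints=("inputs must be sanitized before use",),
)

prompt = summary.render(audience="managers", max_bullets="3", notes="A, B")
```

Declaring the variables up front is what lets a Business User fill in a form while a Citizen Developer edits the same template as structured text.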

Model comparison with cost and latency signals
I designed model comparison so Citizen Developers could run the same sanitized input and prompt version across multiple models and compare outputs side by side, with latency, token usage, and estimated cost per run. This supported real decisions about quality, speed, and spend while reducing vendor lock-in. To keep the experience manageable, we shipped a curated model catalog, default presets, and role-based access for higher-cost models.
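The comparison described above can be sketched as a loop that sends one prompt to every catalog entry and records latency, token usage, and an estimated cost. Everything named here is an assumption for illustration: `ModelEntry`, `RunResult`, and `compare` are hypothetical, the two models are local stubs rather than real provider calls, and the per-1k-token prices are placeholders, not actual vendor rates.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog entry; prices are placeholders."""
    name: str
    call: Callable[[str], tuple[str, int, int]]  # returns (output, input_tokens, output_tokens)
    usd_per_1k_input: float
    usd_per_1k_output: float


@dataclass(frozen=True)
class RunResult:
    model: str
    output: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float


def estimate_cost(entry: ModelEntry, tokens_in: int, tokens_out: int) -> float:
    """Token counts times the per-1k price, summed over input and output."""
    return (tokens_in / 1000) * entry.usd_per_1k_input + (tokens_out / 1000) * entry.usd_per_1k_output


def compare(prompt: str, catalog: list[ModelEntry]) -> list[RunResult]:
    """Run the same prompt against every model and collect comparable metrics."""
    results = []
    for entry in catalog:
        start = time.perf_counter()
        output, tokens_in, tokens_out = entry.call(prompt)
        latency = time.perf_counter() - start
        cost = estimate_cost(entry, tokens_in, tokens_out)
        results.append(RunResult(entry.name, output, latency, tokens_in, tokens_out, cost))
    return results


# Stub models standing in for real provider calls (assumption: word count as token count).
def fake_small(prompt: str) -> tuple[str, int, int]:
    return prompt.upper(), len(prompt.split()), 5


def fake_large(prompt: str) -> tuple[str, int, int]:
    return prompt.title(), len(prompt.split()), 12


catalog = [
    ModelEntry("small", fake_small, usd_per_1k_input=0.5, usd_per_1k_output=1.5),
    ModelEntry("large", fake_large, usd_per_1k_input=5.0, usd_per_1k_output=15.0),
]
results = compare("summarize this short note", catalog)
```

Keeping the prompt and input fixed across models is the point of the design: any difference in output, latency, or cost can then be attributed to the model rather than to the run.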

Turn prototype results into decision-ready artifacts
I designed each prototype run to be captured as structured evidence: sanitized inputs, the originating request and template baseline, the saved prompt version, model and preset, outputs, timestamps, and policy checks, with optional notes for context. This replaced screenshots and ad hoc docs with a consistent package downstream teams could reproduce and review. When a run looked strong, Citizen Developers marked a saved version as the candidate and used Submit for review to package it for DS intake, with exports gated by policy.
Repeatable runs
Saved versions store prompt settings and sanitized inputs, so any run can be reproduced.
Fast to review
Candidate versions package outputs and key metrics for fast DS review.
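As a rough sketch of what such an evidence record might contain, the example below stores the fields listed above and gates submission on the policy checks. The `SavedVersion` class, the `fingerprint` method, and the `submit_for_review` function are hypothetical names, and hashing the replay fields is an assumed way to make "same settings, same inputs" checkable, not a description of how the platform does it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SavedVersion:
    """Hypothetical evidence record for one prototype run."""
    request_id: str
    template_id: str
    prompt_version: str
    model: str
    preset: str
    sanitized_input: str
    output: str
    policy_checks: dict          # check name -> True if passed
    notes: str = ""
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def fingerprint(self) -> str:
        """Hash only the fields needed to rerun the prompt, not the output or timestamp."""
        replay_keys = ("template_id", "prompt_version", "model", "preset", "sanitized_input")
        replay = {key: getattr(self, key) for key in replay_keys}
        payload = json.dumps(replay, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def submit_for_review(version: SavedVersion) -> dict:
    """Package a candidate for review, refusing if any policy check failed."""
    failed = [name for name, passed in version.policy_checks.items() if not passed]
    if failed:
        raise PermissionError(f"blocked by policy checks: {failed}")
    package = asdict(version)
    package["fingerprint"] = version.fingerprint()
    package["status"] = "submitted"
    return package


candidate = SavedVersion(
    request_id="REQ-1", template_id="meeting-summary", prompt_version="v3",
    model="small", preset="default", sanitized_input="notes with names removed",
    output="three action items", policy_checks={"pii_scan": True},
)
package = submit_for_review(candidate)
```

Two runs with the same fingerprint used the same settings and inputs, so a reviewer can tell at a glance whether differing outputs come from the model or from a changed prompt.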
Outcomes
Faster validation, reusable assets, and a review-ready funnel, with no incidents during the pilot
The MVP pilot (Jan–Feb 2025) showed that a governed workflow can scale use-case validation without slowing teams down.
Key outcomes
Onboarded invited users across multiple business units, expanding experimentation beyond specialist teams.
Shifted early validation from a multi-day specialist queue to same-session iteration, with first useful output in minutes rather than days.
Submitted Data Science–ready candidates with side-by-side comparisons and reproducible versions; the review gate accepted the strongest and returned others with specific feedback.
Published a library of reusable templates, and most sessions started template-first rather than from a blank prompt, reducing duplicate work.
No compliance incidents during the pilot, which supported stakeholder confidence and approval to continue.
Retrospective
A strong foundation for governed learning, with clear next steps for onboarding, evaluation, and data access
The biggest win was treating this as a platform and a workflow, not just a UI. Role-based surfaces and templates turned ad hoc experimentation into reusable assets, and saved versions made prototype results reproducible and reviewable downstream. The main gaps were onboarding and evaluation clarity: some users struggled to write a strong first request, and many wanted clearer signals to judge output reliability and failure modes. The pilot also surfaced a predictable constraint: data-heavy use cases need a defined escalation path from the sandbox to curated, policy-approved datasets, with explicit gates and ownership, so teams can test responsibly without slowing adoption.

