Jun 2024–Feb 2025

Discovery → MVP (pilot)

GenAI use case acceleration platform

I led the design of a governed GenAI platform that let teams validate use cases quickly without adding compliance risk. Business Users had strong ideas but no sanctioned way to test them. Citizen Developers could prototype, but results were hard to reproduce or compare. Data Scientists received inconsistent inputs and thin evidence for review. The MVP standardized the workflow through two role-based modes, reusable templates, enterprise guardrails (policy guidance, role-based access, audit logging, safety checks), and side-by-side model comparison. The pilot (Jan–Feb 2025) produced reviewable evidence and advanced several use cases into deeper Data Science evaluation, with no compliance incidents during the pilot.

Role and scope

Senior Product Designer, embedded in the client's product team (via EPAM)

End-to-end ownership

Team

Product Manager

3 Full-Stack Engineers

2 AI/ML Engineers

QA Engineer

Impact

Onboarded invited users across multiple business units to run hands-on experiments

Use cases submitted with side-by-side comparison evidence for DS review

Reusable templates reduced duplicate experimentation

No compliance incidents during the pilot

Clear ownership for GenAI use cases

Problem context

GenAI advanced fast, but the organization lacked a governed way to validate use cases

By 2024, leadership wanted to apply GenAI to productivity and process automation, but experimentation was spread across disconnected tools and informal processes. Business teams generated many ideas, yet validation depended on scarce specialist bandwidth, creating queues and slowing iteration. Citizen Developers could prototype quickly, but outputs were hard to standardize, reproduce, or compare. As a result, Data Scientists received uneven inputs and limited evidence, which slowed decisions on what should advance to production evaluation.

Key roles in the process

Business Users (BU) create requests or run guided templates to validate value with minimal setup.

Citizen Developers (CD) build prototypes, compare models, save evidence versions, and submit candidate packages for review.

Data Scientists (DS) review candidates with supporting evidence and lead production evaluation.

The organization needed a governed workflow that raised learning velocity without raising compliance risk, worked across skill levels, and produced decision-ready artifacts downstream teams could trust.

Faster validation

Validate ideas faster by replacing ticket-driven waits with hours-to-days iteration.

Reduced duplication

Reuse templates and shared experiments instead of starting from scratch.

Governed testing

Keep testing inside guardrails and audit logs to meet compliance requirements.

Success criteria

Raise learning velocity without compromising governance

Success meant enabling more teams to validate GenAI use cases quickly while keeping work compliant, auditable, and reproducible enough for downstream Data Science review.

Success signals

Adoption: expanding participation beyond specialist teams with minimal onboarding.

Speed: reducing idea-to-prototype time versus ticket-driven workflows.

Evidence quality: producing reviewable, reproducible candidate versions suitable for DS intake.

Reuse: increasing reuse via templates and shared patterns to reduce duplicate work.

Governance: maintaining access controls and auditability throughout the pilot.

How we measured it

Activation

Number of invited users activated within 24 hours of access.

Speed

Median time from login to a useful, shareable output.

Reuse

Share of template-first sessions vs blank starts.

Evidence quality

Share of submitted candidates accepted for deeper evaluation.
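
As a rough illustration of how the speed, reuse, and evidence-quality measures above could be computed, here is a minimal sketch. The session log, field names, and candidate counts are all hypothetical placeholders, not pilot data.

```python
from statistics import median

# Hypothetical session log; these are illustrative values, not pilot results.
sessions = [
    {"minutes_to_first_output": 6, "started_from_template": True},
    {"minutes_to_first_output": 14, "started_from_template": True},
    {"minutes_to_first_output": 9, "started_from_template": False},
    {"minutes_to_first_output": 21, "started_from_template": True},
]
submitted, accepted = 5, 3  # hypothetical candidate counts


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


# Speed: median time from login to a useful, shareable output.
median_minutes = median(s["minutes_to_first_output"] for s in sessions)

# Reuse: share of sessions that started from a template rather than a blank prompt.
template_first_share = share(
    sum(s["started_from_template"] for s in sessions), len(sessions)
)

# Evidence quality: share of submitted candidates accepted for deeper evaluation.
acceptance_share = share(accepted, submitted)
```

Using the median rather than the mean keeps one unusually slow session from distorting the speed signal.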

Key decisions

One platform, role-based surfaces, shared lifecycle

Role-based modes built on shared primitives

I split the experience by user capability rather than forcing one interface to serve everyone. In the Library, Business Users create requests, often from a template, using safe defaults and plain-language guidance. In the Workspace, Citizen Developers use structured tools to iterate, compare models, and capture evidence as saved versions. To avoid shipping two products, I kept both surfaces on one lifecycle: Requests and templates → Prototypes → Saved versions (candidate selected) → Review. Progressive disclosure let users move between surfaces without relearning the system.

Templates as the unit of reuse

I treated templates as the core reusable asset, capturing intent, constraints, variables, examples, and usage guidance in a standard format. Business Users met templates through plain-language descriptions and filed requests when new capability was needed. Citizen Developers authored and iterated templates in a lightweight structured editor. To cut authoring overhead, we provided strong defaults, a fast request-to-template intake flow, and a publish-and-vet step that held quality and governance without slowing iteration.
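
To make the standard format concrete, here is a minimal sketch of what a template record might look like. The class name, field names, and the `$placeholder` syntax are assumptions for illustration; the platform's actual schema is not described in this case study.

```python
from dataclasses import dataclass, field
from string import Template as StringTemplate


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template record; fields mirror the list in the text."""
    name: str
    intent: str                  # what the template is for, in plain language
    constraints: list[str]       # rules the output must respect
    variables: list[str]         # placeholders the user fills in
    body: str                    # prompt text using $placeholders (assumed syntax)
    examples: list[str] = field(default_factory=list)
    usage_guidance: str = ""

    def render(self, **values: str) -> str:
        """Fill the placeholders, failing loudly if a declared variable is missing."""
        missing = [v for v in self.variables if v not in values]
        if missing:
            raise ValueError(f"missing variables: {missing}")
        return StringTemplate(self.body).substitute(values)


summary = PromptTemplate(
    name="meeting-summary",
    intent="Summarize meeting notes for a stakeholder update",
    constraints=["No personal data in the output", "At most five bullet points"],
    variables=["audience", "notes"],
    body="Summarize the notes below for $audience.\n\n$notes",
)
prompt = summary.render(audience="finance leads", notes="Q1 review")
```

Declaring variables explicitly is what lets a Business User fill in a short form while a Citizen Developer edits the underlying prompt text.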

Model comparison with cost and latency signals

I designed model comparison so Citizen Developers could run the same sanitized input and prompt version across multiple models and compare outputs side by side, with latency, token usage, and estimated cost per run. This supported real decisions about quality, speed, and spend while reducing vendor lock-in. To keep the experience manageable, we shipped a curated model catalog, default presets, and role-based access for higher-cost models.
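
The comparison loop can be sketched roughly as follows. The model names, prices, and whitespace token counter are hypothetical stand-ins, and the stub callables replace real vendor calls so the sketch runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical price per 1,000 tokens; real catalog pricing is not given in the text.
PRICE_PER_1K_TOKENS = {"model-a": 0.50, "model-b": 2.00}


@dataclass
class RunResult:
    model: str
    output: str
    latency_s: float
    tokens: int
    estimated_cost: float


def count_tokens(text: str) -> int:
    """Crude whitespace token count, standing in for a real tokenizer."""
    return len(text.split())


def compare(prompt: str, models: dict[str, Callable[[str], str]]) -> list[RunResult]:
    """Run one prompt against every model and record the comparison signals."""
    results = []
    for name, call_model in models.items():
        start = time.perf_counter()
        output = call_model(prompt)
        latency = time.perf_counter() - start
        tokens = count_tokens(prompt) + count_tokens(output)
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS[name]
        results.append(RunResult(name, output, latency, tokens, cost))
    return results


# Stub models so the sketch runs without any vendor SDK.
stub_models = {
    "model-a": lambda p: "short answer",
    "model-b": lambda p: "a longer and more detailed answer",
}
results = compare("Summarize the attached policy", stub_models)
```

Because every model receives the same prompt string, differences in output, latency, and cost can be attributed to the model and preset rather than to the input.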

Turn prototype results into decision-ready artifacts

I designed each prototype run to be captured as structured evidence: sanitized inputs, the originating request and template baseline, the saved prompt version, model and preset, outputs, timestamps, and policy checks, with optional notes for context. This replaced screenshots and ad hoc docs with a consistent package that downstream teams could reproduce and review. When a run looked strong, Citizen Developers marked a saved version as the candidate and used Submit for review to package it for DS intake, with exports gated by policy.
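
A saved version along these lines could be modeled as the record below. This is a sketch under assumptions: the class name, field names, identifier values, and the JSON export format are illustrative, and the real gating rules are not specified in this case study.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SavedVersion:
    """Hypothetical evidence record; fields follow the list in the text."""
    request_id: str
    template_id: str
    prompt_version: str
    model: str
    preset: str
    sanitized_input: str
    output: str
    timestamp: str                   # ISO 8601 string
    policy_checks: dict[str, bool]   # check name -> passed
    notes: str = ""
    is_candidate: bool = False


def submit_for_review(version: SavedVersion) -> str:
    """Package a candidate as JSON for DS intake; refuse if the gates are not met."""
    if not version.is_candidate:
        raise ValueError("only a version marked as candidate can be submitted")
    failed = [name for name, ok in version.policy_checks.items() if not ok]
    if failed:
        raise PermissionError(f"export blocked by policy checks: {failed}")
    return json.dumps(asdict(version), indent=2, sort_keys=True)


version = SavedVersion(
    request_id="REQ-001",            # hypothetical identifiers
    template_id="meeting-summary",
    prompt_version="v3",
    model="model-a",
    preset="balanced",
    sanitized_input="Summarize the attached policy",
    output="short answer",
    timestamp="2025-01-20T10:15:00Z",
    policy_checks={"pii_scan": True, "safety_filter": True},
)
version.is_candidate = True          # Citizen Developer marks the strong run
package = submit_for_review(version)
```

Keeping the policy checks inside the record means the export gate can be evaluated from the evidence itself, without a separate lookup.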

Repeatable runs

Saved versions store prompt settings and sanitized inputs, so any run can be reproduced.

Fast to review

Candidate versions package outputs and key metrics for fast DS review.

Clean handoff

Save candidate, then submit for review to create a DS-ready package with no manual prep.

Outcomes

Faster validation, reusable assets, and a review-ready funnel with no incidents during the pilot

The MVP pilot (Jan–Feb 2025) validated that a governed workflow can scale use-case validation without slowing teams down.

Key outcomes

Onboarded invited users across multiple business units, expanding experimentation beyond specialist teams.

Shifted early validation from a multi-day specialist queue to same-session iteration, with first useful output in minutes rather than days.

Submitted Data Science–ready candidates with side-by-side comparisons and reproducible versions; the review gate accepted the strongest and returned others with specific feedback.

Published a library of reusable templates, and most sessions started template-first rather than from a blank prompt, reducing duplicate work.

No compliance incidents during the pilot, which supported stakeholder confidence and approval to continue.

Retrospective

A strong foundation for governed learning, with clear next steps for onboarding, evaluation, and data access

The biggest win was treating this as a platform and a workflow, not just a UI. Role-based surfaces and templates turned ad hoc experimentation into reusable assets, and saved versions made prototype results reproducible and reviewable downstream. The main gaps were onboarding and evaluation clarity: some users struggled to write a strong first request, and many wanted clearer signals to judge output reliability and failure modes. The pilot also surfaced a predictable constraint: data-heavy use cases need a defined escalation path from the sandbox to curated, policy-approved datasets, with explicit gates and ownership, so teams can test responsibly without slowing adoption.

Onboarding

Reduce first-run friction with recommended starters, safe examples, and scaffolds.

Evaluation

Add a lightweight rubric and clearer uncertainty cues so teams can judge quality quickly.

Data access

Define a gated path to curated, policy-approved datasets for data-heavy use cases.
