Agents  ↔️  Interfaces

Choreographing a bespoke network of models to support Labs' interface

brief

My Contribution
Translated a prototyped no-code, node-based architecture into a custom internal framework that supported a novel interface for interacting with generative AI.

Outcome
From the first successful generations to the current framework, the flow went from a rather unreliable 4-6 minutes down to about 5 seconds with few hiccups.

The challenges moved from successfully generating outputs at all, to speeding up the flow, to improving quality, to making the whole thing increasingly interrogable.

Role
Developer

Duration
4 months

Challenge

Our software's input requirements were strict and unique, so the first obstacle was getting models to produce working data our stack could digest (~2,000-line JSON objects).

From there, the work became improving performance and layering on features to support UX improvements.
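A minimal sketch of the guardrail this implied, assuming a hypothetical callModel helper and a heavily simplified schema (the production objects ran ~2,000 lines); zod is used here as one option for validation:

```typescript
import { z } from "zod";

// Simplified stand-in for the real schema; the production objects ran ~2,000 lines.
const SlideDeck = z.object({
  title: z.string(),
  slides: z.array(
    z.object({
      heading: z.string(),
      blocks: z.array(z.object({ type: z.string(), content: z.string() })),
    }),
  ),
});

type SlideDeck = z.infer<typeof SlideDeck>;

// Hypothetical helper standing in for whichever model SDK is in play.
declare function callModel(prompt: string): Promise<string>;

// Ask for JSON, then refuse anything the stack can't actually digest.
async function generateDeck(prompt: string): Promise<SlideDeck> {
  const raw = await callModel(prompt);
  const parsed = SlideDeck.safeParse(JSON.parse(raw));
  if (!parsed.success) {
    throw new Error(`Model output failed validation: ${parsed.error.message}`);
  }
  return parsed.data;
}
```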

Goal

Choreograph agents to enable a newly responsive AI surface with multiple entry points, where users can prompt from a multitude of places within a cohesive interface for learning.

Tangled with:

Delegation

Routing operations of varying complexity to models of different weights to balance speed, quality, and cost.
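In practice, delegation looked roughly like a routing table; the task names, model tiers, and callModel helper below are illustrative, not the production setup:

```typescript
// Lighter models take the cheap, frequent work; heavier ones are reserved
// for the steps that actually define output quality.
type Task = "classifyPrompt" | "outlineResearch" | "writeSlides" | "polishCopy";

const MODEL_FOR_TASK: Record<Task, { model: string; maxTokens: number }> = {
  classifyPrompt: { model: "small-fast-model", maxTokens: 256 },
  outlineResearch: { model: "mid-tier-model", maxTokens: 2048 },
  writeSlides: { model: "large-model", maxTokens: 8192 },
  polishCopy: { model: "mid-tier-model", maxTokens: 1024 },
};

// Hypothetical helper standing in for the underlying SDK call.
declare function callModel(model: string, prompt: string, maxTokens: number): Promise<string>;

function runTask(task: Task, prompt: string): Promise<string> {
  const { model, maxTokens } = MODEL_FOR_TASK[task];
  return callModel(model, prompt, maxTokens);
}
```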

Speed vs Quality

Fine-tuning this balance, providing options for depth, and designing for users to extrapolate where needed.

Resilience

Fallbacks, edge cases, and UI feedback when generations failed.
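The general shape, sketched with a hypothetical callModel helper: try the preferred model, fall back to an alternate, and hand the UI a typed result it can render feedback from instead of hanging:

```typescript
// Hypothetical helper standing in for the underlying SDK call.
declare function callModel(model: string, prompt: string): Promise<string>;

type GenerationResult =
  | { status: "ok"; output: string; model: string }
  | { status: "failed"; error: string };

async function generateWithFallback(
  prompt: string,
  models: string[] = ["primary-model", "backup-model"],
): Promise<GenerationResult> {
  let lastError = "no models attempted";
  for (const model of models) {
    try {
      const output = await callModel(model, prompt);
      return { status: "ok", output, model };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // The UI renders a retry affordance from this state rather than a blank screen.
  return { status: "failed", error: lastError };
}
```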

Easier Prompting

Easing what users have to specify through preset system prompts and model choices behind different features.
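Illustratively, each feature carried its own preset so the user's prompt could stay short; the feature names, prompts, and callModel helper here are placeholders:

```typescript
const FEATURE_PRESETS = {
  explainConcept: {
    model: "large-model",
    system: "Write a clear, visual explainer for a curious learner.",
  },
  quickDefinition: {
    model: "small-fast-model",
    system: "Give a one-slide definition with a concrete example.",
  },
} as const;

// Hypothetical helper standing in for the underlying SDK call.
declare function callModel(model: string, system: string, user: string): Promise<string>;

// The user types a short prompt; the preset supplies the rest.
function promptFeature(feature: keyof typeof FEATURE_PRESETS, userPrompt: string) {
  const preset = FEATURE_PRESETS[feature];
  return callModel(preset.model, preset.system, userPrompt);
}
```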

Memory and Context

Independent model calls responded better when given context from the initial content, which also enabled quicker UX flows.
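A sketch of the idea, with a hypothetical callModel helper and a simplified context shape: every follow-up call gets the original prompt, research, and current deck folded back in:

```typescript
// Simplified shared context carried across independent calls.
interface SessionContext {
  originalPrompt: string;
  researchNotes: string;
  deckJson: string; // the generated deck the user is currently viewing
}

// Hypothetical helper standing in for the underlying SDK call.
declare function callModel(system: string, user: string): Promise<string>;

// A question asked from anywhere in the interface gets the whole session folded in.
function askFollowUp(ctx: SessionContext, question: string): Promise<string> {
  const system = [
    "You are answering a follow-up question inside an educational explainer.",
    `Original request: ${ctx.originalPrompt}`,
    `Research notes: ${ctx.researchNotes}`,
    `Current deck: ${ctx.deckJson}`,
  ].join("\n\n");
  return callModel(system, question);
}
```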

Generated vs Sourced

With an educational product, how to balance real versus responsive content when answering questions.

Designed for educational explainers, the orchestration needed to combine sourced and generated information while drawing on a consistent body of context for follow-up questions.

First flow:

1: No-Code Node Tool

First success! But 14 (😅) steps in a slow and unintegrated external no-code tool.

Modeled the flow after our own writing and design process.

~4-6 minutes


2: Connected

Frontend calling the external tool to return JSON from prompts.

Next up: create a proper interface, and find more ways to interact within it.

~4-6 minutes


3: Preliminary UI

Working UI and long wait times. Bogey number one was gen time.

~4-6 minutes


interlude -- Started building everything internally here.

Tighter control opened up more ways for models to coordinate and brought closer parity with the interface.

4: No more no-code

Cut out the middleman and called models directly and sequentially via their respective SDKs.

Toyed with steps, prompts, and models but still in simple, linear structures. ~70% success rate.

~2-3 minutes

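Roughly the shape of this linear version; the step prompts, model names, and callModel helper are illustrative:

```typescript
// Hypothetical helper standing in for the direct SDK calls.
declare function callModel(model: string, prompt: string): Promise<string>;

// Each step waits on the last, so total time is the sum of every call
// and any step can fail the whole run.
async function generateLinear(prompt: string): Promise<string> {
  const outline = await callModel("mid-tier-model", `Outline an explainer for: ${prompt}`);
  const research = await callModel("large-model", `Research each point:\n${outline}`);
  const deck = await callModel("large-model", `Write the slide JSON from:\n${research}`);
  return deck;
}
```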

5: Under a minute (!!)

Compartmentalized tasks to simplify the initial steps, and then ran the heavy ones in parallel.

Strategizing around compromises between speed, quality, and reliability. Beginning to link calls.

~1 minute

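The same flow re-sketched with the heavy steps fanned out; names and the callModel helper remain stand-ins:

```typescript
// Hypothetical helper standing in for the direct SDK calls.
declare function callModel(model: string, prompt: string): Promise<string>;

async function generateParallel(prompt: string): Promise<string[]> {
  // One quick call plans the sections...
  const planJson = await callModel("small-fast-model", `Split this into slide sections: ${prompt}`);
  const sections: string[] = JSON.parse(planJson);

  // ...then the heavy per-section generations run in parallel, so total time
  // is the slowest section rather than the sum of all of them.
  return Promise.all(
    sections.map((section) =>
      callModel("large-model", `Write the slide JSON for section: ${section}`),
    ),
  );
}
```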

6: Reliable (woo finally)

Enabled the research step to progressively trigger slide creation and then addition to the deck, dramatically reducing the time from prompt to story.

Using new tooling approaches helped the flow hit a ~90% success rate.

~5 seconds

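A sketch of the progressive shape, assuming hypothetical researchStream, createSlide, and addSlideToDeck helpers:

```typescript
// Research emits sections as it finds them; each one kicks off slide creation
// immediately, so the first slide can land in the UI seconds after the prompt.
declare function researchStream(prompt: string): AsyncIterable<string>;
declare function createSlide(section: string): Promise<string>;
declare function addSlideToDeck(slideJson: string): void;

async function generateProgressively(prompt: string): Promise<void> {
  const pending: Promise<void>[] = [];
  for await (const section of researchStream(prompt)) {
    // Don't wait for earlier slides: trigger creation and keep streaming research.
    pending.push(createSlide(section).then(addSlideToDeck));
  }
  await Promise.all(pending);
}
```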

7: Memory and Thinking

Established a shared contextual memory strategy across interactions and model calls.

Also brushed up against a quality-vs-time barrier around the 5-second mark, so added a richer, optional thinking mode.

~5 or 30 seconds

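Illustratively, the thinking mode reduced to a single option on the call; the model name, callModel helper, and reasoningDepth flag are placeholders rather than a real SDK parameter:

```typescript
// Hypothetical helper standing in for the underlying SDK call.
declare function callModel(
  model: string,
  prompt: string,
  options: { reasoningDepth: "fast" | "extended" },
): Promise<string>;

function generate(prompt: string, thinking: boolean): Promise<string> {
  return thinking
    ? callModel("large-model", prompt, { reasoningDepth: "extended" }) // ~30 seconds, richer output
    : callModel("large-model", prompt, { reasoningDepth: "fast" }); // ~5 second default
}
```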