Google Summer of Code 2025 Final Report: Tölvera LLM Engine

by MClem (me 🙂)

Project: Enhancing Creative Workflows with a Natural Language Interface for Tölvera

Organization: Tölvera

Mentors: Jack, Victor, and Piotr

Final Pull Request: https://github.com/afhverjuekki/tolvera/pull/56

Final GSoC Overview Video

A summary of the entire 12-week development journey, from initial architecture decisions to the final working Natural Language Interface.

1. Original Project Goals

This project initially aimed to refine and significantly extend a functional proof-of-concept Natural Language Interface (NLI) for Tölvera, making it accessible to artists and researchers regardless of their coding expertise. The original vision was to create an interactive system that could translate natural language commands into Tölvera sketch generation and modification, acting as a collaborative partner for users exploring artificial life and generative art.

The core objectives included:

  • Turning the tv.llm module from a proof-of-concept into a production-ready feature for future alife researchers
  • Leveraging Large Language Models (LLMs) to bridge the gap between natural language and executable Taichi GPU kernels
  • Creating an intuitive interface that minimizes technical barriers while maintaining full transparency into the generation process
  • Ensuring user privacy through local model support via Ollama alongside cloud providers

2. What I Accomplished (What actually happened)

Architectural Evolution

The project underwent a significant architectural transformation from the initial proof-of-concept through 12 weeks of iterative development:

Week 1: Started with a pragmatic re-evaluation comparing Pydantic against TypeChat for schema validation, ultimately choosing Pydantic’s post-hoc validation approach for fine-grained control.

Week 2: Began prototyping a Textual-based TUI to provide an interactive interface for the synthesis system.

Week 3-4: Experimented with a Mixture of Experts (MoE) architecture using specialized agents (ConductorAgent, ParticleCreationAgent, ColorAgent, etc.), but found it too rigid for dynamic behaviors.

Week 5: Pivoted to the Product of Programmatic Experts (PoE) system - a force-based approach where small expert functions are synthesized and composed dynamically. This solved the Taichi compilation issues and allowed for composability.

Week 6: Added inter-particle interactions, automated error correction, and a kernel accumulator for preserving generated code. The system could now handle behaviors like “particles repel each other.”

Week 7: Implemented behavior decomposition for complex descriptions, intelligent species management, and boundary behaviors. This allowed handling descriptions like “fish school together and avoid predators.”

Week 8: Built a dynamic state generation system that automatically creates required states from behavior descriptions, paired with Jinja2 templates for structuring the final sketch assembly.

Week 9: Expanded beyond force-based behaviors to support alife patterns including cellular automata, slime molds, and multi-phase systems. Created new expert types (visual/drawing, utility, interaction) for state transitions.

Week 10: Shifted to direct synthesis for expert functions using Gemini 2.0 Flash with detailed Tölvera and Taichi-aware prompts, while maintaining Jinja2 templates for final sketch organization. Built a full tracing system for debugging.

Week 11: Created the full Textual UI with sketch generation, natural language refinement with diff highlighting, and a tutorial system.

Week 12: Added automatic error recovery through a repair button, conversation memory for maintaining context, and a two-step refinement process. Created a demo video showcasing the complete Natural Language Interface.

Final Architecture

flowchart TB
    classDef userNode fill:#e1f5e1,stroke:#4caf50,stroke-width:3px,color:#1b5e20
    classDef orchestratorNode fill:#fff3e0,stroke:#ff9800,stroke-width:2px,color:#e65100
    classDef analysisNode fill:#e3f2fd,stroke:#2196f3,stroke-width:2px,color:#0d47a1
    classDef synthNode fill:#fce4ec,stroke:#e91e63,stroke-width:2px,color:#880e4f
    classDef outputNode fill:#f3e5f5,stroke:#9c27b0,stroke-width:3px,color:#4a148c
    classDef contextNode fill:#fffde7,stroke:#fbc02d,stroke-width:2px,color:#f57f17

    User["User Description<br/>'particles swarm and glow'"]
    UI[Textual UI<br/>tolvera_llm_demo.py]
    BO[BehaviorOrchestrator<br/>Main Controller]

    User --> UI
    UI --> BO

    subgraph Analysis ["Analysis & Decomposition Stage"]
        BA[BehaviorAnalyzer<br/>Decomposes Complex Behaviors]
        DC{Decomposed<br/>Components?}
        Components[Component List<br/>- Force behaviors<br/>- Visual effects<br/>- State updates]
        SimplePath[Single Behavior]

        BA --> DC
        DC -->|Yes| Components
        DC -->|No| SimplePath
    end

    BO --> BA

    subgraph Detection ["Detection & Configuration"]
        SM[StateManager<br/>Detects Required States]
        SPM[SpeciesManager<br/>Detects Species]
        CR[ColorResolver<br/>Maps Colors to RGBA]
        States[Custom States<br/>- Global<br/>- Particle<br/>- Species]
        Species[Species Config<br/>- IDs & Names<br/>- Colors<br/>- Interactions]

        SM --> States
        SPM --> Species
        SPM --> CR
    end

    Components --> SM
    Components --> SPM
    SimplePath --> SM
    SimplePath --> SPM

    subgraph Context ["Intelligent Context Selection"]
        CS[ContextSelector<br/>LLM-Powered Selection]
        BaseCtx[Base Context<br/>Core APIs Always Loaded]
        SuppCtx[Supplementary Context<br/>Dynamically Selected Patterns]
        MergedCtx[Merged Context]

        CS --> BaseCtx
        CS --> SuppCtx
        BaseCtx --> MergedCtx
        SuppCtx --> MergedCtx
    end

    Components --> CS
    SimplePath --> CS

    subgraph Synthesis ["Expert Synthesis Loop"]
        CG[CodeGenerator<br/>Synthesizes Experts]
        ExpertCode[Expert Functions<br/>@ti.func decorated]
        BR[BehaviorRegistry<br/>Stores & Manages Experts]
        CheckMore{More<br/>Components?}
        KernelGen[Generate Kernels]

        CG --> ExpertCode
        ExpertCode --> BR
        BR --> CheckMore
        CheckMore -->|Yes| CG
        CheckMore -->|No| KernelGen
    end

    MergedCtx --> CG
    States --> CG
    Species --> CG

    subgraph Rendering ["Template Rendering"]
        TR[TemplateRenderer<br/>Jinja2 Templates]

        subgraph Kernels ["Kernel Generation<br/><br/>"]
            IntKernel[Integration Kernel]
            DrawKernel[Drawing Kernel]
            UtilKernel[Utility Kernel]
        end

        subgraph DataModels ["Data Model Rendering<br/><br/>"]
            ExpertRender[Expert Functions]
            ForceComp[Force Computation]
            DrawComp[Drawing Computation]
        end

        SketchRender[render_sketch<br/>Final Assembly]

        TR --> Kernels
        TR --> DataModels
        Kernels --> SketchRender
        DataModels --> SketchRender
    end

    KernelGen --> TR
    BR -.-> TR
    States -.-> TR
    Species -.-> TR

    SketchRender ==> FinalSketch

    FinalSketch[["<br/>Generated Sketch<br/>Complete Python/Taichi Code<br/>Ready to Run"]]

    class User userNode
    class BO orchestratorNode
    class BA,SM,SPM,CR analysisNode
    class CS,BaseCtx,SuppCtx,MergedCtx contextNode
    class CG,BR,ExpertCode,CheckMore,KernelGen synthNode
    class TR,IntKernel,DrawKernel,UtilKernel,ExpertRender,ForceComp,DrawComp,SketchRender synthNode
    class FinalSketch outputNode
    class States outputNode
    class Species outputNode

Context Selection Architecture

The system uses an intelligent context selection mechanism that optimizes token usage while maintaining generation quality. This two-tier approach allows core APIs to always be injected into the context window while supplementary contexts are dynamically selected based on specific behavior requirements. Think RAG, but a very simple implementation.

flowchart TB
    classDef inputNode fill:#e1f5e1,stroke:#4caf50,stroke-width:3px,color:#1b5e20
    classDef promptNode fill:#fff3e0,stroke:#ff9800,stroke-width:2px,color:#e65100
    classDef llmNode fill:#e3f2fd,stroke:#2196f3,stroke-width:3px,color:#0d47a1
    classDef contextNode fill:#fffde7,stroke:#fbc02d,stroke-width:2px,color:#f57f17
    classDef loaderNode fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#4a148c
    classDef outputNode fill:#e8f5e8,stroke:#388e3c,stroke-width:3px,color:#1b5e20

    PL["PromptLoader<br/>build_prompt_with_dynamic_context()"]
    CodeGen["CodeGenerator<br/>Expert Synthesis"]
    MergedPrompt["Complete Merged Prompt<br/>Ready for CodeGenerator"]

    subgraph InputStage ["Input Analysis"]
        UserDesc["User Description<br/>'particles swarm and glow'"]
        ExpertType["Expert Type<br/>force/interaction/visual"]
        AddlContext["Additional Context<br/>species_info, component, etc."]
    end

    UserDesc --> PL
    ExpertType --> PL
    AddlContext --> PL

    subgraph BaseContext ["Base Context (Always Loaded)"]
        BaseAPI["Base Context<br/>library_docs.get_base_context()"]
        CoreAPIs["• Core Tölvera API<br/>• Pixels API<br/>• Taichi fundamentals<br/>• State access patterns"]
    end

    PL --> BaseAPI
    BaseAPI --> CoreAPIs

    subgraph LLMSelection ["LLM-Powered Context Selection"]
        CS["ContextSelector<br/>gemini-2.0-flash"]

        subgraph SelectionPrompts ["Selection Prompts"]
            SysPrompt["system.txt<br/>Selection criteria"]
            UserPrompt["user.txt<br/>Formatted with inputs"]
        end

        LLMCall["LLM Analysis<br/>Pydantic-AI Agent"]
        SelectionResult["ContextSelectionResponse<br/>• selected_contexts: List of strings<br/>• reasoning: str"]

        CS --> SelectionPrompts
        SelectionPrompts --> LLMCall
        LLMCall --> SelectionResult
    end

    PL --> CS

    subgraph ContextLibrary ["Available Contexts (37 options)"]
        BehaviorPatterns["Behavior Patterns<br/>• movement<br/>• flocking<br/>• interaction<br/>• temporal<br/>• boundaries"]
        VisualPatterns["Visual Patterns<br/>• drawing<br/>• drawing_api<br/>• emergent"]
        AlifePatterns["A-Life Patterns<br/>• cellular<br/>• alife_patterns<br/>• evolution<br/>• ecosystem<br/>• swarm"]
        VeraPatterns["Tölvera Patterns<br/>• vera_patterns<br/>• vera_interactions<br/>• species_interactions"]
        TechPatterns["Technical Patterns<br/>• taichi_crashes<br/>• initialization<br/>• configuration<br/>• temporal_updates"]
    end

    SelectionResult -.-> BehaviorPatterns
    SelectionResult -.-> VisualPatterns
    SelectionResult -.-> AlifePatterns
    SelectionResult -.-> VeraPatterns
    SelectionResult -.-> TechPatterns

    subgraph DynamicLoading ["Dynamic Context Loading"]
        ImportPatterns["_import_context_patterns()<br/>Dynamic Import System"]

        subgraph LoadingMethods ["Loading Methods"]
            DirectImport["Direct Import<br/>getattr(module, attr)"]
            SectionLoad["Section Loading<br/>load_section(file, section)"]
        end

        ContextMapping["Context-to-Import Mapping<br/>37 entries with module paths"]
        LoadedPatterns["Loaded Pattern Content<br/>Dictionary of context_name to content"]

        ImportPatterns --> ContextMapping
        ContextMapping --> LoadingMethods
        LoadingMethods --> LoadedPatterns
    end

    SelectionResult --> ImportPatterns

    subgraph ContextMerging ["Context Merging & Prompt Building"]
        PromptSections["Prompt Sections Assembly"]

        subgraph FinalPrompt ["Final Prompt Structure"]
            BaseSection["1. BASE CONTEXT<br/>Core APIs (always included)"]
            FiveElement["2. Five-Element Structure<br/>synthesis/five_element_structure.txt"]
            SuppSection["3. SUPPLEMENTARY CONTEXT<br/>LLM-selected patterns"]
            StateSection["4. Available States<br/>Formatted state context"]
            TaskSection["5. TASK SPECIFICATION<br/>Requirements & rules"]
        end

        PromptSections --> BaseSection
        PromptSections --> FiveElement
        PromptSections --> SuppSection
        PromptSections --> StateSection
        PromptSections --> TaskSection
    end

    CoreAPIs --> PromptSections
    LoadedPatterns --> PromptSections

    BaseSection --> MergedPrompt
    FiveElement --> MergedPrompt
    SuppSection --> MergedPrompt
    StateSection --> MergedPrompt
    TaskSection --> MergedPrompt

    MergedPrompt --> CodeGen

    class UserDesc,ExpertType,AddlContext inputNode
    class PL,PromptSections promptNode
    class CS,LLMCall,SelectionResult,SysPrompt,UserPrompt llmNode
    class BaseAPI,CoreAPIs,BehaviorPatterns,VisualPatterns,AlifePatterns,VeraPatterns,TechPatterns,LoadedPatterns contextNode
    class ImportPatterns,DirectImport,SectionLoad,ContextMapping loaderNode
    class MergedPrompt,CodeGen outputNode
    class BaseSection,FiveElement,SuppSection,StateSection,TaskSection outputNode
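The two-tier selection in the diagram can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `CONTEXT_LIBRARY`, `keyword_selector`, and `build_prompt` are hypothetical names, the context strings are placeholders, and a keyword matcher stands in for the LLM-powered ContextSelector.

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder content; the real library has 37 entries with full pattern text.
BASE_CONTEXT = "Core Tölvera API, Pixels API, Taichi fundamentals, state access patterns"
CONTEXT_LIBRARY = {
    "flocking": "Flocking patterns ...",
    "drawing": "Drawing patterns ...",
    "taichi_crashes": "Known Taichi crash patterns ...",
}


@dataclass
class ContextSelection:
    selected_contexts: list[str]
    reasoning: str


def keyword_selector(description: str) -> ContextSelection:
    """Toy selector (assumption): the real system asks an LLM to choose."""
    rules = {"swarm": "flocking", "flock": "flocking", "glow": "drawing", "trail": "drawing"}
    text = description.lower()
    picked = sorted({ctx for word, ctx in rules.items() if word in text})
    return ContextSelection(picked, f"matched keywords in {description!r}")


def build_prompt(description: str, selector: Callable[[str], ContextSelection]) -> str:
    selection = selector(description)
    # Ignore names the library does not know, so a bad selection cannot break the build.
    supplementary = [
        CONTEXT_LIBRARY[name]
        for name in selection.selected_contexts
        if name in CONTEXT_LIBRARY
    ]
    sections = ["BASE CONTEXT\n" + BASE_CONTEXT]  # tier 1: always included
    if supplementary:  # tier 2: only what was selected
        sections.append("SUPPLEMENTARY CONTEXT\n" + "\n".join(supplementary))
    sections.append("TASK\n" + description)
    return "\n\n".join(sections)


prompt = build_prompt("particles swarm and glow", keyword_selector)
print(prompt)
```

The base section is always present, while unrelated entries such as `taichi_crashes` stay out of the prompt unless the selector asks for them, which is where the token savings come from.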

Core Components Implemented

Core Orchestration Pipeline:

  • BehaviorOrchestrator: Central coordinator that manages the entire synthesis workflow and delegates to specialized components
  • BehaviorAnalyzer: Decomposes complex behavior descriptions into implementable components using pattern matching and LLM analysis
  • CodeGenerator: Synthesizes Taichi expert functions through direct LLM generation with physics-aware prompts
  • StateManager: Dynamically creates and manages custom particle, pixels, and global states based on behavior requirements
  • SpeciesManager: Detects species mentions in descriptions and manages configurations

Context and Generation:

  • ContextAwarePromptBuilder: Constructs prompts with relevant Tölvera rules, Taichi constraints, and behavior patterns from context library
  • TemplateRenderer: Uses Jinja2 templates to assemble generated expert functions and kernels into complete, executable sketches
  • SketchRefiner: Applies architectural patterns and corrections using a two-step analysis and implementation process

Debugging and Transparency:

  • Comprehensive Tracing System: Captures every LLM call, prompt, response, and timing metric
  • Console Tracer: Real-time colored output showing synthesis progress
  • HTML Report Generator: Interactive reports with timeline visualization and accordion sections for showing the full prompt call and response
  • Mermaid Diagram Generator: Visual flow diagrams of the synthesis process
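To give a feel for what the tracing layer records, here is a minimal sketch of a tracer that wraps each LLM call and stores the prompt, response, and timing. `Tracer`, `TraceEvent`, and `fake_llm` are hypothetical names for illustration, not the classes in the repository.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceEvent:
    stage: str
    prompt: str
    response: str
    seconds: float


@dataclass
class Tracer:
    events: list[TraceEvent] = field(default_factory=list)

    def call(self, stage: str, llm: Callable[[str], str], prompt: str) -> str:
        """Run one LLM call and record what went in, what came out, and how long it took."""
        start = time.perf_counter()
        response = llm(prompt)
        elapsed = time.perf_counter() - start
        self.events.append(TraceEvent(stage, prompt, response, elapsed))
        return response


def fake_llm(prompt: str) -> str:
    # Stand-in for a real provider call (assumption).
    return f"# code for: {prompt}"


tracer = Tracer()
tracer.call("decompose", fake_llm, "particles swarm and glow")
tracer.call("synthesize", fake_llm, "swarm force expert")

# A console view of the trace; an HTML report could be rendered from the same events.
for event in tracer.events:
    print(f"{event.stage:<12} {event.seconds * 1000:.2f} ms  {len(event.response)} chars")
```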

Key Features Implemented

1. Product of Programmatic Experts (PoE) Architecture

The breakthrough came in Week 5 when we pivoted from the fairly rigid MoE to the PoE system. Instead of monolithic scripts, the system synthesizes small @ti.func expert functions that calculate specific forces. This solved several Taichi compilation errors through a two-step synthesis process: first generating experts individually, then regenerating the integration kernel with all experts included.
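The composition idea can be shown without Taichi. In the sketch below, plain Python functions stand in for the @ti.func experts, and the integration step simply sums their forces. The summation, the Euler step, and the names `attract_to_center`, `drag`, and `integrate` are assumptions made for illustration, not the generated kernel.

```python
from typing import Callable

Vec2 = tuple[float, float]
Expert = Callable[[Vec2, Vec2, float], Vec2]


def attract_to_center(pos: Vec2, vel: Vec2, mass: float) -> Vec2:
    cx, cy = 50.0, 50.0  # hypothetical screen center
    return ((cx - pos[0]) * mass, (cy - pos[1]) * mass)


def drag(pos: Vec2, vel: Vec2, mass: float) -> Vec2:
    return (-0.5 * vel[0], -0.5 * vel[1])


def integrate(pos: Vec2, vel: Vec2, mass: float, experts: list[Expert], dt: float = 0.1):
    """Sum the force from every expert, then take one Euler step."""
    fx = fy = 0.0
    for expert in experts:
        ex, ey = expert(pos, vel, mass)
        fx += ex
        fy += ey
    vel = (vel[0] + fx / mass * dt, vel[1] + fy / mass * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel


new_pos, new_vel = integrate((0.0, 0.0), (0.0, 0.0), 1.0, [attract_to_center, drag])
print(new_pos, new_vel)
```

Adding or removing a behavior only changes the `experts` list. The integration loop stays the same, which is the composability that the monolithic-script approach lacked.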

2. Dynamic State Generation

Developed in Week 8, the system automatically analyzes behavior descriptions to identify required states. The StateManager detects when behaviors need custom states (like time_of_day for day/night cycles) and creates them with Taichi types and initialization patterns.
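As a rough illustration of the detection step, the sketch below maps words in a description to state specifications. Everything here is hypothetical: the `StateSpec` fields, the keyword rules, and the `energy` state are invented for the example (only `time_of_day` comes from the text above), and the keyword matching is a stand-in for however the real StateManager analyzes a description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateSpec:
    name: str
    scope: str      # "global", "particle", or "species"
    dtype: str      # Taichi type name, kept as a string so this runs without Taichi
    initial: float


# Hypothetical rules: any keyword in the tuple triggers the state.
RULES = {
    ("day", "night"): StateSpec("time_of_day", "global", "ti.f32", 0.0),
    ("energy", "hungry", "food"): StateSpec("energy", "particle", "ti.f32", 1.0),
}


def detect_states(description: str) -> list[StateSpec]:
    text = description.lower()
    return [spec for keywords, spec in RULES.items() if any(k in text for k in keywords)]


states = detect_states("One blue species moves faster during the day than at night")
print(states)
```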

3. Behavior Decomposition and Species Management

Week 7 introduced the BehaviorAnalyzer (called Decomposer in some of the older code), which uses LLM-assisted analysis to break complex descriptions into atomic components. The SpeciesManager analyzes descriptions to detect species mentions, extract relationships, and generate species-aware initialization patterns.

4. Interactive Textual UI

Built in Week 11, the terminal UI provides:

  • Sketch generation with syntax highlighting
  • Natural language refinement with diff highlighting showing exactly what changed
  • Interactive tutorial system with an 8-step walkthrough
  • Model selection across providers (Gemini, Claude, OpenAI)
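The diff view for refinements can be approximated with Python's standard difflib. This is a sketch of the idea only: `refinement_diff` is a hypothetical helper, and the actual UI may compute and render its diffs differently.

```python
import difflib


def refinement_diff(before: str, after: str) -> str:
    """Return a unified diff between two versions of a generated sketch."""
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="sketch (before)",
        tofile="sketch (after)",
        lineterm="",
    )
    return "\n".join(lines)


before = "force = direction * 500.0 * mass\nreturn force"
after = "force = direction * 250.0 * mass\nreturn force"
diff = refinement_diff(before, after)
print(diff)
```

Lines starting with `-` and `+` mark what a refinement removed and added, which a UI can then color.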

5. Error Recovery and Self-Healing (Week 12)

The system includes automatic error recovery through:

  • Repair button that analyzes crash logs and attempts fixes
  • Conversation memory maintaining context across refinements
  • Two-step refinement process (analysis then implementation)
  • Pattern-based error detection and correction
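The repair flow listed above boils down to a run, capture, fix, retry loop. The sketch below shows that loop under assumptions: `run_with_repair` and `toy_repair` are hypothetical names, and a hard-coded pattern fix stands in for the LLM that reads the crash log.

```python
import traceback
from typing import Callable


def run_with_repair(code: str, repair: Callable[[str, str], str], max_repairs: int = 2) -> str:
    """Execute generated code; on a crash, hand the log to `repair` and try again."""
    for attempt in range(max_repairs + 1):
        try:
            exec(compile(code, "<generated sketch>", "exec"), {})
            return code  # ran cleanly
        except Exception:
            if attempt == max_repairs:
                raise
            code = repair(code, traceback.format_exc())
    return code


def toy_repair(code: str, crash_log: str) -> str:
    # Pattern-based fix for one known failure; the real system would ask an LLM.
    if "NameError" in crash_log and "speed_multiplier" in crash_log:
        return "speed_multiplier = 1.0\n" + code
    return code


broken = "speed = 2.0 * speed_multiplier"
fixed = run_with_repair(broken, toy_repair)
print(fixed)
```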

Example: Working Generated Expert

Here’s an expert function generated by the system from the description “particles are attracted to the center of the screen”:

@ti.func
def expert_attract_to_center(pos: ti.math.vec2, vel: ti.math.vec2, mass: ti.f32, species: ti.i32, particle_idx: ti.i32) -> ti.math.vec2:
    # Calculate screen center
    center = ti.math.vec2(tv.x / 2.0, tv.y / 2.0)
 
    # Vector from particle to center
    to_center = center - pos
    dist = to_center.norm()
 
    # Initialize force (CRITICAL: declare before conditionals)
    force = ti.math.vec2(0.0, 0.0)
 
    # Apply force if not too close to avoid singularity
    if dist > 1.0:
        direction = to_center / dist  # Normalize
        force = direction * 500.0 * mass  # Scale by mass
 
    # Single return at end (Taichi requirement)
    return force

This demonstrates some key aspects of the synthesis:

  • Proper function signature with all required parameters
  • Physics-aware calculations (mass scaling, singularity avoidance)
  • Taichi constraint compliance (single return statement)
  • Clear variable initialization before conditionals

Performance Metrics

I’ve tested this a lot. These are crude measurements, but they are realistic success rates, and they depend on how complex the initial description given to the system is:

Success Rates (Based on Testing):

  • Basic physics (gravity, random movement): ~85%
  • Simple interactions (chase, flee): ~70%
  • Species detection: ~80%
  • Complex multi-component behaviors: ~40%
  • Cellular automata and a-life patterns: ~30%
  • Everything working together end-to-end: ~20%

3. The Current State of the Project

Milestones and Demos

Throughout the 12-week dev period, we hit the following milestones:

  • Week 5: First successful PoE synthesis - gravity, attraction to center, movement patterns
  • Week 6: Inter-particle interactions - repulsion, chasing, flocking behaviors
  • Week 7: Complex decomposed behaviors - “particles migrate to center but repel when close”
  • Week 8: Temporal states - day/night cycles with behavior changes
  • Week 9: Artificial life patterns - slime molds, boids
  • Week 12: GSoC Demo Video - Complete walkthrough of the Natural Language Interface

Video Demonstration

The demo video provides a walkthrough of the Natural Language Interface for Tölvera, and the final overview video gives a comprehensive summary of the entire GSoC project, demonstrating:

  • Live Synthesis: Real-time generation of particle behaviors from natural language descriptions
  • Textual UI in Action: The complete terminal interface with syntax highlighting and live preview
  • Error Recovery: How the system handles and recovers from synthesis failures
  • Refinement Process: Natural language refinement with visual diff tracking
  • End-to-End Workflow: From typing a description to running the generated simulation

Current Capabilities

I ended up removing the tv.llm module declaration, so instead of being part of the library itself, the engine lives in its own directory and can be used (or completely omitted) based on the user’s needs. The examples in ui_scripts provide a pipeline for synthesizing particle behaviors from natural language descriptions. Users can access the system through two primary interfaces:

Command-Line Demo (tolvera_llm_demo.py)

Showcases all major features including:

  • Basic particle physics with single expert synthesis
  • Complex behavior decomposition with multi-component behaviors
  • Visual effects (trails, glows, connections)
  • Multi-species ecosystem interactions
  • Automatic state generation for temporal behaviors
  • Artificial life pattern recognition

Textual User Interface (tolvera_textual_ui.py)

Provides an accessible interface featuring:

  • Type-and-generate workflow with no coding required
  • Real-time code preview with Python syntax highlighting
  • Natural language refinement with visual diff tracking
  • Automatic error recovery through the repair button
  • Multi-provider support with automatic detection
  • Interactive tutorial for new users

Generated Artifacts

Each synthesis produces:

  • Complete Tölvera sketch
  • HTML trace reports for debugging

4. Challenges & Lessons Learned

Bittersweet Analysis: Our System vs. Frontier Models

To help determine whether any of this was really worth it, we ran a bittersweet test: we put our system up against the top frontier models to see whether our heavily engineered handling of Tölvera and Taichi syntax, along with the custom-curated context, actually paid off.

We compared our system against frontier models (Gemini 2.5 Pro and Claude Opus) using identical prompts. The results showed merit in our approach: it performed far better than zero-shot prompting, and moderately better than Gemini and Claude with the entire Tölvera codebase loaded into the context window.

Note: We used gemini-2.0-flash throughout our experimentation, chosen for (1) inference time and (2) price. It scores considerably worse than the frontier models on coding and general reasoning benchmarks, so it should have underperformed them if our pipeline orchestration approach were not useful.

Test Case 1: Day/Night Cycle Behavior

Prompt: “Two species chase one another. One blue species moves faster during the day than at night. The orange species does the opposite.”

Our System:

  • Worked on first attempt
  • The day/night cycle is hard to see, but it’s operational
  • Species behaviors properly differentiated with colors

Zero-Shot

Gemini 2.5 Pro (Zero-Shot):

  • ❌ Initial generation failed with this: TypeError: Particles.__init__() takes 2 positional arguments but 3 were given
  • ❌ After refinement: AttributeError: 'int' object has no attribute 'pn'
  • ❌ After 2nd refinement: AttributeError: 'Particles' object has no attribute 's'
  • Never successfully runs despite multiple attempts

Claude Opus (Zero-Shot):

  • ❌ Initial generation failed with this: ImportError: cannot import name 'ui' from 'tolvera'
  • ❌ After refinement: ModuleNotFoundError: No module named 'tolvera.ui'
  • ❌ After 2nd refinement: ModuleNotFoundError: No module named 'imgui'
  • Fundamentally misunderstands Tölvera’s API structure

Full Tölvera Repo Loaded into the Context Window

Gemini 2.5 Pro (Full Tölvera Context):

  • ❌ Initial generation failed with this: taichi.lang.exception.TaichiSyntaxError: Taichi functions cannot be called from Python-scope.

[Figure: result after refinement]

Claude Opus (Full Tölvera Context):

  • ❌ Initial generation failed with this: Name "speed_multiplier" is not defined

[Figure: result after refinement]

Test Case 2: Food Competition

Prompt: “Two species, one maroon and one teal, are competing for food (green particles).”

Our System:

  • Works, though initial version lacks colors
  • After one refinement: correct colors and competition mechanics
  • Food particles properly consumed over time

[Figures: initial result; result after 1 refinement]

Gemini 2.5 Pro (With Context):

  • ⚠️ Works on first try but incorrect mechanics
  • After refinement: food particles attracted to species instead of being consumed
  • Fundamentally misunderstands the intended behavior

[Figures: initial result; result after 1 refinement]

Claude Opus (With Context):

  • ❌ Initial generation failed with this: TaichiSyntaxError: Taichi functions cannot be called from Python-scope
  • ❌ After refinement: Same error plus segmentation fault
  • ❌ Never successfully runs
  • Critical misunderstanding of Taichi’s execution model

Key Insights from Comparative Analysis

1. Domain-Specific Knowledge is Critical

Frontier models, despite their general capabilities, lack understanding of:

  • Tölvera’s specific API structure
  • Taichi’s GPU kernel constraints
  • The distinction between Python-scope and Taichi-scope execution

2. Structured Context Beats Raw Intelligence

Our system’s success comes from:

  • Curated context about Tölvera and Taichi
  • Common error patterns pre-identified and avoided
  • Structured output forcing valid code generation

3. Specialization Enables Reliability

While frontier models attempt to generate code from first principles, our specialized approach:

  • Prevents a lot of common crashes through context engineering
  • Understands the semantic meaning of particle behaviors
  • Generates appropriate state management automatically
  • Maintains consistency across the synthesis pipeline

Critical Technical Challenges

1. The Taichi Compilation Error (Week 5)

The challenge came when dynamically generated @ti.kernel functions couldn’t be located by Taichi’s compiler, resulting in “Cannot find source code for object” errors. The solution involved a two-step synthesis process: first generating @ti.func experts, then regenerating the entire integration kernel, with the complete code compiled and cached in Python’s linecache.
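The linecache trick is a general Python technique and can be demonstrated without Taichi. The sketch below assumes the compiler finds source through Python's standard inspect/linecache machinery (that part is not verified against Taichi here), and `load_generated` is a hypothetical helper name.

```python
import inspect
import linecache


def load_generated(source: str, label: str) -> dict:
    """Exec generated source under a fake filename that inspect can still resolve."""
    filename = f"<generated:{label}>"
    # Cache entry format: (size, mtime, lines, fullname).
    # mtime=None tells linecache.checkcache() to leave the entry alone.
    linecache.cache[filename] = (
        len(source),
        None,
        source.splitlines(keepends=True),
        filename,
    )
    namespace = {"__name__": f"generated_{label}"}
    exec(compile(source, filename, "exec"), namespace)
    return namespace


source = (
    "def expert_gravity(mass):\n"
    "    return (0.0, 9.8 * mass)\n"
)
ns = load_generated(source, "gravity")

# Without the cache entry this call raises OSError, because no file exists on disk.
recovered = inspect.getsource(ns["expert_gravity"])
print(recovered)
```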

2. Taichi Constraint Violations

The most common crash (“Return inside non-static if”) affected even frontier models. We discovered it in Week 5 and refined our solution through Week 10:

# NEVER use return inside if/for/while blocks!
# WRONG - CRASHES:
if species == 0:
    return predator_force()  # CRASH!

# CORRECT:
force = ti.math.vec2(0.0, 0.0)  # Declare first
if species == 0:
    force = predator_force()  # Modify
return force  # Single return at end

This constraint alone caused approximately 40% of initial generation failures across all models.
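A violation like this can also be caught before the code ever reaches Taichi. The sketch below uses Python's ast module to flag any return nested inside an if/for/while block. It is a hypothetical pre-check, not the detection used in the project, and it is deliberately conservative: it would also flag returns inside a `ti.static` branch.

```python
import ast


def returns_inside_branches(source: str) -> list[int]:
    """Line numbers of return statements nested inside if/for/while blocks."""
    offending: list[int] = []

    def visit(node: ast.AST, in_branch: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.Return) and in_branch:
                offending.append(child.lineno)
            nested = in_branch or isinstance(child, (ast.If, ast.For, ast.While))
            # A nested function definition starts a fresh body.
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                nested = False
            visit(child, nested)

    visit(ast.parse(source), False)
    return sorted(offending)


bad = (
    "def expert(species):\n"
    "    if species == 0:\n"
    "        return 1.0\n"
    "    return 0.0\n"
)
good = (
    "def expert(species):\n"
    "    force = 0.0\n"
    "    if species == 0:\n"
    "        force = 1.0\n"
    "    return force\n"
)
print(returns_inside_branches(bad), returns_inside_branches(good))
```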

3. State Consistency

Managing state names across different synthesis stages proved challenging. The LLM would often generate different names for the same state in expert functions versus integration kernels. We resolved this through a combination of validation and context engineering. Models with more parameters also hallucinated mismatched names less often.

4. Context Management

The initial keyword-based context selection was too simplistic. Complex behaviors required multiple context modules, but including too much context degraded generation quality. Finding the right balance remains an ongoing challenge.

Architectural Evolution Through Iterative Development

The project’s 12-week journey ran into many of the problems described in the literature on LLM-based code generation for DSLs:

Weeks 1-4: Finding the Right Abstraction

  • Week 1: Evaluated Pydantic vs TypeChat, chose post-hoc validation
  • Week 3-4: MoE architecture proved too rigid with deterministic experts

Weeks 5-7: The PoE Breakthrough

  • Week 5: Product of Experts solved Taichi compilation issues
  • Week 6: Added inter-particle interactions and error correction
  • Week 7: Decomposition enabled complex behavior handling

Weeks 8-10: Balancing Structure and Flexibility

  • Week 8: Jinja2 templates for sketch structure with state generation
  • Week 9: Expanded to artificial life patterns beyond forces
  • Week 10: Direct synthesis for expert functions with Gemini proved more flexible

Weeks 11-12: User Experience and Reliability

  • Week 11: Full Textual UI for accessibility
  • Week 12: Self-healing with repair button and memory

This evolution demonstrated that constraining LLMs too rigidly limits their capabilities, while complete freedom leads to too many errors. The sweet spot involves detailed prompts with Tölvera and Taichi rules and examples, combined with error detection and recovery. As noted in Week 10, “it’s still pretty brittle for complex stuff” but “when it works, it’s very nice 😊”.

5. What’s Left to Do (Future Work)

Feature Extensions

Expanding Module Support: While the current system focuses on tv.vera particle behaviors, the architecture is ready for:

  • tv.osc: OSC communication for external control
  • tv.cv: Computer vision integration
  • tv.iml: Interactive machine learning pipelines
  • tv.mp: MediaPipe tracking integration

Advanced Synthesis Capabilities:

  • Multi-behavior composition and blending
  • Evolutionary parameter optimization
  • Real-time behavior modification during execution (live-coding examples)
  • Export experts and kernels to standalone executables and learn from these generations

Research Directions

Improved Success Rates:

  • Fine-tuning models specifically for Taichi code generation (but can be costly for data gathering)
  • Creating test suites for automatic validation
  • Developing better heuristics for behavior decomposition

User Experience Enhancements:

  • Visual node-based editor showing synthesis pipeline (this is a whole idea we threw around, but this is like a 4 month project overall 😅)
  • Preset library of common behaviors
  • Community sharing of generated sketches

Conclusion

The Tölvera LLM Engine transforms natural language descriptions into executable particle simulations, achieving what frontier models struggle with through specialized knowledge and targeted context engineering. The comparative analysis demonstrates that domain-specific systems can significantly outperform general-purpose models on specialized tasks, with our system achieving roughly 70-85% success rates on basic physics, simple interactions, and species detection (and lower rates on complex behaviors), compared to near-zero success from frontier models on identical prompts (without significant refinements).

The project evolved from a simple proof-of-concept into a multi-agent system with debugging capabilities, an intuitive user interface, and automatic error recovery. The journey highlighted both the potential and limitations of current LLM technology for DSL code generation, leading to some practical solutions that balance automation with reliability.

When a user types “red predators chase blue prey,” watches the system decompose and synthesize the behavior, and sees particles spring to life following their description while frontier models fail with basic API errors, the value of specialized systems for this type of DSL code generation becomes clear.

This work lowers the technical barriers to artificial life simulation while maintaining the flexibility and power that makes Tölvera compelling for both artists and researchers. The foundation is now in place for continued development, with paths forward for improving reliability, expanding capabilities, and building a community around accessible artificial life creation via this pipeline.


Special thanks to the Tölvera community and GSoC mentors (Jack, Victor, and Piotr) for their support throughout this project.