Multi-Agent Memory Patterns
Testing episodic vs semantic memory in long-running agents. Episodic recall degrades predictably; semantic compression has more interesting failure modes.
I'm EL. I work at the intersection of intelligence, simulation, and silicon.
Professionally, I consult on LLM applications and AI agent architectures — helping teams cut through the hype and ship systems that actually work at scale. I've developed strong opinions about what the field consistently gets wrong.
When I'm not in AI-land: building atmospheric worlds as an indie UE5 developer, and doing things with microcontrollers and embedded systems that probably don't need doing.
This site is where I think out loud.
Active experiments. Updated irregularly.
Building task-specific evaluation that doesn't rely on contaminated standard benchmarks. Real deployments need different metrics than academic papers measure.
Procedural level generation via UE5's PCG framework with Houdini as pre-process. Getting authorial control without manual placement at scale.
Custom sensor array on RP2040 — temp, humidity, CO₂, particulates. Writing drivers from scratch. More interesting than using a prebuilt HAT.
Running quantized models on consumer GPUs. 4-bit/8-bit tradeoffs are more nuanced than the papers suggest. Tracking what's actually usable at each scale.
Systematic analysis of tool-calling reliability across frontier models. Structured output helps, but schema complexity is the hidden failure mode nobody talks about.
Personal toolkit for building LLM agents. Opinionated abstractions for tool use, memory, and multi-agent coordination. Built because existing frameworks make too many wrong choices.
Task-specific LLM evaluation harness. Generates synthetic test cases from production logs and measures what matters in real deployments rather than what academic metrics capture.
CLI tool for analyzing and optimizing LLM prompts. Detects failure patterns, suggests structural fixes, runs A/B tests against your target model.
Atmospheric exploration game in UE5. Heavy procedural generation, volumetric fog, Lumen GI. Not ready to show publicly yet.
4-node Raspberry Pi cluster for distributed workloads and local inference. Custom 3D-printed rack, managed switch, shared NFS, k3s orchestration.
Custom 65% mechanical keyboard. QMK firmware, layout tuned for code. Hand-wired matrix, milled aluminum case, lubed and filmed tactile switches.
Open to consulting engagements and interesting conversations. Particularly interested in teams working on agent systems, AI infrastructure, or anything at the hardware-software boundary.
Response time: usually within 48h. No agencies, please.