<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~files/atom-premium.xsl"?>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedpress="https://feed.press/xmlns" xmlns:media="http://search.yahoo.com/mrss/" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <feedpress:locale>en</feedpress:locale>
  <link rel="hub" href="https://feedpress.superfeedr.com/"/>
  <title>Allen Pike</title>
  <link href="https://www.allenpike.com/"/>
  <link type="application/atom+xml" rel="self" href="https://feeds.allenpike.com/feed/"/>
  <updated>2026-05-31T23:45:30+00:00</updated>
  <id>https://allenpike.com/</id>
  <author>
    <name>Allen Pike</name>
  </author>
  <icon>https://www.allenpike.com/apple-touch-icon.png</icon>
  <entry>
    <id>https://allenpike.com/2026/voice-in-visuals-out</id>
    <link type="text/html" rel="alternate" href="https://www.allenpike.com/2026/voice-in-visuals-out"/>
    <title>Building for Voice In, Visuals Out</title>
    <updated>2026-05-31T23:45:30+00:00</updated>
    <author>
      <name>Allen Pike</name>
      <uri>https://allenpike.com/</uri>
    </author>
    <content type="html"><![CDATA[<p>Recently, <a href="https://x.com/karpathy/status/2053872850101285137?s=46">Andrej Karpathy argued</a> that the ideal interaction pattern for AI models is <strong>voice in, visuals out</strong>:</p>
<blockquote>
<p>Audio is the human-preferred input to AIs, but vision is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into brain.</p>
</blockquote>
<p>The claim is that while “text in, markdown out” is the mode in which most people use LLMs today, what we should be building toward is a Jarvis-like mode where we primarily speak to AI – and it primarily responds with UI, video, or other visuals.</p>
<p>Let’s check in on where we’re at for both halves of this claim: visuals as output, and voice as input.</p>
<h2 id="visuals-out" tabindex="-1">Visuals Out</h2>
<p>Humans love looking at things!</p>
<p>While it can be convenient to be able to listen to our computers speak, waiting through a voice response feels kinda… ugh. You can increase the speaking rate, but fundamentally, the fastest way for a computer to give humans information is to display it.</p>
<p>We’re faster at reading text than we are at listening, but that’s just the start. There’s a good reason computers long ago evolved past text-only terminals: <a href="https://allenpike.com/2025/post-chat-llm-ui/">richer interfaces are often faster, clearer, nicer, and more useful</a>. The power of human vision has facilitated a rich history of computers showing people stuff.</p>
<p>At first, LLMs <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">weren’t great at producing visuals</a>, often spending many tokens to produce half-baked results. However, Anthropic’s Thariq Shihipar recently wrote about how <a href="https://claude.com/blog/using-claude-code-the-unreasonable-effectiveness-of-html">HTML is increasingly a viable output format</a> to supplant Markdown for certain model responses. This is great, since HTML is a powerful way to show visuals.</p>
<p>Going beyond text can give us dynamic:</p>
<ul>
<li>Hierarchy (sidebars, columns, navigation)</li>
<li>Exploration (drill ins, filters, expansion)</li>
<li>Direct manipulation (scrolling, dragging)</li>
<li>Data visualizations (graphs, charts, dashboards)</li>
<li>Mockups and prototypes (show, not tell)</li>
<li>Illustrative images and video (pelicans, bicycles)</li>
</ul>
<p>Thus the DOS era of AI begins to end.</p>
<p>While it will be a while before general-purpose agents consistently return compelling HTML in response to arbitrary requests, visual responses are already practical for vertical agents – it helps to do one thing well. Recent months have seen a noticeable uptick in AI features producing useful diagrams, charts, sliders, and so on.</p>
<p>So, yep. Visual output is a natural fit for AI, and we’re already going beyond plain text.</p>
<h2 id="voice-in" tabindex="-1">Voice In</h2>
<p>On the other hand, most people are ambivalent about the idea of talking to AI. We were promised the Star Trek computer, or Jarvis, but so far we’ve gotten <a href="https://daringfireball.net/2025/03/something_is_rotten_in_the_state_of_cupertino?utm_source=chatgpt.com">Siri</a> and automated spam calls.</p>
<p>There’s merit to the skepticism. Fundamentally, voice is never going to be the only input mode for computers. Just as we sometimes need voice because our hands are occupied, other times it’s impractical to speak aloud for social or confidentiality reasons. And even when we <em>can</em> speak, voice alone isn’t enough – effective computer use will always require more precise inputs, such as mouse clicks and drags.</p>
<p>However, voice is a deeply human and useful input mode. For example, it’s excellent for getting out our not-yet-organized thoughts and observations. While ChatGPT voice mode is substantially dumber than its text mode, it can still be useful for organizing your thoughts – advanced rubber-ducking.</p>
<p>Compared to text, speech also contains additional nuance and detail.</p>
<p>Voice is not just words – it’s intonation, timing, tone, pitch, energy, and emphasis. Where a transcript would only see <code>okay</code>, how you voice the “okay” might convey “Sounds good!”, “Tell me more”, “I kind of doubt that”, or “Get the hell out of my office.” This is why we call somebody if we need to have an emotional conversation, rather than sending misinterpretable text messages.</p>
<p>We speak faster than we type in terms of WPM, so together with the additional details in our voice, we simply put out more information per second via voice than from a keyboard.</p>
<h2 id="the-tyranny-of-latency" tabindex="-1">The Tyranny of Latency</h2>
<p>So, great. Talking to AI and having it respond with visuals are both natural and highly useful. Why aren’t we doing this all the time?</p>
<p>If you’ve actually used AI voice systems, you’ve probably noticed that they’re usually slow, dumb, or both.</p>
<p>We’ve <a href="https://www.nngroup.com/articles/response-times-3-important-limits/?utm_source=chatgpt.com">known since the 60s</a> that to feel fast, computers should respond within about 100ms, and that to preserve users’ sense of flow, they need to respond within about 1000ms (1 second). Even before networks and giant neural nets, it could be a challenge to hit these bars.</p>
<p>But voice AI adds a substantial new hurdle. Humans are more sensitive to lagged voice than we are to lagged visuals. For a fully fluid voice conversation with interruptions going both ways, the latency bar is about 200ms. More than that, and interruptions feel janky and annoying. You’ve experienced this on voice calls with other humans: if there’s a noticeable lag and you’re stepping on one another’s words, you back off into a more stilted turn-taking conversation style.</p>
<p>At best, this is what we get with common AI applications today: slow, half-duplex turn-takers. They listen until it seems like you’ve stopped, generate a response, then stream until it sounds like you’ve started saying something, at which point they abruptly stop.</p>
<p>While 200ms is a long time in traditional computing terms – a smooth animation frame needs to render in just 16ms – you’ll find 200ms is not a long time to do the complex work of sending a user’s voice over the network, making sense of it, generating a voice response, and sending it back.</p>
<p>In order to achieve the required latency, applications generally do voice inference with rather small models. The most advanced voice model most people have tried, ChatGPT’s rather outdated voice mode, is profoundly dumb compared to GPT 5.5 or Claude Opus 4.8. Even if you understand why this is the case, it’s fun to watch <a href="https://www.instagram.com/p/DWUs-hnAZpo/">that guy who awkwardly gets it to misadvise him</a><sup id="footnote-1" role="doc-noteref"><a href="https://www.allenpike.com#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>But there is hope. Earlier this month Thinking Machines gave a preview of their approach for realtime voice models, which they call <a href="https://thinkingmachines.ai/blog/interaction-models/">Interaction Models</a>. These are full-duplex systems, which means we’re finally getting simultaneous perception and generation.</p>
<iframe style="width: 100%; aspect-ratio: 16 / 9; margin: 0 auto;" src="https://www.youtube.com/embed/A12AVongNN4?si=LK3gBMfximxQtiia" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<p>Rather than switching between generation and listening, these streaming models slice time into 200ms chunks, interleaved continuously. While 200ms isn’t enough to generate a very smart response, that fast streaming model can call slower, smarter models to do things like lookups, reasoning, and generating artifacts – then return the results in 200ms chunks when they’re ready.</p>
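<p>To make the interleaving idea concrete, here is a toy simulation. It is not Thinking Machines’ implementation, just a minimal sketch under stated assumptions: the only number taken from the description above is the 200ms chunk size, while <code>InteractionLoop</code>, <code>SlowJob</code>, the <code>lookup:</code> convention, and the three-tick delay are all hypothetical. Each tick perceives one slice of input and emits one slice of output, and slow work is handed off rather than blocking the loop.</p>

```python
from dataclasses import dataclass, field

CHUNK_MS = 200  # chunk length from the post; everything else here is hypothetical


@dataclass
class SlowJob:
    """Stand-in for a slower, smarter model call that finishes after some ticks."""
    result: str
    ready_at_tick: int


@dataclass
class InteractionLoop:
    pending: list = field(default_factory=list)
    tick: int = 0

    def step(self, heard: str) -> str:
        """Handle one 200ms slice: perceive input, then emit one output chunk."""
        # Perception: deeper work is handed to a slow job instead of blocking.
        if heard.startswith("lookup:"):
            topic = heard[len("lookup:"):]
            self.pending.append(SlowJob(f"result for {topic}", self.tick + 3))
            out = "one moment"
        else:
            out = "mm-hm" if heard else ""
        # Generation: a slow job that has finished is delivered in this chunk.
        done = [j for j in self.pending if j.ready_at_tick <= self.tick]
        if done:
            self.pending = [j for j in self.pending if j.ready_at_tick > self.tick]
            out = "; ".join(j.result for j in done)
        self.tick += 1
        return out


def run(chunks: list) -> list:
    """Feed one input chunk per tick and return (timestamp_ms, output) pairs."""
    loop = InteractionLoop()
    return [(i * CHUNK_MS, loop.step(chunk)) for i, chunk in enumerate(chunks)]
```

<p>Calling <code>run(["lookup:weather", "", "and also", ""])</code> keeps acknowledging speech on every tick, and the lookup result surfaces at the 600ms mark without the loop ever pausing to wait for it.</p>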
<p>Now, this is all very exciting, and I’m keen to see where it goes. But despite the claim “The model instantly reacts to visual cues”, even their demo videos show a noticeable and sometimes awkward lag between stimuli and voice responses.</p>
<p>This is partly because it’s early – Thinking Machines was only founded last year. But it’s partly because humans are just that sensitive to voice delays. It’s a fundamentally difficult problem.</p>
<p>However. Humans are <em>less</em> sensitive to laggy visuals. Since visuals are less intrusive than a voice response, you get the more permissive 1000ms response budget that we’re used to when building computer programs.</p>
<p>This is convenient, since voice → visuals is a great interaction mode.</p>
<h2 id="voice-in-visuals-out" tabindex="-1">Voice In, Visuals Out</h2>
<p>The good news is that you don’t need to wait for Thinking Machines or any other model advances to build useful voice in, visuals out experiences today.</p>
<p>Here’s a quick example of what voice in, visuals out can feel like: not a chat, but a live visual representation of what you’re working on.</p>
<div class="centered">
<video loop playsinline controls style="max-width: 100%;">
  <source src="https://www.allenpike.com/images/2026/voice-in-demo.m4v" type="video/mp4">
</video>
<span class="caption"><a href="https://cedarloop.ai/">The Cedarloop voice agent</a> can help outline notes, file bugs, and do other in-meeting work.</span>
</div>
<p>Here are a few latency approaches to keep in mind if you’re working on voice-in, visuals-out agents:</p>
<ol>
<li>The underlying model needs to be very fast. Any slower than a p50 latency of 700ms and a p95 of 1200ms will feel janky. Meanwhile, it’s common to see small requests on “fast” models that have over 5000ms of p95 latency 🫠</li>
<li>You need to send uncomfortably short time slices for inference. Err on the side of sending incomplete text rather than waiting for two-second pauses, and use context engineering to have the model heal any errors.</li>
<li>Keep your context prefixes stable, so they can be well-cached. 90%+ of our input tokens are cached, and thus far faster (and cheaper) than if we were sending fresh context every request.</li>
<li>Tokens are slow, and HTML is token-heavy. Realtime visuals-out needs to use efficient formats out of the LLM, which can then be displayed in a rich web or native view.</li>
</ol>
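<p>The first point in the list above is easy to check against your own request logs. The following is a rough sketch, not a description of any particular product’s monitoring: the two budgets come from the list, while the function names and the nearest-rank percentile method are assumptions chosen for simplicity.</p>

```python
import math

P50_BUDGET_MS = 700   # thresholds quoted in the list above
P95_BUDGET_MS = 1200


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def within_budget(latencies_ms: list) -> dict:
    """Summarize p50 and p95 and report whether both fit the budgets."""
    p50 = percentile(latencies_ms, 50)
    p95 = percentile(latencies_ms, 95)
    ok = p50 <= P50_BUDGET_MS and p95 <= P95_BUDGET_MS
    return {"p50": p50, "p95": p95, "ok": ok}
```

<p>A single slow outlier in a small sample is enough to fail the p95 check even when the median looks healthy, which is the situation the first point warns about.</p>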
<p>Get it dialled in right, and you can build delightful-feeling experiences.</p>
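<p>As one illustration of the stable-prefix point from the list above: prompt caches generally match on an exact prefix, so anything volatile belongs at the end of the request. This is a minimal sketch with hypothetical names (<code>SYSTEM_PROMPT</code>, <code>build_prompt</code>), and it measures characters as a rough proxy for tokens rather than calling any real provider’s API.</p>

```python
from os.path import commonprefix

SYSTEM_PROMPT = "You maintain live meeting notes."  # hypothetical static instructions


def build_prompt(notes_so_far: list, latest_speech: str) -> str:
    """Put the slow-changing parts first and the newest speech last."""
    stable = SYSTEM_PROMPT + "\n" + "\n".join(notes_so_far)
    return stable + "\n---\n" + latest_speech


def shared_prefix_ratio(previous: str, current: str) -> float:
    """Fraction of the current prompt that exactly matches the previous prompt's start."""
    if not current:
        return 0.0
    return len(commonprefix([previous, current])) / len(current)
```

<p>Comparing consecutive requests this way makes regressions visible: prepending something that changes on every request, such as a timestamp, drops the shared prefix to nearly nothing.</p>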
<p>If you’re working on these kinds of realtime apps, I’d love to chat – happy to share what we’ve been learning, and hear what others have been finding.</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
<p>GPT-Realtime-2 recently launched in the API with “GPT-5-class reasoning,” but is not in ChatGPT yet. And so far, Claude has no realtime multimodal model at all. <a href="https://www.allenpike.com#footnote-1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  <entry>
    <id>https://allenpike.com/2026/we-can-do-hard-things</id>
    <link type="text/html" rel="alternate" href="https://www.allenpike.com/2026/we-can-do-hard-things"/>
    <title>We Can Do Hard Things</title>
    <updated>2026-04-30T23:45:30+00:00</updated>
    <author>
      <name>Allen Pike</name>
      <uri>https://allenpike.com/</uri>
    </author>
    <content type="html"><![CDATA[<p>Years ago, back when I was leading a mobile dev team, my friend had an idea for a business.</p>
<p>You see, back then the most frustrating thing about mobile dev was the final step: getting your app on actual phones. Builds, provisioning, and code signing made for a harrowing trial, festooned with obtuse errors and other sharp spikes.</p>
<p>So, Dennis had a pitch for me. “What if,” he asked, “we did all your apps’ builds and provisioning and signing for you, in the cloud?”</p>
<p>I raised an eyebrow. “Well, obviously that would be great. In theory. But it would be too annoying to build that. Apple drops Xcode versions and switches submission requirements with no warning. And you’d need to make sure that…” He stopped me with a wave.</p>
<p>“Right, but: if we did it, and it worked. Would you use it?”</p>
<p>“Well, of course we would. But I don’t think you want to run this.”</p>
<p>My attempt to discourage him didn’t work. Perversely, the idea that this was a hard problem got him more excited. He immediately dove in.</p>
<p>Three years later, Buddybuild <a href="https://techcrunch.com/2018/01/02/apple-buys-app-development-service-buddybuild/">was acquired with fanfare</a>. They’d accomplished what they set out to do, made a tidy profit, and they were even able to keep their team here in Vancouver.</p>
<p>Wisely, they ignored me and chose to do the hard thing.</p>
<h2 id="the-nice-thing-about-hard-things" tabindex="-1">The Nice Thing About Hard Things</h2>
<p>Doing something hard yet pointless is foolish. But doing something hard yet valuable has a lot of benefits.</p>
<ol>
<li>It’s <a href="https://blog.samaltman.com/what-i-wish-someone-had-told-me">easier to recruit a great team</a> to tackle hard, worthwhile problems.</li>
<li>It leads to less competition, due to <a href="https://www.paulgraham.com/schlep.html">schlep blindness</a>.</li>
<li>It’s a great way to hone your ambition and discipline – over time, working on hard things feels less hard.</li>
</ol>
<p>Consider that. If you have a great team, less competition, and more ambition and discipline, then you’re set up to do well.</p>
<p>These days are well suited to attempting hard things. Our tools are improving so fast that a project which seemed straightforward last year might be trivial next year. Better to dial up the ambition a bit.</p>
<p>Of course, there are a few pitfalls to trying hard things. You’re more likely to burn out, for one – it’s very important to sleep, exercise, and manage your own energy when your work is kicking your ass.</p>
<p>And it can sometimes be difficult to tell when the “hard and purposeful” parts end, and when the “overcomplicating things” or “naive folly” begins. I highly recommend having a co-founder that finds hard and purposeful problems motivating, yet takes a dim view of overcomplication. Doing hard things is best not attempted alone.</p>
<p>But, all in all, it’s a good default. We can do hard things.</p>
<p>So, let’s.</p>
]]></content>
  </entry>
  <entry>
    <id>https://allenpike.com/2026/the-rise-of-transparency</id>
    <link type="text/html" rel="alternate" href="https://www.allenpike.com/2026/the-rise-of-transparency"/>
    <title>The Rise of Transparency</title>
    <updated>2026-03-31T23:45:30+00:00</updated>
    <author>
      <name>Allen Pike</name>
      <uri>https://allenpike.com/</uri>
    </author>
    <content type="html"><![CDATA[<p>Small companies are, by default, very transparent. When there are 4 people working in a room, you have a direct line of sight on what everybody else is doing, and why. Your docs, Slack channels, and repositories are open to everybody. When the CEO has an epiphany that changes everything, you all know right away – probably because you were at lunch together when it happened.</p>
<p>Thus, startup founders will often get religion about transparency. “Our culture,” they’ll declare, “is to be radically transparent! Everything defaults to open. We hire adults, expect them to do great work, and give them the context they need.” Yay transparency!</p>
<div class="centered">
<img style='max-width: 100%' src="https://www.allenpike.com/images/2026/transparent-tea.jpg" alt="A transparent cup of tea." />
</div>
<p>And this works pretty well. Transparent orgs tend to delegate more effectively, have higher accountability, less politics, faster trust, and just plain ship more. Transparency helps bigger orgs adapt more quickly to the ground truth, responding to customer signals that execs might not be directly exposed to.</p>
<p>But, at a certain scale, radical transparency strains.</p>
<p>Some idle musing by the CEO sends a team off on an unimportant side quest. A well-justified compensation anomaly upsets a group that is missing background information. A 450-message Slack thread about bike shed paint color choices devolves into factions, hashtags, and philosophical arguments about the morality of taupe. #nevertaupe</p>
<p>And if you talk to people at a large yet highly transparent company, you’ll hear about the hazards of the relentless <strong>firehose</strong>. A thousand shared Slack channels, to start. But also a glut of docs – some critical, most unmaintained. Then there’s the meeting notes, meeting recordings, and meeting invites. Plus proposals, requests for comment, and requests to comment on your proposals’ comments’ resolutions. “So, you like information, eh? Well, have all the information in the world!” How do you make sense of all this?</p>
<p>While some people are tenaciously able to find, within this chaos, the important info they need to do great work, a lot of otherwise-capable people get easily distracted by information that just <em>might be</em> urgent, provocative, or even just… shiny. 💫</p>
<p>Meanwhile, allowing everybody access to every historical doc is occasionally useful, but it also presents an ever-growing surface area for leaks and legal liability. Are you sure there isn’t something highly sensitive or disagreeable in those 99,999 unmaintained Notion docs?</p>
<p>So, as companies grow, they tend to lock information down. Some – Netflix, Stripe, Shopify – do their best to keep as transparent as possible while still complying with necessary guardrails. Others – Apple, Palantir, Oracle – move toward a need-to-know basis, ensuring information flows top-down. With more control over information, it’s easier to ensure that leaks or internal distractions don’t derail your plans for surprising product launches and/or world domination.</p>
<p>Of course, every company’s culture is forged by the market they operate in, but there’s always some tradeoff here. And as companies grow, they tend to regress to a boring middle ground.</p>
<p>However. As with many tradeoffs, the balance has recently begun to shift.</p>
<h2 id="given-this-firehose-please-assess-my-plan" tabindex="-1">Given this firehose, please assess my plan</h2>
<p>Recently, we’ve seen a revolution in tools that can make better use of the firehose. Slack can now summarize your unread messages, albeit with mixed effectiveness. Tools like <a href="https://www.glean.com/">Glean</a> and <a href="https://getunblocked.com/">Unblocked</a> can consider a mountain of your company’s data and answer important questions about it, albeit limited to the data they can actually see. And large open companies like Shopify and Stripe have internal tools that let employees’ agents query, analyze, and act on the copious data any given employee has access to – albeit with some sharp edges and exfiltration risks.</p>
<p>Just as LLMs are making the world’s data more useful to the world, they’re making companies’ internal data more useful to employees.</p>
<p>Of course, this can be misused! In some companies we’ll see further secrecy – I’ve heard of AI search tools and MCPs letting employees find accidentally-visible compensation data and other spicy docs that hadn’t been audited. I’ve heard of support agents giving customers true-but-problematic information because they surfaced it with internal AI tooling without proper training.</p>
<p>But as we evolve past early growing pains, and into teams and processes fully making use of this stuff, the anecdata points toward this new tooling becoming a superpower. Agents’ newfound ability to effectively query and reason about far more data than can fit into context is making the long tail of communications and docs much more useful for decision-making – but only when people have access to the relevant data.</p>
<p>Given that, <strong>the maturation of AI tooling will motivate companies to become more transparent</strong>.</p>
<p>In 2024, the cost of being internally secretive was meaningful but manageable. Although Apple keeping information need-to-know sometimes leads to waste, or important changes being slow to diffuse through layers of management, they’ve done, like, pretty well for themselves? With all the scrutiny from press, competitors, and regulators, you can see why they’ve kept it up.</p>
<p>But as all companies increasingly have tools that can assess, consider, analyze, and make use of all the business’ communications and documents, what kinds of orgs are going to benefit most? Well, the ones that let their employees access more context.</p>
<p>Extremely transparent orgs like Zapier, GitLab, and PostHog that might have struggled to cope with their firehoses – and who often had gaps in the data due to untranscribed meetings and decisions – will increasingly be able to leverage it. Sure, not all of it, certainly not at first. (Some of it is just junk.) But increasingly more of it. And critically, it won’t just be executives that will be able to attend to all this knowledge.</p>
<h2 id="where-we-ought-to-be" tabindex="-1">Where we ought to be</h2>
<p>The frontend dev working on your internal admin dashboard should be flagged that the React upgrade issue they’re battling right now was just solved by the customer-facing dev team. The intermediate developer who is incensed about a company-wide tech decision should be able to build their understanding of why it was made without booking a 1:1 with the responsible Principal Engineer. Your go-to-market team should be able to “see” through to the code, developers’ conversations, and the recent decisions around a given feature, letting them give customers correct and timely information about what to actually expect from the product today.</p>
<p>And everybody in your company should, when it’s useful, have key company-wide strategy docs available to their agents as they make plans and decisions. And then, when a new revelation motivates the exec team to improve those docs, then bam. All the product engineers’ agents will take this new strategy into account right away. Anybody who’s worked at a large company and/or used CLAUDE.md knows this won’t be a silver bullet – deeply ingrained habits and momentum cannot be simply prompted away. But as the tools and the data improve, the advantage will accumulate.</p>
<p>When we launched <a href="https://cedarloop.ai/">a realtime meeting agent</a> last month, we expected to get feedback about its defaults being too open – currently, Cedarloop defaults to sharing its collaborative notes and tools with all attendees live. But instead, we’ve seen two diverging kinds of feedback: many of our users want the tool to be less visible to external guests and customers, but <em>more</em> open internally within their companies. Which in retrospect makes a lot of sense: decisions and actions in your team’s work are increasingly useful across your company, but your customers shouldn’t need to worry about all that.</p>
<p>So long story short, more internal transparency is coming.</p>
<p>It will take some time. Apple isn’t doomed, and just because Zapier and Shopify are already working that way doesn’t mean they’re going to instantly be turbo-boosted. But it seems a new era is coming, where siloed knowledge, information hoarding, and secrecy-by-default will become less tenable.</p>
<p>The firehose will evolve from a spicy distraction to a useful input to important work.</p>
]]></content>
  </entry>
</feed>
