The Hypermodal Interface
A few small teams are pioneering the next generation interface for computing, starting on the desktop, where permissionless innovation reigns

It’s not quite an app, not quite a chatbot, and not quite a tool. But if you’ve used Highlight, EnConvo, or 5ire, you’ve already touched it.
There's a new desktop interface about to emerge into broader awareness, and it’s poised to deliver on the kind of productivity improvements promised, and flubbed, by Microsoft’s Copilot and its ilk.
It's being pioneered by a few small teams that are bundling a set of market-ready AI technologies and interface ideas that are converging on something compelling. There have been piles of forgettable desktop apps that just wrap the various chatbot services -- these new apps are not that.
I'm speaking about the class of desktop add-ons like Highlight, EnConvo, 5ire, and Sage. All are headed toward a vision of a renewed high-productivity "PC" interface that as yet has no name. I call it the Hypermodal Interface.
What is the Hypermodal Interface?
Describing a new interface paradigm is difficult. To really grok it, you need to experience it. Like the GUI or the touchscreen before it, the Hypermodal Interface only becomes "obvious" in hindsight.
It blends new AI capabilities with ultra-fast triggering mechanisms. These tools don't just enhance your apps; they start to replace direct interaction with them altogether. It's "Hypermodal" because it goes far beyond the traditional idea of multimodal interaction (text, speech, screen content), incorporating these and many more modalities, as we'll see.
If you’ve ever used macOS Spotlight, Alfred, or Raycast, you’ll recognize the conceptual lineage: keyboard-triggered, cross-app command interfaces. Each in turn pioneered and extended the "quick shortcut access to search and action triggering" UX. But those tools had no intelligence, no voice control, limited integrations. Every shortcut had to be predefined, every workflow specified in advance.
[Images: macOS Spotlight, a precursor; Raycast, another precursor]
But the sorts of tasks that you fire off to a truly useful assistant, human or otherwise, are meant to be maximum-leverage: a very short task description takes a time-consuming or tedious piece of work off your hands. Further, a good assistant needs to be a generalist, handling a mass of slightly differently defined tasks that don't lend themselves to definition as an automation, because each may never be repeated exactly the same way. Only the most complex and domain-specific tasks are worth the extended effort of specifying methods and outcomes in advance.
With the Hypermodal Interface, you push intent into a detailed, shared context—a phrase, a voice command, a keystroke—and delegate the task. The intelligence, and the flexibility, lives in the system. The context comes from awareness of the active app, memory of your previous interactions, and understanding of what’s on your screen.
The stack is conceptually simple, but has a couple of components of note:
- The interface: quickly triggered, well-crafted UX and desktop-environment code integration, with text input, voice input, and availability both in and out of app context
- An agent-capable, multimodal LLM (like Anthropic's Claude 3.7 Sonnet)
- Visual context via screen capture + interpretation
- Model Context Protocol (MCP), a plugin-like interface to a fast-growing ecosystem of tools
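The loop these components form can be sketched in a few lines. This is a minimal, hypothetical illustration; every function name here is mine, and none of the apps mentioned expose this exact API.

```python
# Illustrative sketch of the Hypermodal loop (all names hypothetical;
# none of the apps discussed expose this exact API).

def gather_context(active_app: str, screen_text: str, history: list[str]) -> dict:
    """Assemble the shared context the LLM receives alongside the command."""
    return {
        "active_app": active_app,   # which app currently has focus
        "screen": screen_text,      # OCR/accessibility text from the screen
        "history": history[-5:],    # recent interactions, as lightweight memory
    }

def handle_trigger(command: str, context: dict, llm) -> str:
    """Called when the user hits the hotkey or speaks a command.

    The LLM decides what to do with the command given the context;
    a real implementation would also hand it the available MCP tools.
    """
    prompt = (
        f"Active app: {context['active_app']}\n"
        f"Screen: {context['screen']}\n"
        f"Task: {command}"
    )
    return llm(prompt)
```

The point of the sketch is the shape of the interaction: a short command, a rich context assembled by the system, and all the intelligence delegated to the model.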
Let's talk more about MCP - it's a critical enabler and very recent development.
Why Now? Model Context Protocol (MCP)
MCP’s reason for being
MCP was introduced in November 2024, and it has generated a lot of confusion since then. The confusion mostly revolves around one question: "isn't this just an API? why can't we just use existing APIs?"
MCP seems destined for the fate of the USB device standard - terrible naming, and big success despite it. Ironically their purposes are similar: USB was introduced to enable “plug-and-play” — the idea that devices should work without additional setup when you plug them in. MCP is a standard to enable plug and play Tools for use by LLMs.
For non-technical folks, the terminology "MCP server" can be confusing, so I will simply call MCP servers "MCP tool providers".
MCP enables rapid innovation by decoupling the pieces of an AI application environment (the LLM and the tools) in a way that makes it easy for developers to ship features that are compatible with the whole ecosystem. In software engineering we call this dependency inversion. It's the idea that different pieces of software shouldn't depend directly on each other; instead, both should depend on a third thing (an interface spec).
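A minimal sketch of dependency inversion in Python may make this concrete. The names here are mine, not MCP's (MCP itself is a JSON-RPC wire protocol, not a Python class); the point is that the tool author and the host app each depend only on the shared interface, never on each other.

```python
from typing import Protocol

# The shared interface spec: both sides depend on this, not on each other.
# (Illustrative only; MCP defines this contract as a wire protocol.)
class Tool(Protocol):
    name: str
    def call(self, **kwargs) -> str: ...

# A tool author ships this without knowing which host or LLM will use it.
class WeatherTool:
    name = "get_weather"
    def call(self, **kwargs) -> str:
        return f"Sunny in {kwargs.get('city', 'unknown')}"

# A host app ships this without knowing which tools exist ahead of time.
def dispatch(tools: list[Tool], tool_name: str, **kwargs) -> str:
    for tool in tools:
        if tool.name == tool_name:
            return tool.call(**kwargs)
    raise KeyError(f"No tool named {tool_name}")
```

Swap in any other tool, or any other host, and nothing else has to change; that is the property driving the ecosystem.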
As a result we've got a slew of ideas around these new desktop interfaces, shipped as MCP clients (let’s call them MCP tool users), that work with the whole MCP server ecosystem.
MCP is fundamentally a very simple specification. So why has it driven an explosion of clients and tools, where ChatGPT plugins did not?
- The extreme ease of shipping an MCP server (an LLM tool)
- No need to implement OAuth authentication to access user data
- ChatGPT plugins, by contrast, were poorly designed for leveraging the energy of open source developers
- No app store-like approval is necessary
MCP decouples tools from the LLM, and this makes every MCP tool more valuable:
- It can be integrated with any LLM with reliable tool-calling capability
- It can be used in all kinds of contexts that the tool author may not have even thought of, since the MCP client defines the user interface
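Concretely, an MCP server advertises each of its tools as a small JSON descriptor, roughly the shape below under the current spec. The tool itself ("search_notes") is a made-up example.

```python
import json

# Roughly the shape of a tool descriptor an MCP server returns from tools/list.
# The tool ("search_notes") is a hypothetical example.
tool_descriptor = {
    "name": "search_notes",
    "description": "Full-text search across the user's local notes",
    "inputSchema": {  # standard JSON Schema
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Any MCP client can hand this descriptor to any tool-calling LLM; the model
# reads the schema and emits a matching call, e.g.
# {"name": "search_notes", "arguments": {"query": "Q3 budget"}}
print(json.dumps(tool_descriptor["name"]))
```

Because the descriptor is plain JSON Schema rather than anything model-specific, the same tool works unchanged across every client and every capable LLM.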
Why is it emerging on the desktop, not on mobile?
Even though the "PC" long ago became the junior partner to the smartphone, it's fascinating to watch the next interface modality develop on the PC, because the desktop still allows "permissionless innovation".
As I wrote in "In the era of AI agents, Apple keeps agency for itself", in AI platforms the AI agent is the application, and Apple's design for the Apple Intelligence framework locks developers out of creating agents. The cost of that decision is beginning to make itself visible.
We can't see the expression of this new interface in the mobile context because so much of what makes it powerful is the instant availability system-wide, with deeper and broader access to user data and apps.
All those impressive Apple Intelligence flows that were demonstrated at Apple’s WWDC 2024, that haven’t shipped? You can do them and more on the desktop, now.
The emergence of the Hypermodal Interface is a fresh development, and implementations are still a bit clunky for non-technical users, especially when installing or enabling certain MCP tools. But developers are quickly simplifying the process into an App Store-like installation experience (nobody is charging for MCP tools). And these apps are unmistakably showing us the next user interface modality: shared context, complex task delegation via the issuance of short commands, integration with everything.
A Note on Security
I don’t want to wrap this piece without a cautionary note on security.
These apps, by nature of their integration into your apps and services, both require a high level of trust in the app developers themselves and open up a new attack surface into your computing environment.
Another important consideration: EnConvo and 5ire are developed by China-based teams, while Highlight and Sage are developed by US-based teams. Factor that into your software selection as you see fit.
The issue of the security of these apps is so important and pressing that I may dedicate an issue of the newsletter to this topic in the coming weeks.
Final Takeaways, today
Mini Reviews
Not all the apps I've mentioned have implemented all the components of the Hypermodal Interface, but they're all headed in that direction. Some tradeoffs today:
- Best lightweight quick-trigger UX: HighlightAI
- Most features: EnConvo
- Open source: 5ire
- Not free, but decently usable free mode: HighlightAI
- Prettiest: HighlightAI
There are rough edges
If you're non-technical, give it a couple of months before trying the MCP aspects of these apps. There are still some pain points.
MCP is hardly mentioned on the websites of these apps…
The support for MCP tools is the thing that makes the Hypermodal Interface powerful, but it's so new that the apps have only just integrated it. You may not see it called out much in their marketing materials yet, but that doesn't matter. MCP tools are essential elements and will become leading features, although perhaps referred to as extensions or plugins.
Security
Keep the security issue in mind. In practical terms: 1. Select an app developer that you trust, and 2. Select MCP plugins that are shipped by companies or teams you trust.
"I can't find anything on Hypermodal Interfaces"
Because I just coined the term.
Need expert AI guidance?
Have feedback? Are there topics you’d like to see covered?
Reach out at: jeff @ roadtoartificia.com