A model OS

December 11, 2023

One recent trend I have seen in the world of large language model hype is to predict or imagine the models becoming a sort of operating system: a layer we could all use to interact with our computing life and get things done through a single interface and a much more natural way of interacting. Who even needs apps anymore?!

This text was in particular triggered by Federico Viticci's musings in episode 479 of Connected, but I have seen similar lines of thinking in other places too.

I can see where the excitement comes from, and I kind of admire it. I very much enjoy re-thinkings of our computer interfaces, and I think way too little is happening in the area of operating systems.

But this idea, like so many others, has been around in various forms before, and I do not see why this particular interface on top would change all that much about it.

Essentially, I think the core problem is that a single interface for everything would be better for almost no one. Perfectly executed, it could be good for power users who either knew all the options available, or who had the curiosity to keep exploring what was possible. But for everyone else, we would be navigating a labyrinth each time, trying to piece together and describe what we want done without any specific interfaces to help us. There is a reason we have specific apps with specific perspectives to help us with specific things: a dedicated app doing one thing in a way we like makes things easier. A unified natural language interface would hide all that, leaving us with all the work of figuring out what we could do and how. All the positive hand-holding a well-thought-out app can provide would be buried. And user interfaces still very much exist even if they are hidden behind a chat-like facade.

Now, it could certainly be incredible to have a natural language interface in front of something like Apple's automation system Shortcuts, making it more free-form to provide input to automation.

But structuring and processing input is only one part of the problem. It is still a gnarly problem to actually make processing and using input and output easy and discoverable. The problem with programming - that is, instructing computers how you want things done - is not that programming languages are insufficiently like natural language. The problem is figuring out what you actually want and describing that with enough precision that the right thing actually happens when the computer performs actions in response.

Then, again, the options you have need to be made discoverable, and by every single app developer too.
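
To make that a bit more concrete, here is a minimal sketch in Python of the kind of thing every app would have to publish before any model on top could do useful work with it. Everything in it is hypothetical and made up for illustration: `Parameter`, `Action`, `REGISTRY`, `invoke` and the `create_note` action are not taken from Shortcuts or any other real API. The point is only that someone still has to decide which actions exist, which details they need, and what happens when a request leaves some of them out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Parameter:
    name: str
    description: str
    required: bool = True


@dataclass
class Action:
    """A hypothetical, self-describing action that an app exposes."""
    name: str
    description: str
    parameters: list[Parameter]
    run: Callable[..., str]

    def missing(self, given: dict) -> list[str]:
        # Names of required parameters that the request did not pin down.
        return [p.name for p in self.parameters
                if p.required and p.name not in given]


def create_note(title: str, folder: str) -> str:
    return f"Created note '{title}' in {folder}"


# The registry is what makes the options discoverable: a front end
# (model or not) can list the actions and what each one needs.
REGISTRY = {
    "create_note": Action(
        name="create_note",
        description="Create a new note",
        parameters=[
            Parameter("title", "Title of the note"),
            Parameter("folder", "Folder to put the note in"),
        ],
        run=create_note,
    )
}


def invoke(action_name: str, **given) -> str:
    action = REGISTRY[action_name]
    missing = action.missing(given)
    if missing:
        # A vague request cannot be carried out; it can only be questioned.
        return "Need more detail: " + ", ".join(missing)
    return action.run(**given)
```

With this sketch, `invoke("create_note", title="Ideas")` comes back asking for `folder`, while `invoke("create_note", title="Ideas", folder="Inbox")` actually runs. A language model could help fill in the arguments, but the registry itself still has to be designed and maintained by each developer.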

If this were a great and attractive goal for many people, we would already have much of the benefit even before language models entered the picture. But we do not have all that, and putting a language model on top - while cool - will not automatically make the other pieces easier or faster to build as far as I can tell.

A very concrete, personal example: I feel that I should add Shortcuts and other automation support to my app Podcast Chapters, but it is non-trivial to figure out what functionality should be exposed and how, and user demand has been virtually zero. And so, it has not happened, even though I would very much like to be the kind of person who builds scriptable apps.

Plus, you know, many large companies have some kind of fixation on letting people know which application they are using. Got to build those brands.

My bet: It will show up in macOS as one of Apple's ideas with great potential but little actual result and improvement. You will be able to interact with the system through a language model (and by extension, through voice and other exciting ways). The APIs will be limited but well thought out. A few developers will release really good and thoughtful support, and some people will use it for great things while lamenting that limitation X exists, or that more developers are not supporting the system. And the average user will either not use it at all, or discover one or two things they use consistently but which could easily have worked at least as well without the model in between.