Yoshimi
“When its circuits duplicate emotions”
For a long while now a certain idea has echoed around my velvety in-skull. Specifically, I’ve been thinking about an experimental operating system. When I first started thinking about this I named it Yoshimi – I was listening to a lot of The Flaming Lips[1] – and for lack of a better name I’ll call it that here. I believe it would be nicer to use than other modern operating systems, free as it is from compatibility and convention, though I have absolutely no way to back up that claim.
For context, I am a low-level programmer swimming toward the pastel-coloured cliffs of usability design. As such I’ll start with a high-level description of each component and then move downward into the implementation. If anything gets too deep for you just skip a little! ☺ Also don’t expect this to realistically be built, unless you can give me a briefcase filled with unmarked bills.
Glowing little rectangles, all alike
Why is it that even with our data in the ‘cloud’,[2] when we open an application on our computer it isn’t available on our phone too? We may be able to open a new browser tab, enter the same URL, then sign in again, but there’s some sense of déjà vu – and there’s evidence that native applications are preferable anyway... Why the distinction between devices? If I want some information it shouldn’t matter which glowing rectangle I’m peering at right this moment; it ought to be available.
Distributed operating systems are nothing new. For example, Plan 9 from Bell Labs, the research successor to Unix begun in the 1980s, could display a graphical interface on one computer, compute on another, and store data on yet another. All Yoshimi would require is a common core, a microkernel, on each machine, and the operating system could organise itself across the network.
But while Plan 9’s networks were largely static Yoshimi would have to be more protean: it would organise itself dynamically according to available resources. For example, if it were running on a phone and you began a work-intensive process it might contact your desktop, or spin up a virtual server, for help. The system would constantly monitor and optimise itself for its environment. Hopefully there would be an AI element: it could prioritise according to your usage patterns.
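To make the idea of organising itself "according to available resources" a little more concrete, here is a minimal sketch of one possible placement rule: estimate how long a job would take on each reachable machine, including the cost of getting it there, and pick the cheapest. The `Host` record, its fields, the `estimate` formula and all the numbers are my own assumptions for illustration; a real system would measure these rather than hard-code them.

```haskell
module Main where

import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical description of a machine Yoshimi could run work on.
data Host = Host
  { hostName  :: String
  , opsPerSec :: Double  -- spare compute (assumed to be measured elsewhere)
  , overhead  :: Double  -- seconds to ship the job there and back, or to spin up
  } deriving (Show, Eq)

-- Estimated seconds to finish a job of the given size on a host.
estimate :: Double -> Host -> Double
estimate work h = overhead h + work / opsPerSec h

-- Choose the host with the lowest estimate; Nothing if no host is reachable.
place :: Double -> [Host] -> Maybe Host
place _ []    = Nothing
place work hs = Just (minimumBy (comparing (estimate work)) hs)

-- Made-up figures: the local phone costs nothing to reach but is slow.
hosts :: [Host]
hosts =
  [ Host "phone"          1e9  0.0
  , Host "desktop"        8e9  0.2
  , Host "virtual server" 32e9 2.0
  ]

main :: IO ()
main = mapM_ report [1e8, 1e10, 1e12]
  where
    report work =
      putStrLn (show work ++ " ops -> " ++ maybe "nowhere" hostName (place work hosts))
```

With these figures a small job stays on the phone, a medium one goes to the desktop, and only a very large one justifies waiting for a virtual server. The "AI element" would amount to learning better values for `estimate` from your usage.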
Everything you do on each device should therefore synchronise between all devices. When you open a document on your desktop you ought to be able to access it on your phone with a tap. With the Internet as pervasive as it is, and with Bluetooth available as a proximate backup, I see no reason why a machine’s storage ought to be anything more than non-volatile cache.
Monadic fox jumps over the lazy dog
A big problem with mobile devices is that their lack of power tends to make the interface slow and unresponsive. Apple actually recommend iPhone developers display a static image of the app while it is loading so it seems ‘snappier’. Genius. Except this means you have no idea how long you have to wait until the interface will actually respond, which can take a good few seconds, and there is often no indication that it has loaded, making the wait seem longer.
The solution is to ensure that the application only loads what is actually necessary to display each screen. The reason this is never done is that it would be extremely tedious to implement in a language like Objective-C, having to work out piece by piece what is necessary to render the interface. In a language like Haskell,[3] however, this is implicit: only what is required by what is visible of the interface is ever executed, due to lazy evaluation.
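A small sketch of what "implicit" means here. The list below conceptually contains every rendered item, infinitely many of them, yet only the rows that are actually demanded get computed. `renderItem`, `allRows` and `visibleRows` are hypothetical names of my own, and the `error` call exists purely to prove that off-screen items are never touched.

```haskell
module Main where

-- Hypothetical stand-in for an expensive rendering step.
-- Items past 1000 blow up if anyone ever evaluates them.
renderItem :: Int -> String
renderItem n
  | n > 1000  = error "off-screen item was rendered"
  | otherwise = "thumbnail #" ++ show n

-- Conceptually every item, already rendered. Nothing runs until demanded.
allRows :: [String]
allRows = map renderItem [1 ..]

-- The screen only has room for a few rows, so only those are evaluated.
visibleRows :: Int -> [String]
visibleRows n = take n allRows

main :: IO ()
main = mapM_ putStrLn (visibleRows 3)
```

Nobody had to write code deciding which items to skip; asking for three rows is what causes exactly three rows to be rendered.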
Because Haskell is purely functional, pure expressions can safely be evaluated in any order, so a compiler can in principle analyse and automatically parallelise a program. Yoshimi could then prioritise threads based on the resources with which they interact through the IO monad. For instance, audio threads would be given higher priority than graphical threads, which would in turn be higher than networking threads, and so on. This would make interactive elements as responsive as possible, rather than lagging due to background computation.
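As a rough sketch of that ordering, assume each task is tagged with the resources it touches (in practice this information would have to come from the compiler or the runtime; here it is written by hand). The `Resource` and `Task` types and the `schedule` function are illustrative names of mine, not part of any existing scheduler.

```haskell
module Main where

import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical resource classes. Constructor order doubles as priority:
-- later constructors are more urgent, matching audio > graphics > network.
data Resource = Network | Graphics | Audio
  deriving (Show, Eq, Ord)

data Task = Task
  { taskName :: String
  , touches  :: [Resource]  -- resources this task performs IO on
  } deriving (Show, Eq)

-- A task is as urgent as the most urgent resource it touches.
-- Nothing marks pure background computation, which sorts last.
priority :: Task -> Maybe Resource
priority t = case touches t of
  [] -> Nothing
  rs -> Just (maximum rs)

-- Most urgent first; sortOn is stable, so ties keep their original order.
schedule :: [Task] -> [Task]
schedule = sortOn (Down . priority)

tasks :: [Task]
tasks =
  [ Task "fetch mail"    [Network]
  , Task "reindex"       []
  , Task "redraw window" [Graphics]
  , Task "play song"     [Audio, Network]
  ]

main :: IO ()
main = mapM_ (putStrLn . taskName) (schedule tasks)
```

Note that "play song" is ranked by its audio output rather than by the network it also reads from, which is the behaviour the paragraph above asks for.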
This applies to a thread’s dependencies as well: viewing a folder would only require the information for its first few items since the others cannot yet be seen. The others would therefore not be loaded, or would be given a lower (speculative) priority, since the graphical thread does not depend on them. The logical consequence of this is that if you were to mute your sound all audio processing would stop, and when resumed it would just take into account how long had passed since.
Forever and always
We no longer have a need for the venerable Unix hierarchical filesystem, as we are able to search a vast amount of information pretty much instantly, and our mode of doing so is (or ought to be)[4] graphical. Filenames are arbitrary and seldom put to good use – after all, “the content of a text file is its own best name.” So while tagging can be useful I don’t believe we need to restrict ourselves to the hierarchical design any longer.
Instead of using a filesystem Yoshimi would feature a single system-wide object space. Objects within this space would be orthogonally persistent: they would be frequently saved to disk (and distributed across the network), unlike the objects in a standard virtual machine, which cease to exist once the process has ended. Downloading a file would therefore create a new object, which could be indexed for searching and hence treated as though it were a file in any other filesystem.
To the extent that available storage allows all data changes should also persist. This means keeping deltas of each change, essentially creating ‘infinite undo’ and allowing you to ‘time travel’ to see the system as it was last Tuesday, or whenever.[5] You could even ‘undo’ a window closure. Though this sounds like it would create and store a lot of garbage data we can rely on assumptions about functional purity to keep this from happening.
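Here is a minimal sketch of how a delta log gives you both undo and time travel. An object is modelled as a handful of key/value fields, and its history as a list of timestamped changes; the state at any moment is simply the deltas up to that moment applied in order. The `Delta` type, the integer timestamps and the function names are all assumptions of mine for the sake of the example.

```haskell
module Main where

type Time = Int                 -- hypothetical clock; any ordered type would do
type Doc  = [(String, String)]  -- an object as a small set of named fields

-- One recorded change to an object.
data Delta = Set String String | Unset String
  deriving (Show, Eq)

applyDelta :: Doc -> Delta -> Doc
applyDelta doc (Set k v) = (k, v) : filter ((/= k) . fst) doc
applyDelta doc (Unset k) = filter ((/= k) . fst) doc

-- Timestamped deltas, oldest first.
type History = [(Time, Delta)]

-- 'Time travel': rebuild the object as it stood at time t.
asOf :: Time -> History -> Doc
asOf t = foldl applyDelta [] . map snd . takeWhile ((<= t) . fst)

-- 'Undo': the same history minus its most recent change.
withoutLast :: History -> History
withoutLast [] = []
withoutLast h  = init h

history :: History
history =
  [ (1,  Set "title" "Draft")
  , (5,  Set "body" "Hello")
  , (9,  Set "title" "Final")
  , (12, Unset "body")
  ]

main :: IO ()
main = do
  print (asOf 6 history)                 -- the object as it was at time 6
  print (asOf 20 history)                -- the object as it is now
  print (asOf 20 (withoutLast history))  -- the deletion undone
```

Nothing is ever overwritten, so "last Tuesday" is just a different argument to `asOf`.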
With the exception of IO a purely functional language is entirely deterministic, which means all other computation can be repeated at any time and reach the same state. Therefore only IO must be logged to guarantee persistence; anything else is just computational cache. After a sudden loss of power, say, the system could use the logged IO to return to the state the instant before shutdown without having to thrash drives writing predeterminable data.
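The argument can be sketched in a few lines. If the whole state is a pure function of the inputs that arrived from outside, then the log of those inputs is the only thing that needs to reach the disk; the state itself can always be recomputed. The `Input` and `Editor` types below are a toy of my own invention, standing in for whatever the real system would log.

```haskell
module Main where

-- Hypothetical external events: the only things that must be logged.
data Input = KeyPress Char | Backspace | Tick
  deriving (Show, Eq)

-- Hypothetical application state; never written to disk directly.
data Editor = Editor
  { buffer :: String
  , ticks  :: Int
  } deriving (Show, Eq)

initial :: Editor
initial = Editor "" 0

-- A pure transition: the same state and input always give the same result.
step :: Editor -> Input -> Editor
step s (KeyPress c) = s { buffer = buffer s ++ [c] }
step s Backspace    = s { buffer = if null (buffer s) then "" else init (buffer s) }
step s Tick         = s { ticks = ticks s + 1 }

-- Recovery after a power cut: fold the logged inputs over the initial state.
replay :: [Input] -> Editor
replay = foldl step initial

logged :: [Input]
logged = [KeyPress 'h', KeyPress 'i', Tick, KeyPress '!', Backspace]

main :: IO ()
main = print (replay logged)
```

Replaying the same log any number of times gives the same `Editor`, which is what lets everything except the log be treated as cache. A real system would presumably checkpoint now and then so that recovery does not mean replaying from the dawn of time.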
The polymorphic panorama
One shortcoming of applications in other operating systems is their lack of composability. When interacting with a file we must open a dedicated application and work confined solely to that environment; we must use one program for viewing an image, another for editing a bitmap image, and yet another for editing a vector image – each with its own individual set of tools. This means the operating system cannot be a very cohesive environment.
Yoshimi would use an alternative paradigm whereby every ‘file’ (or equivalent) would have a visual representation used to render it on-screen. A PNG, for instance, would represent itself as the image it encodes. There would be no image viewer; rather, a window’s sole purpose would be to decorate the borders of such an object’s visual representation. The same representations could also be nested within others, as in the compound documents common to word processors.
A tool can by extension be considered a function which deals with these objects. You may choose to horizontally flip an image, which would be a function of type (Image a ⇒ a → a), which would take an image and return the image having been flipped. Likewise, a paintbrush tool would be of type (Image a ⇒ a → Path → a), requiring in addition the path of the brushstroke, given by the user interactively. The representation of the returned object would then replace the original in its context, be it a lone window or compound document.
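The two types above can be written down almost verbatim as a type class. In this sketch `flipH` and `paint` are my names for the flip and paintbrush tools, and `Bitmap` is a toy image made of rows of characters, chosen only so the result can be printed; none of this is meant as the real object model.

```haskell
module Main where

type Point = (Int, Int)  -- (column, row), counted from the top-left corner
type Path  = [Point]     -- the brushstroke, as supplied interactively by the user

-- Anything that can behave as an image gets these tools.
class Image a where
  flipH :: a -> a          -- the horizontal flip: Image a => a -> a
  paint :: a -> Path -> a  -- the paintbrush:      Image a => a -> Path -> a

-- A toy bitmap: each string is one row of pixels.
newtype Bitmap = Bitmap [String]
  deriving (Show, Eq)

instance Image Bitmap where
  flipH (Bitmap rows) = Bitmap (map reverse rows)
  paint (Bitmap rows) path =
    Bitmap
      [ [ if (x, y) `elem` path then '#' else c | (x, c) <- zip [0 ..] row ]
      | (y, row) <- zip [0 ..] rows
      ]

blank :: Bitmap
blank = Bitmap ["...", "...", "..."]

-- Tools compose like any other functions: paint a stroke, then flip the result.
strokeThenFlip :: Image a => Path -> a -> a
strokeThenFlip path = flipH . (`paint` path)

main :: IO ()
main = do
  let Bitmap rows = strokeThenFlip [(0, 0), (1, 1)] blank
  mapM_ putStrLn rows
```

Because the tools are written against the `Image` class rather than against `Bitmap`, a vector image type could supply its own instance and receive the same menu of tools, including compositions such as `strokeThenFlip`.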
There would be two possible ways of calling these functions. The first would be a menu bar at the top of the screen, much like that of the Macintosh, which would automatically list the tools appropriate to the objects you are currently dealing with. The second would be a mechanism, likely a keyboard shortcut, which would provide you with a place to type commands, much like Mozilla’s Ubiquity, allowing you to compose tools as one would functions.
“But you won’t let those robots eat me”
True, creating an operating system like Yoshimi would take a lot of effort, but I believe it would be worth it. Existing software tends to be user-hostile: it’s confusing, modal, and if you make a mistake you’re toast. You only have to watch someone use a computer to notice the obstacles one faces when doing everyday tasks. But there’s an alternative! Some of these things have even been done before, like orthogonal persistence in Smalltalk environments and Symbolics Lisp machines. They just need to be brought under a single unifying vision.
[1] The Flaming Lips’ tenth album is called Yoshimi Battles the Pink Robots. While Yoshimi is human and the subtitle quote is actually from the track One More Robot, well, it’s a nice name.
[2] As it stands, our ‘cloud’ is not a cloud; we just reinvented the client–server model and gave it a flashy name. I’d like encrypted, distributed storage, preferably peer-to-peer.
[3] I specify Haskell because it is the most well-known lazy pure functional language. A dialect of Lisp or ML with similar properties would also work. The virtual machine ought to use a language-agnostic lambda bytecode.
[4] A task being faster in a terminal says more about the failures of a GUI than the successes of the command line.
[5] Over time, as storage begins to fill up, deltas would give way to less frequent hourly or daily snapshots. This is acceptable, as one is unlikely to require such granular versioning months later.
A lot of people have contributed to my thought process. Alphabetically,
Apple, Be Inc, Bell Labs, Stanislav Datskovskiy, The Flaming Lips, GNU, Google, Simon Peyton Jones, Jaron Lanier, Mozilla, Jef Raskin, PARC, Phil Smith, and Neal Stephenson.