Smart Dreaming: smartphone industry commentary: April 2007

Monday, 23 April 2007

A bit more on the CLI

David Beers engages with my comments on his blog, Software Everywhere.

He's right when he says I'm thinking of a traditional CLI, in terms of the syntax, if not in terms of how it interacts with its environment and arguments. And there's a reason for that -- the traditional CLI syntax combines expressiveness, terseness, regularity, and a strong mnemonic in a very powerful way.

David's proposal is like a traditional CLI, but with a dash of natural language, extensive autocompletion and a reduction in expressiveness to achieve terseness. (The autocompletion is actually so extensive that it is really a major part of the UI, and strains the definition of CLI -- perhaps it should be called a CCLI, for Completing Command Line Interface.) By dismissing scripting, for example, the expressiveness of a UI is substantially reduced. There should be concomitant advantages, and in David's model there are (terseness and interactivity).

However, I'm suspicious of how effective his system would be, given the level of terseness he claims, at expressing many of the things that I want to do with my smartphone -- even on a regular basis. (For example, how can "Hang up, Add caller, Record call, Write note, etc., all [be] accessible with a single keypress if you only have one hand free and don't want to tap these options on the touchscreen" when you are in the middle of writing a note? How does it know that the character is intended for a command rather than entry into the note -- doesn't that require another keypress? And what if I wanted to call the number under the cursor, rather than one out of my contacts -- how do I differentiate?)

Anyway, to illustrate what I'm talking about with the tradeoff of terseness, expressiveness and other factors, let's think about why CLIs aren't simply natural languages.

Natural Language vs Traditional CLI

Natural language delivers expressiveness at substantially greater levels than the traditional CLI, but at the cost of terseness and (more importantly) regularity. (Obviously it doesn't even need a mnemonic.)

The reason that CLIs have never attempted to be like natural languages (NLs) is simply the lack of regularity of NLs. Software thrives on regularity and chokes on irregularity. Now software has certainly improved, but has it improved that much? I don't think so.

Even if software had improved that much, NLs bring other problems when used in a CLI.

In order to get the no-mnemonic-needed benefit of NLs, you have to support the end-user's own NL. That includes radically different grammars. For example, in English, imperative commands, which are generally what you would issue to computers, can start with a verb, followed by the object (the subject is implicitly the computer). In Japanese, on the other hand (the only other language I can speak), verbs are always at the end, even in imperative commands (eg. "Eat your rice" is "Gohan o tabete", where gohan is rice and tabete is the imperative form of eat). Let's just ignore the different character sets (in Japanese both gohan and tabete would be written with kanji -- Chinese characters -- and hiragana) which would add a whole other layer on top of this interpretive framework. Or rather, let's not!
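To make the word-order point concrete, here is a minimal Python sketch of a toy command parser that has to know where each language puts its verb. Everything in it (the VERB_POSITION and FUNCTION_WORDS tables and the parse_command function) is made up for illustration, and it only copes with the two example sentences above; a real NL front end would need far more than this.

```python
# Toy illustration only: the tables and function below are hypothetical.

# Where the verb sits in a simple imperative sentence, per language.
VERB_POSITION = {"en": "first", "ja": "last"}

# Words this toy parser simply discards (e.g. the Japanese object marker "o").
FUNCTION_WORDS = {"en": {"your", "the"}, "ja": {"o"}}


def parse_command(tokens, lang):
    """Return (verb, objects) for a toy imperative sentence."""
    words = [t for t in tokens if t.lower() not in FUNCTION_WORDS[lang]]
    if not words:
        raise ValueError("empty command")
    if VERB_POSITION[lang] == "first":
        return words[0], words[1:]
    # Verb-final languages: everything before the last word is an object.
    return words[-1], words[:-1]


print(parse_command("Eat your rice".split(), "en"))
print(parse_command("Gohan o tabete".split(), "ja"))
```

Even this trivial case needs a per-language table, which hints at how quickly the work multiplies once real grammars are involved.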

Another problem is the lack of terseness that natural languages have, due to their generality. The o in the Japanese sentence above, for example, marks gohan as the object of the verb. But in a traditional CLI this is not necessary. If you want to see a hint of what this is like, take a look at a COBOL program. COBOL was designed with similar goals: to be easy for end-users to write software (or instruct computers) with little training. Of course, we all know that COBOL was not easy to use, and just ended up irritating programmers with its worthless verbosity for decades. (OK, I'm speaking as a C, now C++, programmer here -- maybe COBOL's verbosity didn't annoy everyone.)

These are all still problems for natural language input. Certainly extensive auto-completion reduces the impact of verbosity, but it doesn't remove it. The main trade-off with auto-completion is between verbosity and mnemonic value. The other trade-off is between regularity and expressiveness. There is leakage between these tradeoffs, so they're not clear-cut (eg. prepositions, articles, etc. all add to expressiveness as well as having mnemonic power, and they trade off verbosity and, in many languages, regularity).

Basically, moving away from a traditional CLI brings a trade-off space of at least four dimensions that needs to be navigated. Throw in the fact that there are multiple NLs with radically different characteristics, probably forcing different trade-offs, and the fact that not all languages are as easy to input as English, and it becomes, to put it mildly, rather challenging.

To top it off, scripting is one of the major benefits of a CLI, and a benefit even to an ordinary end-user. It's not enough to dismiss scripting as too advanced for an end-user, since people use it all the time in their personal interactions with other people. The challenge is to make it easy enough for an end-user to engage in it. While using an NL as a CLI helps that, the regularity issue would be almost crippling for any software attempting to implement NL scripts.

Alternatives to the CLI

What are the equivalent challenges for my proposed five-way hierarchical menuing system? Clearly character sets are not a challenge (Unicode has made it easy to display any character sets, and display is all that's required). Neither are an NL's grammatical peculiarities, since any syntax would be simple, regular, and independent of NL. Menuing doesn't feed well into scripting, so that's a weakness of this system (it's not incompatible with scripts, it just doesn't naturally support them in the same way that a CLI does). Expressiveness is limited, but chiefly by the choice of grammar (for example, Apple's menuing system is very simple: object-verb, implemented via selection followed by menu command choice). Regularity is a complete non-issue, of course. So that leaves mnemonic value, and this is where the trickiness lies in this solution.

In order to have strong mnemonic value, a hierarchical menu has to be structured in a logical, sensible fashion that is going to reflect the end-user's own understanding and thought structures. And it has to do it out of the box (nobody will spend ages configuring software -- Apple learned that the hard way with Newton's handwriting recognition). To make matters difficult, the hierarchy has to be universal, ie. cover the full range of commands that can be issued at any time (otherwise it violates one of the purposes of David's UI -- to be able to perform any action on a piece of data). This is a matter for considerable research.

Just a quick note to relate this all back to the status quo: the traditional GUI, as implemented on all smartphones, uses contextual menus and selection mechanisms to achieve a basic object-verb level of expressiveness. The expressiveness is severely limited by the application "silos", though. Unfortunately, opening the range of commands up would overwhelm the command activation method (flatish menus). The greater expressiveness of GUIs (where needed) is achieved by dialogs. These allow very complex interactions (such as the logic-based search rules of DreamConnect), with great mnemonics, but hopeless regularity and verbosity and no chance of scripting.

Conclusions

Clearly, near-NL CLIs are not really viable, even on PCs, despite the level of expressiveness that they would deliver. Hierarchy menus have their own issues, but certainly show potential for mobile devices, I think. CCLIs have potential, especially if the command language were as expressive as traditional command lines, but I'm skeptical about the lack of expressiveness of David's proposal (either that or whether it can achieve David's claims of terseness given a reasonable level of expressiveness). The real issue is that, for mobile devices, terseness is crucial, and needs to be achieved without sacrificing too much expressiveness.

Still, the proof is in the pudding. I'd like to see any of these systems running. Any would be better than the status quo, I reckon.

Saturday, 14 April 2007

The CLI is cool again!

It seems that the CLI is cool again. David Beers has an excellent post on the CLI on a mobile here. Inspired by this I decided to try out Enso. Unfortunately, despite a truly great demo video, Enso was a big disappointment -- it simply didn't support what I use a computer for (I'm a programmer, but I also do writing, photo manipulation, and video editing, amongst other things). Enso failed for me because it didn't reach into the data that I manipulate with its CLI, making the CLI almost useless. (It didn't even autocomplete filenames for me -- I have to manually map files/directories into Enso's namespace!)

I come from a CLI background (UNIX), and still use vim as my main editor (vim is a truly modal editor, with functionality like search and replace supported via command line). So you could say I'm favourably predisposed towards command lines. But let's try to analyse the benefits and disadvantages of command lines and GUIs (Graphical User Interfaces).

CLI vs. GUI

(Advantages & Disadvantages)

CLI Advantages	CLI Disadvantages	GUI Advantages	GUI Disadvantages
direct access to commands	commands not displayed (completion helps)	commands displayed (in menu)	indirect access to commands
random access to objects like files easy (esp. with completion)	difficult to define graphical/text selection	easy to define graphical/text selection	clumsy random access (since objects are displayed spatially, it is hard to keep a wide range of them visible)
easy to manipulate one or more database-style objects with textual search/replace style commands	difficult to directly manipulate graphical objects/text (NB: graphical objects can include objects that are simply represented graphically, such as calendar events)	easy to directly manipulate graphical objects	difficult to manipulate more than one database-style object
easy to implement history and/or repetitive operations (eg. scripts/macros)	feedback from operation limited/implicit	feedback from operation usually explicit	difficult to implement history and/or repetitive operations

Proposal

CLIs and GUIs clearly have different strengths and weaknesses. In summary, CLIs are better at issuing commands, dealing with database-style records or files, and selecting via text-based searches; GUIs are better at direct manipulation of objects that can be represented graphically, and at arbitrary but contiguous selections.

Using the two forms of UI in combination offers significant benefits:

remove indirectness of menu access
allow sophisticated sorting/searching and replacing, even in combo with GUI's graphical representations (to increase contiguity of selection)
allow history/redo/macro capabilities.

Implementation

So how do we implement this? For a PC, with its large keyboard, screen, and pointer, I propose the following solution. Get rid of the menu bar (on the top of windows in Windows, and at the top of the screen on the Mac) and replace it with a command bar at the top of the screen. This bar will show the command line as it's entered, and drop down a translucent list showing any autocompletion options or history.

The command line should be accessible with a simple key toggle (like Enso, but probably modal, since holding down a command key while typing limits what you can type). But the CLI should also interact with the GUI's elements, for example, highlighting items which may match (i.e. are in the autocompletion list) with a special "tentative" highlight.
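As a rough illustration of how the drop-down list and the "tentative" highlight could be driven from the same data, here is a minimal Python sketch. The function names (complete, tentative_highlights) and the sample item names are invented for this example; simple case-insensitive prefix matching is an assumption, not a claim about how Enso or any real shell does it.

```python
def complete(prefix, candidates):
    """Return the candidates that start with prefix (case-insensitive), sorted."""
    p = prefix.lower()
    return sorted(c for c in candidates if c.lower().startswith(p))


def tentative_highlights(prefix, visible_items):
    """Map each visible GUI item to True if it should get the tentative highlight."""
    matches = set(complete(prefix, visible_items))
    return {item: item in matches for item in visible_items}


# Hypothetical items currently visible in a file window.
items = ["DS1.doc", "Notes.txt", "ds2.doc"]
print(complete("ds", items))
print(tentative_highlights("ds", items))
```

Note that an empty prefix matches everything, so a real implementation would probably hold off highlighting until at least one character has been typed.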

Command scripts should be able to span apps. So, for example, to do a search and replace in multiple documents, you might be able to simply type:


for file in C:\Documents\DS*.doc
open file in Word
Word::Replace "Series 60" "S60"
Word::SaveClose
end

This should fill in the files with autocompletion during the first line, and then execute it at completion.
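For comparison, the same loop shape can be sketched in ordinary Python. This is only an analogy: it works on plain-text files rather than real Word documents (which would need Word itself or a document library to open), and the function name replace_in_files is made up for this example.

```python
from pathlib import Path


def replace_in_files(directory, pattern, old, new):
    """Replace old with new in every plain-text file matching pattern.

    Returns the names of the files that were actually changed.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path.name)
    return changed


# Example call (assumes a folder of plain-text files; the path is illustrative):
# replace_in_files("Documents", "DS*.txt", "Series 60", "S60")
```

The point of the command-bar version is that the user gets this without leaving the UI, with the file list filled in by autocompletion rather than by a glob pattern typed blind.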

Mobile Solution?

The problem with this scenario is pretty obvious: it is heavily keyboard reliant. Without a full alphabetic keyboard the commands suddenly become more indirect again (with some form of input method intervening). And the long lists generated by autocompletion aren't very friendly on a small screen. Furthermore, keyboard styles vary so much from device to device (and even within the same device in an increasing number of devices, inspired by Sony Ericsson's P-Series phones) that it would be difficult for a user to become fluent in this style of interaction.

My preference is actually for a tree-structured command space, navigated using the keypad or stylus/finger. See DreamScribe for an initial implementation of such an idea. The benefit of this type of UI for mobile phones is manifold:

It is one-handed with one-handed phones
It works identically in both keypad and stylus-driven UIs, and doesn't require any additional hardware
Commands are visible
Commands are grouped by topic (like menus, but in an even more sophisticated way)
"Muscle memory" can be used for commands, with careful design (so the same command is always in the same place in the hierarchy)
Commands can be used from anywhere, unlike menus which need to be contextual in order to maintain their navigability (since they have a very flat hierarchy)
Commands can either be combined with the GUI (basically replacing menu commands) or feed into a CLI (with textually specified arguments)
Arguments can also be mapped into the hierarchical system, so that rather than long, linear lists of autocompletion options, hierarchies of possible arguments can be presented
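To give a feel for the "muscle memory" point, here is a minimal Python sketch of a command hierarchy where a fixed sequence of directional presses always lands on the same command. The tree contents, the command names and the resolve function are all hypothetical; this is not how DreamScribe is actually implemented, just one way such a structure could be modelled.

```python
# Hypothetical command hierarchy: inner dicts are submenus, strings are commands.
COMMAND_TREE = {
    "up": {"up": "call.dial", "right": "call.hang_up"},
    "right": {"up": "note.write", "right": "note.search"},
    "down": "calendar.new_event",
}


def resolve(tree, presses):
    """Walk the tree along a sequence of directional presses.

    Returns a command name (str) if the presses reach a leaf, or the
    submenu (dict) still to be displayed if they stop part-way down.
    """
    node = tree
    for press in presses:
        if not isinstance(node, dict):
            raise KeyError(f"{press!r} pressed after reaching command {node!r}")
        node = node[press]
    return node


print(resolve(COMMAND_TREE, ["up", "right"]))
print(sorted(resolve(COMMAND_TREE, ["right"])))
```

Because the path to a command never depends on context, the same presses work from anywhere, which is exactly what flat contextual menus can't offer. Arguments could be hung off the same tree as further levels.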

There is a lot of potential in such a UI. Keystick (now "morphing" into Kanuu) showed that this was possible even with text input, and it looks like they're extending it to all sorts of navigation, just as I've suggested above. DreamScribe mapped calendar and contact attributes into a hierarchy, and the sky's the limit, really. See also Ring-writer as an innovative approach in this area.

Conclusion

So, while I think the combo CLI/GUI I suggest above would be great for PCs, I don't think it has much future in the mobile space. I really think the five-way hierarchy provided by joypads is a much better solution for mobiles.

Tuesday, 3 April 2007

Carnival of the Mobilists #66 and #67 and responses

This is a bit late, but Carnival of the Mobilists #66 is at All About Symbian, one of my major Symbian news sources. My post on Contextuality is in that carnival.

Also, Carnival #67 is up at Wap Review and it has an interesting post from David Beers which, in the latter half, bounces off my idea of contextuality, and extends it out into the interrelationship between applications and data.

To be honest, I really wasn't thinking along these lines, for two reasons: 1) I can write applications, but I'm not in a position to write OSes and 2) I've seen too many failures of frameworks that have tried to achieve this.

Regarding reason 2: I love the idea of the user being able to use any tool he owns on his current context. However, the Newton and Pink (or Taligent) both showed how difficult this is to do in reality (the Newton got further, but only because it was less ambitious). Apple aren't alone in trying this; MS have given up on their DB-based filesystem, which was trying to do a similar thing. In fact, MS have been talking about the idea for well over a decade. The most successful attempt at this approach that I've personally seen was the Oberon project, which actually allowed any text to be treated as a command. Brilliant stuff, but quite limited in the real world.

I've had so many hopes for this type of capability dashed: OLE, OpenDoc, Novell's software bus, the Newton's data soup, PenPoint's object oriented integration, Symbian's DNL (Dynamic Navigation Links, which do actually work, but are missing the key functionality of "vectorability" -- maybe more on this later), etc. etc.

It's made me very cynical about this. But I still have hope. Maybe one day we'll all get things sorted enough that software will start getting out of the way and actually helping people do stuff.

(Oh yes, regarding reason 1, maybe it's worth thinking about how to create this sort of open environment hosted by an application framework, rather than natively via the OS... Hmm...)
