Wednesday, 30 May 2007

Smart Phone or Mobile Browser - Part II

In my first post on this topic, I talked about the history of web-based applications, and also quickly took a look at Japan, the land of mobile browsers (as opposed to smartphones).

In this post, I'll dig into specific issues with applications in general, and web-based apps in particular. So let's get stuck in.

For an application to be enjoyable to use, it must meet strict requirements in a number of areas. Putting aside prettiness (which is a factor, but less critical than these three), we all require the following from our applications: responsiveness, reliability, and privacy (the last of which is increasingly an issue in a thoroughly networked world). Let's look at each in turn.

Responsiveness

Responsiveness is a critical usability feature in any modern, GUI-based application, because we interact with the application at such a fine-grained level, performing little operations one at a time: typing a single character, selecting a single item, or choosing a single command.

Responsiveness is even more important in mobile applications, because we operate them in high-demand situations, where we need to achieve our goal in a strictly limited amount of time. Poor responsiveness will obstruct us from achieving that goal, and will force us onto alternative mechanisms (for example, something other than our phone).

There are two forms of responsiveness that are relevant to this discussion:

  • Interaction responsiveness

    This is the speed at which an application responds to our individual interactions: typing a character, selecting an item, or issuing a command.

    AJAX, Flash, and other technologies have vastly improved this type of responsiveness in Web 2.0 apps. There are still restrictions, but this type of responsiveness is rarely a problem any more. It has never been a problem for client-based software, except when the hardware was simply too underpowered, or an inappropriate technology (like Java) was used.
  • Invocation responsiveness

    This is the speed with which an application instance can be invoked. In other words, how long does it take to bring up the application in order to do some work in it?

    This is an area where Web 2.0 applications are poor. A Web 2.0 app needs to be downloaded into the browser at every invocation. Browsers are heavy pieces of client software, and tend not to be left pre-loaded on phones, so invoking the browser itself is another count against Web 2.0 apps. Finally, few phones give bookmarks equal priority with installed application icons, so reaching a Web 2.0 app requires a multi-step process -- start the browser, open the bookmark list, choose a bookmark, log in. (This last problem is easy to solve, though, and would make a nice, simple utility.)

    An example of poor invocation responsiveness is the way people often use a physical phone directory in preference to a web-based one: turning the computer on, waiting for it to boot, starting the web browser, and going to the online directory takes far longer than picking up the phone book. The fact that searching for names can be much faster online than in the book (an example of interaction responsiveness) is often irrelevant.

    Interestingly, mobile phones and PDAs in general make great efforts to improve invocation responsiveness -- they are designed to be always on, always carried, and so on -- so any reduction in invocation responsiveness really cuts against the grain.

So, Web 2.0 apps are now responsive in interaction, but still poor in invocation. The invocation issue is exacerbated in a mobile environment because of the unreliability of the mobile data connection. Which leads into the next topic.

Reliability

The reliability of any application can be expressed via a simple equation.

Let's take reliability as a number between 0 (never works) and 1.0 (works 100% of the time), ie. a probability of something working.

Let's then write the reliability of a component as R(component).

So:

R(system) = R(component1) * R(component2) ...

In other words, the reliability of a system is the product of the independent reliabilities of its components. (This requires the reliabilities to be independent -- if they are dependent, the easiest way to handle it is to collapse them into one measurement.)
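
If you want to play with the numbers, here's a minimal sketch of this equation in Python (the component names and reliabilities are invented purely for illustration):

def system_reliability(components):
    """Multiply together the independent reliabilities of the components."""
    r = 1.0
    for reliability in components.values():
        r *= reliability
    return r

# Hypothetical component reliabilities, each between 0 (never works) and 1.0:
print(system_reliability({"app": 0.99, "os": 0.98, "hw": 0.999}))  # ~0.969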

So, the reliability of client-based software is:

R(client software) = R(client app) * R(client OS) * R(client HW)

Or, in English, the reliability of client-based software is the product of the reliabilities of the client application software itself, the client operating system it's running on, and the client hardware.

As a specific example, the reliability of DreamConnect 3 (a UIQ 3 contacts manager) is:

R(DC 3 app) = R(DC3) * R(UIQ3) * R(phone)

Until recently R(UIQ3) was pretty poor, so R(DC 3 app) overall was unsatisfactory. However, now all reliabilities are up, and R(DC 3 app) is at a level where a user can be happy. From the user's perspective, it is tempting to think only the final reliability matters, but users are more sophisticated than that. They can deduce that R(UIQ3) was poor, for example, and that will encourage them to move to a different platform. Users can even differentiate between R(OS) and R(HW) if they have enough experience. As ISVs, we have direct control only over R(app), but we do have indirect control over R(OS) and R(HW) by deciding what platform and hardware to support. It is worth bearing this in mind.

So, how about Web 2.0 apps? What does their reliability equation look like?

R(web app) = R(AJAX app) * R(browser) * R(client OS) * R(client HW) * R(network) * R(server app) * R(web server) * R(server OS) * R(server HW)

As you can see, there are a lot more components involved. Let's quickly work through them:
  • AJAX app: This is the component of the Web 2.0 app that runs inside the browser on the client machine. It may use some technology other than AJAX, but I'm just using that as a convenient label.
  • Browser: The browser is an important component of this type of solution -- it has the misfortune of being used as a development environment but needing to meet the expectations of a user application.
  • Client OS and Client HW: Same as for normal client software, except the reliability of the local storage has less of an impact on this scenario, since it is used only for invoking the client OS and browser, and not for storing the application data.
  • Network: The reliability of the network is a critical part of the functionality of a Web 2.0 app. The app is invoked across the network, various amounts of functionality are implemented in the server, and all data is stored on the server. The network reliability is thus fairly critical.
  • Server app: This is the component of the application that runs on the server side -- it often involves database code (and the underlying database software, which is usually very reliable), and so on.
  • Web server: The web server software itself, which is important to the function of a Web 2.0 app. Web servers are generally very reliable pieces of software.
  • Server OS and HW: The server's operating system and hardware, which is generally very reliable, more so than client equivalents.

So, what's the end result? Well, as mentioned above, the server component of the equation (R(web server) * R(server OS) * R(server HW)) is very reliable; we can approximate it as 1.0 and remove it from the equation. This still leaves:

R(web app) = R(AJAX app) * R(browser) * R(client OS) * R(client HW) * R(network) * R(server app)

We can further simplify this by defining R(app SW) = R(AJAX app) * R(server app), and assuming that, since this is under the developer's control, it's likely to equal R(client app). So:

R(web app) = R(app SW) * R(browser) * R(client OS) * R(client HW) * R(network)

Comparing this with the client-software equation above, we can see the factor by which Web 2.0 software is less reliable than client software:

R(web app) / R(client software) = R(browser) * R(network)
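
To make the difference concrete, here's a small Python sketch that plugs hypothetical (but not implausible) numbers into the two equations:

# Hypothetical reliabilities, purely for illustration.
r_app, r_client_os, r_client_hw = 0.99, 0.98, 0.999
r_browser, r_network = 0.97, 0.90    # mobile networks drag r_network down

r_client = r_app * r_client_os * r_client_hw
r_web = r_client * r_browser * r_network

print(round(r_client, 3))            # ~0.969
print(round(r_web, 3))               # ~0.846
print(round(r_web / r_client, 3))    # ~0.873 -- the browser*network factor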

In the past, R(browser) has been very poor, and has dramatically impacted the reliability of any Web 2.0 software. I would argue that R(browser) is still a significant issue, and counts heavily against web software, even on a PC. Of course, the impact of a browser failure is less than, say, that of a storage failure, so long as the software is designed properly (ie. with regular saving of information -- Google has just recently recognised this by putting a very frequent auto-save feature into the Web 2.0 app I'm using to write this).

On the other hand, R(network) varies widely between desktop-bound PCs and mobile smartphones. R(network) nowadays is usually quite high for fixed networks, but for mobile networks it is still quite low, especially for 3G and when moving. For example, I only need to travel ten minutes west to drop out of 3G coverage into 2G, and a few minutes further (into the mountains) to drop out of 2G coverage as well. If I were using a Web 2.0 mapping/routing application (such as Google Maps), it would fail me almost as soon as I left the city heading west.

In conclusion, then, R(network) is an absolute killer for Web 2.0 style apps on the mobile. Michael Mace observed this in a less formal way a while back.

Privacy

Privacy is an issue that hasn't really surfaced yet. Since I have a background in security (working on secure OS's as well as Internet security), it's one that I'm keenly aware of.

At present, Web 2.0 apps either are about sharing information, which reduces the privacy concerns, or simply make promises about privacy. There are few technological safeguards in place to ensure the privacy of your data.

I have a family blog. It is restricted to only the people that I invite, namely my family. Because I've restricted it to just these people, I can feel free to write about my family's private life. Or can I? What assurance do I have that programmers at Google aren't poring over the (rather boring and very unsordid) details of my life? What assurance do I have that Google won't suddenly copy my private ponderings to the top of every search result they return to their millions of users? Well, I have Google's word. That's all.

Does Google's word protect me from a mistake in their software? No. Does Google's word protect me from a malicious programmer within (or even outside) Google? No.

Imagine this: it is 2017, MS has collapsed under its own weight, and Google rules supreme. For some reason, you want to bring a suit against Google, and you are preparing your legal arguments. Using Google Office's word processor. Which saves the text on Google's servers. Encrypted by software that runs on Google's servers. How easy is it for Google to capture your password and to decrypt and pore over your arguments? (They don't even need to change the software on your machine -- it's uploaded to your machine every time you open the app, so they just change it on their server, which they have every right to do and you can't prevent.) Google may desire to do no evil, but how can we trust them to keep their word?

In contrast, client-based software can be firewalled, packet-sniffed, and so on, so you can verify for yourself exactly what leaves your machine.

But the current situation is much worse than that. I'm not aware of any Web 2.0 apps that provide encryption, let alone anonymization (so that the app provider can't snoop on your behavior). Both of these, in combination wherever possible, are crucial privacy protections. We're used to relying on the inaccessibility (except by direct physical access) of our own storage, but that's not part of the Web 2.0 world.

How does encryption work for Web 2.0? Well, it only works when a) you don't want to share the data publicly, and b) you don't want the server to process the data (channel encryption, such as SSL, can still be used, though). So any document, calendar, or database should be encrypted, with the decryption key known only to you. If you wish to share pieces of this, those pieces should be stored separately, with no links back to the encrypted data (which would unnecessarily violate the security of your main data store). Why, then, aren't Google calendars encrypted? Or Google mail messages, etc.? Well, because no-one cares about privacy. Yet.
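
As a minimal sketch of what client-side encryption could look like (in Python, using the cryptography library; the storage call is a hypothetical stand-in for whatever API the service provides):

from cryptography.fernet import Fernet

# The key is generated and kept on the client -- the server never sees it.
key = Fernet.generate_key()
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"My private calendar entry")

# upload_to_server() is hypothetical; the point is that the service stores,
# and can only ever read, opaque ciphertext.
# upload_to_server("calendar/2007-05-30", ciphertext)

# Later, on any client that holds the key:
plaintext = cipher.decrypt(ciphertext)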

And what about anonymization? This is a technology that's useful when your data needs to be processed by a server, but doesn't need to be associated with you in particular. For example, a search query doesn't need to be associated with you (unless you want it to be, in order to benefit from the search engine's knowledge of your interests), neither does a location-based services request. Does Google search or Google maps use anonymization? No, because people aren't asking for it and it has a cost associated with it (you need a third party -- the anonymizer -- and it doubles the traffic in the cloud).
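
And here's a sketch of the anonymizer idea (the relay address is made up, and a real anonymizer would also need to strip identifying headers and mix traffic from many users before this gave any real protection):

import requests

# Hypothetical trusted third-party relay that forwards requests without
# revealing who sent them. Note the cost mentioned above: every request
# now crosses the network twice (client -> relay, relay -> service).
ANONYMIZING_RELAY = {"https": "https://relay.example.com:8080"}

response = requests.get(
    "https://maps.example.com/route",            # hypothetical LBS endpoint
    params={"from": "Sydney", "to": "Katoomba"},
    proxies=ANONYMIZING_RELAY,                   # route via the relay
)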

While both of these technologies have costs (encryption increases processor load and slightly increases network load), their benefits will eventually become so clearly important that we will see them implemented. I don't have time to talk about all the concerns here, but Bruce Schneier's blog is a great source for this sort of thing. Unfortunately, Web 2.0 apps are difficult to validate (because they're so easy to modify and can even present different versions to different users), so this is another stroke against Web 2.0 apps.

Conclusion

Phew! This has been a long trek. But at the end we can see that, while Web 2.0 apps make sense for some situations on desktop PCs, they have significant disadvantages for mobile usage.

In my next post, I'll talk about a third way, namely the Web Services model, in which client applications use web services to deliver a powerful solution. This is something that smartphones can excel at.

Addendum

Google has released Google Gears, which is an attempt to reduce the impact of R(network) described above. Google Gears allows a Web 2.0 app to work with a client-side cache while disconnected from the network, and to synchronise the local cache with the server when the network is available.

This is a great piece of technology, if it works, since it massively reduces the impact of R(network) just as connected clients (to be discussed in my next post) do. Basically, it transforms Web 2.0 apps into connected clients. Web 2.0 apps still have the disadvantage of R(browser), of course (not to mention the memory and performance impacts of the browser and associated technologies), but this is a worthwhile improvement.
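
The underlying pattern is simple enough to sketch. Here it is in Python rather than Gears' actual JavaScript APIs, with a made-up server interface:

class OfflineStore:
    """Write locally first; push pending changes when the network returns."""

    def __init__(self, server):
        self.server = server      # hypothetical remote API exposing .put(k, v)
        self.cache = {}           # local working copy of the data
        self.pending = []         # changes made while disconnected

    def write(self, key, value):
        # The app always works against the local cache, network or no network.
        self.cache[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Call whenever connectivity is detected; replays queued changes.
        while self.pending:
            key, value = self.pending[0]
            self.server.put(key, value)   # may raise if we drop offline again
            self.pending.pop(0)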

Tuesday, 29 May 2007

Carnival of the Mobilists #75

Well, it's another carnival, and I'm privileged to participate in this one.

As always, there are lots of good posts that give you a good feel for the directions in which the mobile industry is heading. Make sure you have a look around Andreas Constantinou's site, which is hosting the carnival this week. He has quite a number of thought-provoking posts (I must confess I don't agree with everything he says, but that doesn't mean it's not interesting or worth reading).

The carnival has a very "web services" sort of feel to it this week, as the mobile world gets more caught up in the Web 2.0 circus. I was interested to read that there is a difference between the Mobile Web 2.0 and Mobile 2.0. I'm still not really sure why anyone would use a term like Mobile 2.0, but I guess it conveys some of the meaning that it's a collision of the traditional telecoms world with the Internet world.

Having experience working with both (including projects with a sophisticated OSI networking stack), I remain cynical about how well Internet protocols can replace telecoms ones. But then, I use VoIP as the main outgoing channel for my home and work phones, so I guess it can do a half-decent job...

Anyway, check out the posts, and stay tuned for Smart Phone or Mobile Browser - Part II here in the next couple of days, where I'll discuss the Mobile Web 2.0 (and its alternatives) at more length.

Tuesday, 22 May 2007

Smart Phone or Mobile Browser

[This post has been languishing for a month. I think it's time to post its beginning.]

It seems that some people are expecting smart phones to evolve into what is effectively just a mobile browser, providing a "thin client" for web-based applications (eg. AJAX apps like Google Mail, etc.).

This sounds familiar to me -- I've been through this before, in the PC industry. So is it different this time? Let's take a closer look.

History

First, some history.

In the mid-nineties, when the Web was taking off, a number of companies saw the opportunity to make a revolutionary type of personal computer to replace the existing model of the PC (which was running Windows 3.1 back when this started). This type of computer was called the NC, short for Network Computer. Its greatest proponents were the Acorn/Oracle partnership, and Sun. The idea was that the NC would be a moderately thin client (thicker than a dumb terminal, such as an X-Windows terminal, but thinner than a PC). Basically, the NC would run apps natively, but they would be downloaded from the network, and data would reside on the network.

This is a very similar model to the new AJAX style web-based applications.

Of course, history tells us that this approach was a massive flop, and it killed Acorn, amongst any number of other companies. Since the winning technology could hardly be considered to have won through technical superiority (we're talking about Windows 95 and NT 3 here), it must have been the underlying architectures that had the significant advantages and disadvantages.

Interestingly, I was in favour of the NC at the time, mostly because I was a dedicated Acorn user, and didn't want to see the most innovative PC company go under. This time I find myself "naturally" on the other side.

So why did the NC fail? Well, the answer's two-fold, but pretty simple:


  1. Implementing sophisticated applications in Java (the language of choice for NCs) was probably as easy as doing so for Windows (back then), but the results were vastly inferior. NCs back then were running 30MHz ARM CPUs or the like, and Java was just too heavy for this (or PC) hardware. PC applications were simply more responsive than Java apps.
  2. The network simply wasn't up to handling all these apps and associated data. I remember working in Novell KK (Japan) in the latter half of '94 when Mosaic and the Web were gathering steam, and we still only had a 64kbps ISDN line shared between the 200 or so staff there!

The burning question is whether things have changed enough that these major issues no longer kill the approach. (For mobile, I think the answer is pretty clearly "no" -- see Michael Mace's post here for some insights.)

Japan

So, what about Japan? Clearly their phones are mobile browsers, and not smart phones. Perhaps they provide an example of the future.

Yes, well, Japan has always marched to the beat of a different drum. In the nineties, when the NC was being hyped, the PC was still rare amongst home users in Japan. Much more common were turnkey word processors (a type of technology almost completely dead in the West at that time).

To really understand Japan, you would have to understand what people are actually doing with their mobile browsers. I have to confess that I don't know, exactly. But I'm fairly sure it's mostly consuming content and creating trivial content, rather than generating sophisticated content or doing the sophisticated processing that on-device apps are so good at.

In addition, you need to be aware of the geographical character of Japan and other target markets. Japan is heavily urbanised, with around a quarter of the population in greater Tokyo. The rest of the population are scattered through densely settled valleys and plains, with almost unpopulated mountains between them. This makes it very easy to build a mobile network that covers almost everyone. (Especially given the population and relative size of the country.)

Compare Japan to Australia which, despite being heavily urbanised, has such a low population density that travelling a few hundred kilometres from the population centres will dump you into a sparsely populated (but still populated) vast land, almost impossible to cover with mobile networks.

So Japan can (as usual) be viewed as a special case, and not an indicator of things to come.

Next post I'll look further at issues with browser-based applications on mobile devices, including:

  • Responsiveness (a common issue, partly addressed by technologies like AJAX)
  • Reliability (a problem for mobile devices that no longer really exists for desktops)
  • Privacy (a problem that no-one seems to be taking seriously, on any platform)

Monday, 23 April 2007

A bit more on the CLI

David Beers engages with my comments on his blog, Software Everywhere.

He's right when he says I'm thinking of a traditional CLI, in terms of the syntax, if not in terms of how it interacts with its environment and arguments. And there's a reason for that -- the traditional CLI syntax combines expressiveness, terseness, regularity, and a strong mnemonic in a very powerful way.

David's proposal is like a traditional CLI, but with a dash of natural language, extensive autocompletion and a reduction in expressiveness to achieve terseness. (The autocompletion is actually so extensive that it is really a major part of the UI, and strains the definition of CLI -- perhaps it should be called a CCLI, for Completing Command Line Interface.) By dismissing scripting, for example, the expressiveness of a UI is substantially reduced. There should be concomitant advantages, and in David's model there are (terseness and interactivity).

However, I'm suspicious of how effective his system would be, given the level of terseness he claims, at expressing many of the things that I want to do with my smartphone -- even on a regular basis. (For example, how can "Hang up, Add caller, Record call, Write note, etc., all [be] accessible with a single keypress if you only have one hand free and don't want to tap these options on the touchscreen" when you are in the middle of writing a note? How does the system know that a character is intended as a command rather than as entry into the note -- doesn't that require another keypress? And what if I wanted to call the number under the cursor, rather than one out of my contacts -- how do I differentiate?)

Anyway, to illustrate what I'm talking about with the trade-off between terseness, expressiveness, and other factors, let's think about why CLIs aren't simply natural languages.

Natural Language vs Traditional CLI

Natural language delivers expressiveness at substantially greater levels than the traditional CLI, but at the cost of terseness and (more importantly) regularity. (Obviously it doesn't even need a mnemonic.)

The reason that CLIs have never attempted to be like natural languages (NLs) is simply the irregularity of NLs. Software thrives on regularity and chokes on irregularity. Now, software has certainly improved, but has it improved that much? I don't think so.

Even if software had improved that much, NLs bring other problems when used in a CLI.

In order to get the no-mnemonic-needed benefit of NLs, you have to support the end-user's own NL. That includes radically different grammars. For example, in English, imperative commands, which are generally what you would issue to computers, can start with a verb, followed by the object (the subject is implicitly the computer). In Japanese, on the other hand (the only other language I can speak), verbs are always at the end, even in imperative commands (eg. "Eat your rice" is "Gohan o tabete", where gohan is rice and tabete is the imperative form of eat). Let's just ignore the different character sets (in Japanese both gohan and tabete would be written with kanji -- Chinese characters -- and hiragana) which would add a whole other layer on top of this interpretive framework. Or rather, let's not!

Another problem is the lack of terseness in natural languages, due to their generality. The o in the Japanese sentence above, for example, marks gohan as the object of the verb; in a traditional CLI this is not necessary. If you want to see a hint of what this is like, take a look at a COBOL program. COBOL was designed with similar goals: to make it easy for end-users to write software (or instruct computers) with little training. Of course, we all know that COBOL was not easy to use, and just ended up irritating programmers with its worthless verbosity for decades. (OK, I'm speaking as a C, now C++, programmer here -- maybe COBOL's verbosity didn't annoy everyone.)

These are all still problems for natural language input. Certainly extensive auto-completion reduces the impact of verbosity, but it doesn't remove it. The main trade-off with auto-completion is between verbosity and mnemonic value. The other trade-off is between regularity and expressiveness. There is leakage between these tradeoffs, so they're not clear-cut (eg. prepositions, articles, etc. all add to expressiveness as well as having mnemonic power, and they trade off verbosity and, in many languages, regularity).

Basically, moving away from the traditional CLI towards an NL opens up an (at least) four-dimensional trade-off space that needs to be navigated. Throw in the fact that there are multiple NLs with radically different characteristics, probably forcing different trade-offs, and the fact that not all languages are as easy to input as English, and it becomes, to put it mildly, rather challenging.

To top it off, scripting is one of the major benefits of a CLI, and a benefit even to an ordinary end-user. It's not enough to dismiss scripting as too advanced for an end-user, since people use it all the time in their personal interactions with other people. The challenge is to make it easy enough for an end-user to engage in it. While using an NL as a CLI helps that, the regularity issue would be almost crippling for any software attempting to implement NL scripts.

Alternatives to the CLI

What are the equivalent challenges for my proposed five-way hierarchical menuing system? Clearly character sets are not a challenge (Unicode has made it easy to display any character sets, and display is all that's required). Neither are an NL's grammatical peculiarities, since any syntax would be simple, regular, and independent of NL. Menuing doesn't feed well into scripting, so that's a weakness of this system (it's not incompatible with scripts, it just doesn't naturally support them in the same way that a CLI does). Expressiveness is limited, but chiefly by the choice of grammar (for example, Apple's menuing system is very simple: object-verb, implemented via selection followed by menu command choice). Regularity is a complete non-issue, of course. So that leaves mnemonic value, and this is where the trickiness lies in this solution.

In order to have strong mnemonic value, a hierarchical menu has to be structured in a logical, sensible fashion that reflects the end-user's own understanding and thought structures. And it has to do so out of the box (nobody will spend ages configuring software -- Apple learned that the hard way with the Newton's handwriting recognition). To make matters more difficult, the hierarchy has to be universal, ie. cover the full range of commands that can be issued at any time (otherwise it violates one of the purposes of David's UI -- to be able to perform any action on a piece of data). This is a matter for considerable research.

Just a quick note to relate all this back to the status quo: the traditional GUI, as implemented on all smartphones, uses contextual menus and selection mechanisms to achieve a basic object-verb level of expressiveness. The expressiveness is severely limited by the application "silos", though. Unfortunately, opening up the range of commands would overwhelm the command activation method (flattish menus). The greater expressiveness of GUIs (where needed) is achieved by dialogs. These allow very complex interactions (such as the logic-based search rules of DreamConnect), with great mnemonics, but hopeless regularity and verbosity, and no chance of scripting.

Conclusions

Clearly, near-NL CLIs are not really viable, even on PCs, despite the level of expressiveness they would deliver. Hierarchical menus have their own issues, but certainly show potential for mobile devices, I think. CCLIs have potential, especially if the command language were as expressive as traditional command lines, but I'm skeptical about the lack of expressiveness in David's proposal (or about whether it can achieve his claims of terseness at a reasonable level of expressiveness). The real issue is that, for mobile devices, terseness is crucial, and needs to be achieved without sacrificing too much expressiveness.

Still, the proof is in the pudding. I'd like to see any of these systems running. Any would be better than the status quo, I reckon.

Saturday, 14 April 2007

The CLI is cool again!

It seems that the CLI is cool again. David Beers has an excellent post on the CLI on a mobile here. Inspired by this, I decided to try out Enso. Unfortunately, despite a truly great demo video, Enso was a big disappointment -- it simply didn't support what I use a computer for (I'm a programmer, but I also do writing, photo manipulation, and video editing, amongst other things). Enso failed for me because it didn't reach into the data that I manipulate with its CLI, making the CLI almost useless. (It didn't even autocomplete filenames for me -- I had to manually map files/directories into Enso's namespace!)

I come from a CLI background (UNIX), and still use vim as my main editor (vim is a truly modal editor, with functionality like search and replace supported via command line). So you could say I'm favourably predisposed towards command lines. But let's try to analyse the benefits and disadvantages of command lines and GUIs (Graphical User Interfaces).

CLI vs. GUI

(Advantages & Disadvantages)

  • Commands: the CLI offers direct access to commands, but the commands are not displayed (completion helps); the GUI displays its commands (in menus), but access to them is indirect.
  • Object access: the CLI makes random access to objects like files easy (especially with completion), but defining graphical/text selections is difficult; the GUI makes such selections easy to define, but random access is clumsy (since objects are displayed spatially, it is hard to keep a wide range of them visible).
  • Manipulation: the CLI makes it easy to manipulate one or more database-style objects with textual search/replace-style commands, but direct manipulation of graphical objects/text is difficult (NB: graphical objects can include objects that are simply represented graphically, such as calendar events); the GUI makes direct manipulation of graphical objects easy, but manipulating more than one database-style object is difficult.
  • Repetition and feedback: the CLI makes history and/or repetitive operations (eg. scripts/macros) easy to implement, but feedback from an operation is limited/implicit; the GUI gives feedback that is usually explicit, but history and repetitive operations are difficult to implement.

Proposal

CLIs and GUIs clearly have different strengths and weaknesses. In summary, CLIs are better at issuing commands, dealing with database-style records or files, and selection based on textual searches; GUIs are better at direct manipulation of objects that can be represented graphically, and at arbitrary but contiguous selections.

Using the two forms of UI in combination offers significant benefits:
  • remove indirectness of menu access
  • allow sophisticated sorting/searching and replacing, even in combo with GUI's graphical representations (to increase contiguity of selection)
  • allow history/redo/macro capabilities.

Implementation

So how do we implement this? For a PC, with its large keyboard, screen, and pointer, I propose the following solution. Get rid of the menu bar (at the top of windows in Windows, and at the top of the screen on the Mac) and replace it with a command bar at the top of the screen. This bar will show the command line as it's entered, and drop down a translucent list showing any autocompletion options or history.

The command line should be accessible with a simple key toggle (like Enso, but probably modal, since holding down a command key while typing limits what you can type). But the CLI should also interact with the GUI's elements, for example highlighting items which may match (i.e. are in the autocompletion list) with a special "tentative" highlight.
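
A minimal sketch of the completion lookup behind such a bar (in Python, with the command names invented for illustration):

import bisect

COMMANDS = sorted(["open", "open recent", "replace", "save", "save all"])

def complete(prefix):
    """Return every known command that could still match what's been typed."""
    i = bisect.bisect_left(COMMANDS, prefix)
    matches = []
    while i < len(COMMANDS) and COMMANDS[i].startswith(prefix):
        matches.append(COMMANDS[i])
        i += 1
    return matches

print(complete("sa"))   # ['save', 'save all'] -- what the drop-down shows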

Command scripts should be able to span apps. So, for example, to do a search and replace in multiple documents, you might be able to simply type:

for file in C:\Documents\DS*.doc
open file in Word
Word::Replace "Series 60" "S60"
Word::SaveClose
end

This should fill in the files via autocompletion as the first line is typed, and then execute the script once it is complete.

Mobile Solution?

The problem with this scenario is pretty obvious: it is heavily keyboard-reliant. Without a full alphabetic keyboard, the commands suddenly become more indirect again (with some form of input method intervening). And the long lists generated by autocompletion aren't very friendly on a small screen. Furthermore, keyboard styles vary so much from device to device (and even within the same device, in an increasing number of devices inspired by Sony Ericsson's P-Series phones) that it would be difficult for a user to become fluent in this style of interaction.

My preference is actually for a tree-structured command space, navigated using the keypad or stylus/finger. See DreamScribe for an initial implementation of such an idea. The benefits of this type of UI for mobile phones are manifold:
  • It is one-handed with one-handed phones
  • It works identically in both keypad and stylus-driven UIs, and doesn't require any additional hardware
  • Commands are visible
  • Commands are grouped by topic (like menus, but in an even more sophisticated way)
  • "Muscle memory" can be used for commands, with careful design (so the same command is always in the same place in the hierarchy)
  • Commands can be used from anywhere, unlike menus which need to be contextual in order to maintain their navigability (since they have a very flat hierarchy)
  • Commands can either be combined with the GUI (basically replacing menu commands) or feed into a CLI (with textually specified arguments)
  • Arguments can also be mapped into the hierarchical system, so that rather than long, linear lists of autocompletion options, hierarchies of possible arguments can be presented

There is a lot of potential in such a UI. Keystick (now "morphing" into Kanuu) showed that this was possible even with text input, and it looks like they're extending it to all sorts of navigation, just as I've suggested above. DreamScribe mapped calendar and contact attributes into a hierarchy, and the sky's the limit, really. See also Ring-writer as an innovative approach in this area.
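
To make the mechanics concrete, here's a minimal Python sketch of such a five-way-navigable command hierarchy (the commands and their grouping are invented for illustration):

COMMAND_TREE = {
    "Message": {"New": lambda: print("new message"),
                "Reply": lambda: print("reply")},
    "Contact": {"Call": lambda: print("calling..."),
                "Edit": lambda: print("editing...")},
}

def navigate(tree, keys):
    """Drive the hierarchy with a sequence of five-way keypad events."""
    node, index, stack = tree, 0, []
    for key in keys:
        names = list(node)
        if key == "left":                    # previous sibling
            index = (index - 1) % len(names)
        elif key == "right":                 # next sibling
            index = (index + 1) % len(names)
        elif key == "down":                  # descend, or execute a leaf
            child = node[names[index]]
            if isinstance(child, dict):
                stack.append((node, index))
                node, index = child, 0
            else:
                child()
        elif key == "up" and stack:          # back out one level
            node, index = stack.pop()
    return list(node)[index]                 # the highlighted command

# The same key sequence always reaches the same command, which is what
# gives the hierarchy its "muscle memory" value:
navigate(COMMAND_TREE, ["right", "down", "down"])   # prints "calling..."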

Conclusion

So, while I think the combo CLI/GUI I suggest above would be great for PCs, I don't think it has much future in the mobile space. I really think the five-way hierarchy provided by joypads is a much better solution for mobiles.

Tuesday, 3 April 2007

Carnival of the Mobilists #66 and #67 and responses

This is a bit late, but Carnival of the Mobilists #66 is at All About Symbian, one of my major Symbian news sources. My post on Contextuality is in that carnival.

Also, Carnival #67 is up at Wap Review and it has an interesting post from David Beers which, in the latter half, bounces off my idea of contextuality, and extends it out into the interrelationship between applications and data.

To be honest, I really wasn't thinking along these lines, for two reasons: 1) I can write applications, but I'm not in a position to write OS's and 2) I've seen too many failures of frameworks that have tried to achieve this.

Regarding reason 2: I love the idea of the user being able to use any tool he owns on his current context. However, the Newton and Pink (or Taligent) both showed how difficult this is to do in reality (the Newton got further, but only because it was less ambitious). Apple aren't alone in trying this: MS have given up on their DB-based filesystem, which was trying to do a similar thing. In fact, MS have been talking about the idea for well over a decade. The most successful attempt at this approach that I've personally seen was the Oberon project, which actually allowed any text to be treated as a command. Brilliant stuff, but quite limited in the real world.

I've had so many hopes for this type of capability dashed: OLE, OpenDoc, Novell's software bus, the Newton's data soup, PenPoint's object oriented integration, Symbian's DNL (Dynamic Navigation Links, which do actually work, but are missing the key functionality of "vectorability" -- maybe more on this later), etc. etc.

It's made me very cynical about this. But I still have hope. Maybe one day we'll all get things sorted enough that software will start getting out of the way and actually helping people do stuff.

(Oh yes, regarding reason 1, maybe it's worth thinking about how to create this sort of open environment hosted by an application framework, rather than natively via the OS... Hmm...)

Thursday, 29 March 2007

Cost/Benefits of open sourcing for Symbian

Over at The Mobile Lantern, Fabrizio Errante raises the idea of open sourcing Symbian OS. This has been raised before, so I thought it worth addressing.

Many people seem to forget that open sourcing has costs as well as benefits. The question for any particular product/project is, do the costs of open sourcing outweigh the benefits, or vice versa?

Let's do a quick overview for Symbian:

Benefits
  • Access to a bigger pool of developers to work on the codebase
  • Access to niche-specialist developers
  • Codebase becomes free (benefit to the customer)
  • Buzz
Costs
  • Codebase becomes free (cost to the provider, who can no longer make money from licensing)
  • Loss of control of codebase
  • Loss of control of developer quality (to be honest, much OSS is of very dubious quality, leading me to believe that either the developers don't have the time/energy to put in quality work, or there are few quality developers working on OSS)
  • Product direction driven by niches, rather than mainstream
Analysis

I contend that the above costs and benefits (which are not exhaustive) apply to all OSS projects, not just Symbian. But the impact of these costs and benefits differs for different types of projects. So how do they affect Symbian?

Well, the main one bandied about is the codebase becoming free. I have seen quite a number of comments about how Symbian will eventually lose out to Linux due to the COG (cost of goods) pressures on phones. The most recent example is from analysts at ABIresearch. This, of course, flies in the face of significant evidence that indicates otherwise: the continuing success of Windows as a desktop OS, despite intense pricing pressures on PCs. But then, analysts usually don't seem terribly connected to reality.

The problem with all of this is, ironically, exactly what MS argues: OSS software is only free in the sense that the codebase, as it is, doesn't cost anything. Of course, the chances that the existing codebase will be satisfactory are fairly slim. (The irony in MS pointing this out, in case you haven't guessed, is that MS's codebase is far from satisfactory, and MS seem incapable of rendering it so.)

So the TCP (Total Cost of Production; I made that one up in lieu of searching for the real term) of a Linux-based phone is likely to remain at least as high as that of a Symbian one. So there is no real benefit to the customer (i.e. the handset manufacturers), and a very real cost to Symbian (they can't get any licensing revenues if Symbian is OSS).

How about the access to more developers and niche-specialist developers? Well, this benefit is offset by the costs that these developers are a) relatively poor quality and b) relatively uncontrolled. For Symbian, these costs matter. Symbian is creating an OS, not some dodgy Web 2.0 app. Quality is critical for this type of software, and so is tight control. A poorly controlled API leads to fragmentation (just look at Linux). As for quality, Linux has demonstrated that OSS can deliver quality, but it seems to come at the cost of fragmentation (again) and bloat (of both code size and app UI).

Don't get me wrong, I use Linux (Fedora), but only as servers, where bloat and fragmentation aren't as critical. But on a phone? Give me a break!

The problem is that these forces are a natural by-product of the OSS process. They can be fought, but not completely neutralised. And they are all inimical to Symbian's strengths.

So should Symbian consider OSS?

No way!

Should Symbian release source code via a free license to developers? Hey, that's a different (and great) idea. And so should Nokia (with S60) and UIQ. The old days of ER5's Eikon source being available in the SDK were both better (access to the source made inheritance a lot easier in some ways, because you could see what you were inheriting) and worse (Psion was lazy about documenting the code, because you had the source). Heck, if MS can release source to some of its UI code, I'm sure Symbian can.