Made in PGH » Established 2009

The Future of Siri.

Confession: I basically upgraded from the iPhone 4 to the 4S just to mess around with Siri.1 While the experience has been magically delicious in nearly all respects, one can’t help but continually bump into what feel like arbitrary walls. Siri can apply a relationship to a person (“Joel is my brother”), but she can’t change his birthday or move him to the top of my favorites list or perform thousands of other seemingly trivial actions. Like many others, I’m delighted by what Siri can do yet frustrated by the current limitations.

The Present.

Apple has cracked open a door of possibility with the introduction of Siri. It’s not the first interface to accept voice as an input, but it might be the first to do it in a way that’s both accessible to the casual user and popular enough to matter.

Those who are quick to dismiss Siri as a gimmick cite the aforementioned functional limitations, the awkwardness of speaking aloud in public places, and the latency and artificiality as compared to science fiction’s portrayal2. These are all true. Many features of the iPhone are unavailable via Siri. It would be weird for someone in an office or on a bus to start talking to his or her phone. (Weirder than the Bluetooth headsets people already use?) Needing to wait for Siri to transmit and fetch data from a distant server, enunciating with excruciating precision, and finding oneself at the beck and call of those chipper beeps can be disenchanting. Yet what are these but the pains of an infant technology cutting its teeth in a world of mature graphical user interfaces? Should we reject voice-driven user interfaces a priori, scorning the possibility of hardware and software improvement?

We have, right now, a useful tool and a tantalizing glimpse at what is possible. That’s more than enough for me.

The Future.

The immediate future looks clear. Apple will continue to refine the Siri experience by removing obstacles and adding features. The foundation appears to be in place for long-term growth. I have had few issues with Siri understanding my speech, and that seems to be the common experience.

What we all want to know is: how soon can Apple open the app floodgate? It’s a bewitching notion. The iPhone before apps was revolutionary. The iPhone after apps, indispensable. Can the same be true of Siri?

A Hypothetical Scenario for Siri-alizing Apps.

First, let’s give Siri the ability to open apps, something that it can’t do right now.3 “Siri! Launch Tweetbot.”4 Tweetbot appears on the screen. Because we’re smitten with this Siri thing, we want the ability to perform actions in our current context.

Consider what happens next. Since Tweetbot saves my state automatically, I’m looking at my “Sports” Twitter list. From this screen alone, I can: Change the list I am viewing, open the compose tweet screen, refresh the list, search the list, switch accounts, select a tweet as the target for additional actions, switch to my mentions, direct messages, starred tweets or profile, or view replies to a tweet. That’s one screen, and I probably didn’t even provide a comprehensive inventory of available actions.

“Refresh tweets” might be a perfectly adequate synonym for the pull-to-refresh mechanic we’ve become accustomed to, but what if I want to interact with a specific tweet? Should a “cursor” appear on the screen indicating the currently active tweet? Of course not. Tweetbot, like every other native iOS app, has been designed with touch as the foremost interaction method.5 By attempting to force voice input into our current graphical conventions, we’re in jeopardy of the same errors game developers have routinely made in attempting to port joystick-based games to the touch environment. What was developed for one input, especially if the input was properly understood, is inappropriate to varying degrees for use in another. Furthermore, within this scenario, we have both created for ourselves the non-trivial job of replicating all screen functionality as voice functionality and restricted what we can do with voice to what we can see on the screen.

What’s the Alternative?

As much as I would like to see Siri become a tool for users willing to spend the time necessary to learn the interface,6 Apple appears to be determined to create something else, something that hasn’t really been done before: a conversational user interface. You state a command, Siri complies (if possible) and provides feedback. It’s a much longer, more tedious process, but it might be the only one that can actually work without extensive training.

So what should Apple do to truly embrace voice-driven user interfaces? First, abandon the traditional concept of applications. In the world of Siri, applications are incidental. Data sources matter, commands matter, natural language parsing matters; applications are the occasional byproduct of asking Siri to perform a task and having that request fulfilled. The appropriate paradigm is services.7 Instead of registering applications, developers would register a Siri service with Apple. The end user would navigate to a special section of the App Store that housed only VUI services. It’s Newsstand for Siri!

Maybe Tapbots wants to make a Siri service. Services (unlike applications) are able to be used instantly (within Siri) by simply stating the service name plus the desired action. There is no launching a service. “Use Tweetbot to read me my tweets.” Siri answers, “I am loading your latest tweets.”8
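
To make the "service name plus desired action" idea concrete, here is a rough Python sketch of how such a request might be split into a service, a verb, and an object. Everything in it is an assumption for illustration: the `REGISTERED_SERVICES` table, the `Request` type, the `parse_request` function, and the fixed "Use X to ..." phrasing are hypothetical, not anything Apple or Tapbots has published.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: service name -> verbs that service claims to handle.
REGISTERED_SERVICES = {
    "tweetbot": {"read", "star", "refresh"},
}

# Filler words dropped before extracting the object of the action.
FILLER = {"me", "my", "the", "a", "an"}


@dataclass
class Request:
    service: str  # the "subject", e.g. tweetbot
    verb: str     # the intended action, e.g. read
    obj: str      # the object of the action, e.g. tweets


def parse_request(utterance: str) -> Optional[Request]:
    """Parse 'Use <service> to <verb> <object>'; return None if it does not fit."""
    text = utterance.strip().rstrip(".!?").lower()
    match = re.match(r"use (\w+) to (\w+)\s+(.*)", text)
    if match is None:
        return None
    service, verb, rest = match.groups()
    # Reject unknown services and verbs the service never registered.
    if verb not in REGISTERED_SERVICES.get(service, set()):
        return None
    obj = " ".join(word for word in rest.split() if word not in FILLER)
    return Request(service, verb, obj)
```

Calling `parse_request("Use Tweetbot to read me my tweets.")` yields a request for the `tweetbot` service with the verb `read` and the object `tweets`. A real parser would need to be far more generous about phrasing than this single pattern.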

Once Siri begins reading the tweets, we should expect her to pause after each tweet to allow us the opportunity to respond. Unfortunately today that means pressing the microphone button on the screen. If Siri is to achieve its true potential, we’re going to need to be able to invoke it by just saying “Siri!” and, nearly as importantly, we need to be able to interrupt it.9

At this point we might say something like: “That’s funny. Let’s star that tweet.” Behind the scenes, Siri is magically parsing my cryptic human language. As we’re in the Tweetbot context, Siri knows to interpret these commands against the Tweetbot provided options. “Star” plus possibly a dozen other words can perform the same action. It might also accept “like”, “favorite”, “heart”, “save”, and more. It’s also going to need to understand the word “that”. For Siri, “that” can mean a lot of different things. Here it’s critical it means “the thing we were just talking about”. It also needs to ignore “that’s funny.”
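
As a minimal sketch of the interpretation step described above, assume the active service hands Siri a synonym table and Siri remembers the last item it read aloud. The `SYNONYMS` table, the `Context` type, and the `interpret` function are hypothetical names invented for this example, and the word matching is deliberately naive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical synonym table a Tweetbot service might register:
# several spoken words collapse to one canonical action.
SYNONYMS = {
    "star": "star",
    "like": "star",
    "favorite": "star",
    "heart": "star",
    "save": "star",
}


@dataclass
class Context:
    service: str
    last_item: Optional[str] = None  # "the thing we were just talking about"


def interpret(utterance: str, context: Context) -> Optional[Tuple[str, str]]:
    """Return (action, target) if the utterance names a known action on 'that'."""
    words = [word.strip(".,!?").lower() for word in utterance.split()]
    # Chatter such as "That's funny." contains no known verb and is ignored.
    action = next((SYNONYMS[word] for word in words if word in SYNONYMS), None)
    if action is None:
        return None
    # "that" resolves to whatever was most recently read aloud.
    if "that" not in words or context.last_item is None:
        return None
    return (action, context.last_item)
```

With `Context("tweetbot", last_item="tweet-42")`, both "That's funny. Let's star that tweet." and "Heart that" resolve to the `star` action on `tweet-42`, while "That's funny." on its own resolves to nothing.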

What happens if Siri doesn’t understand? Well, at first Siri should probably break out of context to see if there are any alternative means of fulfilling the query. If not, Siri already has error handling: she says, “I’m sorry, I don’t understand”, or some such euphemism.
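
That fallback order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a "handler" here is any function that returns a reply or `None`, and the `resolve`, `tweetbot`, and `weather` functions and their replies are made up for the example.

```python
from typing import Callable, Iterable, Optional

# A handler returns a spoken reply, or None when it cannot fulfil the request.
Handler = Callable[[str], Optional[str]]

APOLOGY = "I'm sorry, I don't understand."


def resolve(utterance: str,
            context_handler: Optional[Handler],
            fallback_handlers: Iterable[Handler]) -> str:
    """Try the active service first, then break out of context, then apologize."""
    candidates = ([context_handler] if context_handler else []) + list(fallback_handlers)
    for handler in candidates:
        reply = handler(utterance)
        if reply is not None:
            return reply
    return APOLOGY


# Two hypothetical handlers, used only to exercise the fallback order.
def tweetbot(utterance: str) -> Optional[str]:
    return "Starred." if "star" in utterance.lower() else None


def weather(utterance: str) -> Optional[str]:
    return "Checking the weather." if "weather" in utterance.lower() else None
```

Here `resolve("What's the weather?", tweetbot, [weather])` falls through the Tweetbot context to the weather handler, and a request nobody recognizes ends with the apology.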

Back in the narrative, we’ve starred the tweet. Siri either continues to read the tweets automatically or needs to be re-engaged by us. Let’s be explicit, “Siri, resume reading the tweets.” “Resume” or “continue” should always restart the previous task. Siri moves on to the next tweet, but by this time we’re bored. We say, “Read tweets from my sports list.” The keyword “list” needs to be interpreted as a Tweetbot command. The name of the list needs to be processed, but at this point, we’re right back where we started. Even a slight variation, however, could have radically different results. What if we said instead, “Read tweets about sports”? In that case, Tweetbot might query the Twitter API for the tag “sports” or it might even have a dictionary of sports-related terms if the data were pre-structured.
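
A small Python sketch shows how little separates those two requests, and how "resume" could fall back on the previous task. The `TweetSession` class, the task names `read_list` and `search`, and the exact phrasings matched are all assumptions for illustration. Only reading tasks are remembered, so an interruption such as starring a tweet does not overwrite them.

```python
import re
from typing import Optional, Tuple

Task = Tuple[str, str]  # (kind of task, argument)


class TweetSession:
    """Hypothetical per-service session that remembers the last reading task."""

    def __init__(self) -> None:
        self.previous_task: Optional[Task] = None

    def handle(self, utterance: str) -> Optional[Task]:
        text = utterance.strip().rstrip(".!?").lower()
        text = re.sub(r"^siri,\s*", "", text)  # drop the spoken invocation
        # "Resume" or "continue" always restarts the previous task.
        if text.startswith(("resume", "continue")):
            return self.previous_task
        # The keyword "list" marks a named list; the list name is the argument.
        match = re.fullmatch(r"read tweets from my (.+) list", text)
        if match:
            task = ("read_list", match.group(1))
        else:
            # "about" marks a topic search instead of a list lookup.
            match = re.fullmatch(r"read tweets about (.+)", text)
            if match is None:
                return None
            task = ("search", match.group(1))
        self.previous_task = task
        return task
```

With one session, "Read tweets from my sports list." produces a `read_list` task for `sports`, "Siri, resume reading the tweets." returns that same task, and "Read tweets about sports" produces a `search` task instead.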

Reality.

Voice-driven user interfaces were fantasy or science fiction at best. Now, we have one that works reasonably well within a narrow enough context. Even better, Siri is available on the computer we carry with us all the time rather than the one sitting on a desk. Yet, for now, the magic actually takes place not on this pocketable device but instead on battalions of servers in a distant data center. The delays we experience while using Siri are crucial. Audio files of the sounds recorded by Siri as we uhm and uhh our way through asking her to do us a favor need to be shipped across the Internet, processed into her best guess at the words we intended to communicate, submitted to her vast database for comparison with all possible ways we could have asked her assistance, and, eventually, offered back to us as a discrete action she is able to take on our behalf.

That Siri works at all is a tribute to modern advancements in processing strength, power consumption, and network speed and ubiquity10. That Siri is not yet the omnipresent, omniscient, omnicapable Computer of Star Trek is in all likelihood a difference in scale not kind. It is not unthinkable to imagine a future only a few years from now in which a device the size of the iPhone can remove the quirks and sources of friction we currently experience. With better batteries, more storage, faster processors, smarter algorithms, and speedier connections, it may not be guaranteed to happen, but who will deny the realistic possibility?

This is a revolutionary interface. We’re not going to get by using our hard-earned graphical instincts. The Herculean task facing Apple is educating developers on how to write a Siri service. Making Siri work with Apple’s internal services was no doubt difficult, as evidenced by the frequent downtime and the relatively few available features. Enforcing this level of conceptual change on external developers is almost unimaginably hard. It may not even be possible. Apple may decide to keep Siri in-house indefinitely, slowly expanding the available services. I could live with that. It already makes my life much easier in many ways. But I know we’re all just dying to see the full potential realized. For that to happen, Apple needs to unleash this force by enabling third-party development. The only way this works, however, is to conceive of it as a completely separate interface not handicapped (or propped up) by the existing iOS interface paradigms of a home screen, little icons representing applications, gestures and the rest. The new interface is the Siri voice and what can be shown within the Siri application. Applications are now simply services of Siri. And Apple is going to need to drill the concepts of VUI into developers who have never dreamed of such a thing. Remember the HIG? That’s going to be big again. Just like the release of the Macintosh required developers to learn and accept GUI principles, Siri redefines what it means to use a computer, and that means grokking VUI from the ground up.11

  1. That, and the new cameras. The iPhone is the only camera I use. With two tiny kids, the camera comes out a lot.
  2. See TNG, among others.
  3. Application launching is something of a middle ground for me. While I believe Apple is most interested (and ought to be) in unleashing speaking and listening as a peer experience to looking and touching rather than voice as simply an alternative for your finger, I expect them to make small compromises in that direction. Essentially, there’s no reason voice shouldn’t make the whole experience richer rather than living in a one-dimensional ghetto.
  4. Where by “Siri!” I mean, “press and hold the home button until Siri launches.”
  5. Apple’s incredible accessibility achievement with the iPhone notwithstanding.
  6. Like Quicksilver or Enso.
  7. Incidentally, services are the one thing I want more than anything else on the iPhone today. Developers have hacked around this with custom URL structures, but it’s no substitute for the real thing.
  8. While it’s important to be generous in what Siri can accept, certain components are essential to accomplishing the desired task. At minimum, we need to include the name of the service (Tweetbot, “subject”), the intended action (read, “verb”) and the object of the action (tweets, “direct object”). Other modifiers can also be supported.
  9. This is no small challenge. Our phones would need to be constantly listening for this keyword, which is a battery killer. At this point, we’re basically talking about including all the computational power of Apple’s data center in a hand-held device. We’re not even close.
  10. Or have we now moved from ubiquity to invisibility?
  11. I have chosen to focus on what I believe Apple may have in store for Siri and, also, what the perfect voice user interface looks like. It’s entirely possible that many good or at least interesting VUIs could be designed to supplement the traditional graphical user interface. Unfortunately, companies can generally only really go in one public direction at any given time. Perhaps Google, Microsoft, RIM, and HP can take up the gauntlet for bringing innovative voice features about in other ways.

Chat Simply Icon for Fluid

An online chat service? Sounds like a Fluid app to me. I whipped up a quick PNG for use as a Fluid icon. Doesn’t look half-bad. Now if only I had someone to talk to… add ‘nate’ and ‘jay’ on Chat Simply.

Here’s the icon: Chat Simply icon for Fluid.

Introducing GoPano.

In late 2010, Full Stop was approached by a Pittsburgh company interested in working with us to design and develop a website for sharing 360° videos. EyeSee360 had a decade of experience building lenses that enabled camera owners to create one-shot 360° photos and videos. Now they wanted to use that knowledge to create the first ever device that would bring that capability to the iPhone. The GoPano Micro is the realization of that vision, and GoPano.com is the site we built together to allow people to create and share these unbelievable experiences.

For the past year, beginning even before the incredibly successful Kickstarter campaign for the GoPano Micro1, the EyeSee360 team has been working around the clock to create the best hardware and software possible. We were fortunate to have a role incubating the video sharing site as well as the EyeSee360 company site, the Shopify-based store for buying GoPano products, and the official GoPano iPhone app for recording, sharing, and viewing 360° videos. If you have an iPhone, you can download the app now to view the videos.

To say we were thrilled to work with a local company making unique, exciting products for the best phone in the world would be an understatement. We are eagerly looking forward to seeing the GoPano platform improve as the tools for making 360° video are made available to everyone. The first time you are able to simultaneously capture your kid blowing out the candles and her grandmother’s reaction you’ll be sold on the appeal of omnidirectional video.

It’s not often you get a chance to participate in the early stages of what has the potential to revolutionize an industry. We are grateful for the opportunity and pleased with the result.

Check out this 360° video of the Pittsburgh Penguins warming up. Click and drag to view the video in all directions.

  1. The GoPano Micro was one of the most successful projects in Kickstarter history, and the most funded iPhone project ever.

Designing for Emotion & Mobile First.

Two new books from A Book Apart. Designing for Emotion by Aarron Walter and Mobile First by Luke Wroblewski. Both I expect will be required reading for the discerning web designer and builder.

Siri for the iPad?

Russell Beattie wants to see Siri on the iPad. I’m sure we’d all like to see Siri integrated across the board: iPhone, iPad, MacBook, iMac, AppleTV. Voice recognition isn’t new, and conversational UIs certainly aren’t either. What is new is their advance from science fiction to science experiment to niche, barely usable technology to mass-deployed consumer technology as part of the iPhone 4S.

CSS Shaders Coming.

Adobe sends a proposal to the W3C that introduces advanced filters for use on DOM elements. The open web stack has a long way to go before it has the power of native layout, graphics, storage, logic, etc., but at the very least it’s reassuring to see it continually advancing. Oh, and welcome aboard, Adobe.

HTML Compass.

James Pearce builds a compass with the new HTML compass API in iOS 5’s version of WebKit. Performance is not the difference between web and native. The difference is device API access.

Better Restaurant Menus.

Unit Interactive decided to throw together a little demo of how to make a better restaurant menu. Anything’s better than Flash or PDFs.

Invest in Process and Communication.

As the co-founder of a tiny design and development shop that currently consists of three people (including me), I know our communication problems are minuscule compared to those of larger organizations. Yet these are the kinds of things that keep me up at night:

I was employee #20 at the first start-up and the first engineering lead. Over the course of two years, the team and the company exploded to close to 200 employees. This is when I discovered that growing rapidly teaches you one thing well: how communication continually finds new and interesting ways to break down. The core issue being the folks who’ve been around longer who also tend to have more responsibility. As far as they’re concerned, the ways they organically communicated before will remain as efficient and simple each time the group doubles in size.

They don’t. A growing group needs to continually invest in new ways to figure out what it is collectively thinking so anyone anywhere can answer the question: “What the hell is going on?”

That was Michael Lopp in “The Rands Test”. Communication is some non-trivial percentage of the success of any operation. When your job involves communicating on a day-to-day basis with multiple separate clients each with internal teams and expectations, it’s easy for things to get out of hand. We get along right now with Basecamp, Campfire, Skype, Gmail, bug trackers, version control, and face-to-face conversations, but it’s hardly a perfect solution. Each engagement brings its own challenges. Processes need to evolve continually. Managing the design and development of a brochure-ware site is almost indistinguishable from that of a large web-based application, which itself is different in degree if not kind to designing and developing an iOS application and working as part of a larger team. Point being: invest in process and communication if you care at all about the quality of your product.

SMACSS

I’m only part of the way through the first chapter, but it’s so good that I’m pre-emptively recommending Jonathan Snook’s extended write up of his CSS architecture and guidelines, SMACSS. If nothing else, it’s worth examining your own practices in light of Jonathan’s extensive experience.