Saying it 10 Ways

As we increasingly drift across the border between graphical and voice interfaces, it’s time to remember that language is not a single layer. In fact, any language is a combination of ten layers working in synchrony. Designing the next generation of voice assistants and voice interfaces thus demands attention to linguistic detail – and indeed paralinguistic detail.

At the heart of any language is its grammar – the combination of sound, form and sequence which arises from phonology, morphology and syntax. Ferdinand de Saussure, the Swiss father of modern linguistics, termed this trinity langue. Langue is the mechanical system of sense-making which is necessary to compose any unit of communication. It is, unfortunately, what is increasingly left out of modern educational programs in language learning.

But, complex as grammar and langue are, it only gets really interesting beyond the purlieus of this linguistic mathematics. Semantics, the next layer out, is where we find the homonyms, homophones, double entendres, false friends and neologisms that confuse learners and make a hash of cross-cultural communication. And once we orbit beyond the realm of semantics, we are truly in the zero gravity beyond language per se. 

Beyond language is what we call paralanguage – the entire gamut from pragmatics, or words as social interaction, to chronemics, or timing as a language. Cultures, ultimately, are composed of just two ingredients – time and space (even the word culture itself originates from the Latin colere, ‘to till’ or ‘to inhabit’ a place). Chronemics is the language of timing, just as proxemics is the language of space. Where and when you say something is ultimately of far greater significance than what you say or how you say it. Ask any politician.

De Saussure termed semantics and pragmatics parole, the what-you-mean and how-you-mean-it of language. This is exactly where voice interfaces now stand, as they seek to respond to context and inspire conversation. Conversation, after all, comes from another Latin term meaning ‘to live with’. Living with customers, even on a transient basis, demands an understanding of their context. Beyond pragmatics – which is how to say something so that it is socially successful – lies the vast territory of everything that is communicated by being unsaid: semiotics, kinesics, oculesics, haptics, proxemics, chronemics. Forgive the poetry, gentle reader: we mean language as symbol (as in colours), language in gesture (as in shrugs), language as eye contact (as in staring), language in touch (as in handshaking), language as spacing (as in ‘social distancing’) and language as timing (as in customer anticipation).

A conversation is said in words that are constructed using grammar and langue, but it is really a linguistic equation involving parole and paralanguage. Even a voice interface with a bodiless bot or a physical voice assistant involves parole (a social context) and paralanguage (a space-time continuum). Designing conversational interfaces demands a knowledge of all these levels and layers simultaneously. But most of all, it requires a deep understanding of human/machine pragmatics.

Pragma is the Greek word for ‘deed’. Words, as the philosopher J. L. Austin reminded us, are actions in themselves, and the title of his famous book – How to Do Things with Words – could just as easily apply to the whole new industry of conversational AI. Words are indeed actions, and when we interface with machines we want those words to be not only understood but anticipated and appreciated.

These are the kinds of ideas that animate Area22 as a start-up in the voice space. Grasping the entire equation before designing the interface is critical. What is a conversation? What elements in a conversation are beyond language? What kinship can be evolved between humans and their machines so the latter may better serve them? Can we “un-geek” techspeak so that this interface is natural, intuitive and supple? Can this new space evolve to be instinctive, personable, even humorous?

You can say a thing ten different ways. In fact, true communication is always saying things ten different ways – as sound, as words, as sentences, as meanings, as contexts, as signs, as gestures, as touch, as proximity and as timing. Right now, voice interfaces have reached contexts. Mastering this layer will involve not just engineering prowess but social smarts in speech acts, conversational phenomena and facework. Are the bots ready for class?


This article was written by Dr James McCabe. Dr McCabe is a special advisor to Area22, operating in the capacity of Chief Storyteller. Area22 is focussed on transforming our experience of search in the Metaverse by leveraging the power of conversational voice. Conversational voice has the capacity to completely rewrite how we all engage with technology, bringing us closer to a real human experience.
