
Motion and Robots - Non-Verbal Communication


I recently had a fantastic (although brief) conversation with a few folks about how movement affects the perception of voice services. Misty’s own @Dan is a great person to loop in on these conversations, but I also wanted to rope in Ross and Nick (your invites to discuss this are coming). I have two questions for this group and the community:

  1. What are some good resources to look through when thinking about robot movement and how it is perceived by people? Is there a seminal paper or book that you’ve found helpful?

  2. Traditionally, what has been the most difficult part of experimenting in this arena? (Low-level programming? Cumbersome interfaces? Limited mobility in a robot? Limited perceptive capabilities? Expense? You tell me!)


Great thoughts, Chris! This is actually a pretty advanced topic. I am assuming you mean that the nonverbal gestures a robot exhibits have specific meanings, and that you get meaning misalignment if you verbally say one thing but gesture another.

For instance, giving a thumbs up while saying “no” could create dissonance because you are communicating two different signals (a yes and a no). The implication is that the underlying gestural meaning needs to be communicated alongside the verbal meaning in time and space: the gesture should be directed at the addressee of the utterance, and you should execute the gesture in alignment with the TTS system so the robot doesn’t say “No” and then lift its arm and give a thumbs up five seconds later.
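A minimal sketch of that timing idea, with made-up `speak` and `thumbs_up` stand-ins (no real robot SDK assumed): launch the gesture on its own thread right as the utterance starts, so the two channels land together instead of seconds apart.

```python
import threading
import time

events = []

def speak(text):
    """Stand-in for a blocking TTS call."""
    events.append(("speech_start", time.monotonic()))
    time.sleep(0.3)  # pretend the utterance takes this long to play

def thumbs_up():
    """Stand-in for a blocking motor command (raise arm, extend thumb)."""
    events.append(("gesture_start", time.monotonic()))
    time.sleep(0.3)  # pretend the arm motion takes this long

def say_with_gesture(text, gesture):
    """Fire the gesture as the utterance begins, then wait for both."""
    g = threading.Thread(target=gesture)
    g.start()
    speak(text)
    g.join()

say_with_gesture("Sounds good!", thumbs_up)
```

The point is only the coordination pattern; a real system would also need to place the gesture stroke on the right word, not just at utterance onset.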

Nonverbal communication becomes even harder when you start thinking about different morphologies. For instance, the Paro robot has no hand and thumb, so you’ll have to find a different way to communicate acknowledgement nonverbally. This means that for every new robot morphology, its body movements need to be explored to discover what it can communicate; its expressible meanings are limited to what its body is capable of conveying.

I have a few recommendations on authors but not a whole book. I like Michael Argyle’s work, Adam Kendon’s work, and much of the literature in pragmatics. Two books that have stayed with me are these broad surveys:

Horn, L., & Ward, G. (2008). “The Handbook of Pragmatics”
Brown, G., & Yule, G. (1983). “Discourse Analysis”

Moving this body of knowledge into robotics is incredibly challenging for the reasons you mention. We just don’t have the data to learn these gestures and their meanings, so we usually animate them. There are, as of yet, no great animation tools for robots, which means doing it by hand or repurposing general tools like Maya or Blender.

Other challenges are usually unforeseen. You probably want the robot to look at you when it talks to you, and you probably want it to use its body to refer to things in the environment. All of this is usually done by hand, and if you do it wrong, your robot is probably miscommunicating its intentions.

Just my take on this whole problem. I could go on but I’m looking forward to hearing from Ross who is an expert in proxemics - one area of nonverbal communication. :slight_smile:


I’m going to agree with everything Nick said (and likely everything Ross will say), but want to put a slightly different spin on things.

As @ndepalma explained, dissonance and coordination between multiple modalities is a huge issue, partially due to technical limitations (moving lots of things in a coordinated fashion is hard) as well as the morphology-matching problem he raised. But buried in there is the assumption that utterances have meanings, gestures have meanings, and these are fixed and need to align. Indeed, a lot of the work in the area starts by trying to isolate one modality, develop expressions that are interpreted a certain way, and then merge multiple modalities together, in the belief that modalities will help each other out (some are better at conveying different things) in conveying the correct interpretation to the human.

For humans and humanoid robots, this underlying assumption may be true (in a given cultural context), but for non-human robots, I think there’s some flexibility. My favorite example is tails: cat tail movements and dog tail movements mean very different things for the same movement. Humans learn through interaction how to read and assign meanings to movements. Why can’t we do the same thing with robots? In this view, some base modality is used to bootstrap an understanding of other modalities. That is, we can use verbal expressions to teach people how to interpret arbitrary gestures. Then, for different morphologies, we can extend understanding from a known modality to new ones.

On the animation front, I personally think hand-animating and coordinating multiple modalities of expression for a robot is a dead end. There’s just no way it can scale to give us the variation and nuance we’re going to need for robots that interact with humans day in and day out for years. I know there’s some great work in developing tools to ease the animation burden, but I really think we need to consider alternate, programmatic and generative ways to create robot expressions.
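As a toy illustration of what “programmatic” could mean, here is a hedged sketch (the function name and the arousal/valence mapping are invented, not from any real system) where one parametric generator stands in for a library of hand-made animations:

```python
import math

def expressive_wave(arousal, valence, n=8):
    """Generate a shoulder-joint trajectory (radians) for a wave gesture.

    arousal in [0, 1] scales amplitude; valence in [-1, 1] biases how
    high the arm is lifted. Purely illustrative: the point is that one
    parametric generator can produce endless variations, where hand
    animation would require a separate clip for each.
    """
    amplitude = 0.2 + 0.4 * arousal          # more aroused -> bigger motion
    lift = 0.8 + 0.3 * max(valence, 0.0)     # happier -> arm held higher
    return [lift + amplitude * math.sin(2 * math.pi * t / n)
            for t in range(n)]

excited = expressive_wave(arousal=1.0, valence=1.0)
subdued = expressive_wave(arousal=0.0, valence=0.0)
```

A real generative system would of course need far richer parameters (and per-morphology retargeting), but even this shape shows how variation falls out of a model instead of an animator’s timeline.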

Now, to actually answer @CHRIS_IS_MISTICAL’s questions:
1 - Honestly, I don’t really know of any one source. The field is so young and scattered. I’ve found some useful work in psychology on bodily expression (e.g., a 1998 paper on the bodily expression of emotion), in dance on movement representation [Labanotation - Wikipedia], and of course in the HRI field for putting it all on robots and seeing how people react (Knight, LaViers, and Szafir, to name a few).
2 - For me, limited expressive capabilities in available robots, and comparisons across robots. Almost all of the work I see has used a custom robot with custom expressive capabilities, which makes it really hard for people to replicate and confirm results. If only there were a standard platform available with a wide variety of expressive capabilities…


+1 to what @Dan said. :heart:


Great conversation. As a former computer animator, I find this discussion intriguing. Very early animation used similar modalities: a list of facial expressions substituted into frames as needed to speak or express. I can imagine taking classes of expressions, gestures, tonality, and phrases and using machine learning to zero in on combinations of those modalities that create more depth of communication and improve with feedback over time. As I have not read much on this topic, I would think that robots like Sophia could be using such an approach? Pixar and Disney must also have a lot of research in this area. Real-time AI-animated characters will face similar challenges.

Ironically, in the last decade or so, with the shift to texting and email stripped of those other modalities, we often see the intent of a message misinterpreted (so we added emojis, oh joy!). This efficiency in communication has sadly left behind a more intimate interchange.


Based on personal experience: you have to be very careful with hand gestures and robot movements. Many people will be scared or feel threatened by any sudden movement from a robot; even friendly human gestures can be seen as a threat. I have also noticed that this happens more in the US than in other countries. Robot size and appearance also factor into a robot’s perceived threat level.
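One low-level mitigation for the “sudden movement” problem is to never command step changes in joint position. A minimal sketch using smoothstep easing (illustrative only, not tied to any robot SDK): velocity is zero at both ends of the motion, so the arm ramps up and settles gently instead of snapping toward a person.

```python
def smooth_profile(start, goal, steps=10):
    """Interpolate between two joint positions with smoothstep easing.

    Returns steps+1 waypoints. The blend curve s*s*(3 - 2*s) has zero
    slope at s=0 and s=1, so the motion starts and stops gently rather
    than jerking -- the kind of abruptness people read as threatening.
    """
    path = []
    for i in range(steps + 1):
        s = i / steps
        blend = s * s * (3 - 2 * s)   # smoothstep: eases in and out
        path.append(start + (goal - start) * blend)
    return path

waypoints = smooth_profile(0.0, 1.0, steps=10)
```

A real controller would limit velocity and acceleration in hardware too; this only shapes the commanded trajectory.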


Great point. I can never remember which cheek to start on and how many kisses in various European countries, the appropriate handshake for different levels of respect in Korea, or that waving goodbye to your superiors is actually inappropriate… so another modality will be in the mix: localization for various societies and social circles, or even settings that let owners set their comfort level with physical interactions.

However, as Sophia-type robot modeling gets more and more human-like, maybe the brain will someday dissociate from the fact that you are interacting with a robot, as in Westworld? : )


Don’t get me started on the Sophia robot!:joy:

I didn’t believe in the uncanny valley until I started to see all of the new sex robots. :open_mouth: My point is that Hollywood has done a good job of making many people apprehensive about all types of robots. Robot developers just need to take that into account. How will a person react if a robot runs into them, or hits them with a hand or arm while performing a nonverbal gesture? Most human-sized robots weigh hundreds of pounds and can do a lot of unintentional damage.


I want to point out here that most of this discussion assumes robots are humanoid, and are expected to follow human cultural customs. If we give up either of those assumptions, the problem can be much easier. Coming back to animals, cats are cats the world over. You don’t pet them differently depending on what country you’re in (at least as far as I know, please let me know if I’m mistaken).

Rather than seeing robots as a human analogue, why not view them as their own type of entity and let them have their own methods of expression and interaction that mesh well with, but do not duplicate human systems?

There’s a fine balance, of course. The whole point of social robotics is to understand and leverage human social skills to improve the Human-Robot system. The promise is that robots can understand our signals and integrate into our lives, but I’m not convinced that total replication is the only or correct path.

As to Sophia (and to @sckart’s point), I certainly don’t know the details of its inner workings, but articles like [Human-like Robot Mimics 62 Facial Expressions] make me think they’re still hand-developing emotional expressions for the platform. When and in what order those expressions are used is, I think, determined autonomously.

Disney has done some interesting work in this area. I’m particularly fond of the Vyloo [Disney’s New Autonomous Personality Driven Robots], although it is still unclear how much of the actual expressive behavior is hand-developed.