SAIBA - Multimodal Behavior Generation Framework

Led by: Hannes Vilhjalmsson, Norman Badler, Lewis Johnson, Stefan Kopp, Brigitte Krenn, Stacy Marsella, Andrew N. Marshall, Catherine Pelachaud, Hannes Pirker, Kristinn R. Thorisson

The generation of natural multimodal output for embodied conversational agents requires a time-critical production process with high flexibility. To scaffold this production process and encourage sharing and collaboration, a working group of ECA researchers has introduced the SAIBA framework (Situation, Agent, Intention, Behavior, Animation). The framework specifies multimodal generation at a macro-scale, consisting of processing stages on three different levels: (1) planning of a communicative intent, (2) planning of a multimodal realization of this intent, and (3) realization of the planned behaviors.


The overall goal of this international effort is to establish a unified multimodal behavior generation framework for Embodied Conversational Agents (ECAs) so that people in the field can more easily work together and share resources.

So far the following research centers and institutions actively participate in the effort (alphabetical):

Articulab, Northwestern University, USA
Artificial Intelligence Group, University of Bielefeld, Germany
Austrian Research Institute for AI (OFAI), Vienna, Austria
Center for Analysis and Design of Intelligent Agents (CADIA), Reykjavik University, Iceland
Center for Human Modeling and Simulation, University of Pennsylvania, USA
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Human Media Interaction, University of Twente, The Netherlands
Human-Oriented Technology Lab, University of Zagreb, Croatia
Information Sciences Institute (ISI), University of Southern California, USA
Institute for Creative Technologies (ICT), University of Southern California, USA
Intelligent Agents and Synthetic Characters Group at INESC, Lisbon, Portugal
IUT de Montreuil, Université Paris 8, France


The first step towards a unifying representational framework for multimodal generation has been to lay down the general planning stages and knowledge structures that are involved in the creation of multimodal communicative behavior. We do not want to impose a particular micro-architecture. Yet our goal is to define representation languages that serve as clear interfaces between separate levels of abstraction, building on our experience with previous ECA systems, and this requires modularizing the problem.

We aim for the representation languages to be:

Independent of a particular application or domain
Independent of the employed graphics and sound player model
Clear-cut in separating information types (function-related versus process-related specification of behavior)

The generation of natural multimodal output requires a time-critical production process with high flexibility. To scaffold this production process, we introduced the SAIBA framework (Situation, Agent, Intention, Behavior, Animation), which specifies multimodal generation at the macro scale as processing stages on three different levels:

Planning of a communicative intent
Planning of multimodal behaviors that carry out this intent
Realization of the planned behaviors

These processing stages are depicted below:

[Figure: the three SAIBA processing stages]

The interface between stages (1) and (2) — Intent Planning and Behavior Planning — describes communicative and expressive intent without any reference to physical behavior. We call the language that we propose for specifying such information the Function Markup Language (FML). It is meant to provide a semantic description that accounts for the aspects that are relevant and influential in the planning of verbal and nonverbal behavior.

The interface between stages (2) and (3), Behavior Planning and Behavior Realization, describes multimodal behaviors as they are to be realized by the final stage of the generation process. We propose the Behavior Markup Language (BML) for this purpose. It provides a general, player-independent description of multimodal behavior that can be used to control an embodied agent. At the same time, it must describe behavior in sufficient detail, ranging from the mere occurrence and relative timing of the involved actions to the detailed definition of a behavior’s form.
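The three stages and the two interfaces between them can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `Intent` and `Behavior`, their fields, the example situation, and the timing rule are hypothetical and do not reproduce actual FML or BML syntax. The point is only the separation of concerns: the first structure carries communicative function with no reference to physical behavior, while the second carries behaviors with relative timing that a realizer resolves.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Intent:
    """Hypothetical stand-in for an FML-like message: function only."""
    function: str        # communicative function, e.g. "greet"
    text: str            # what should be said
    emphasis: str = ""   # word to emphasize, if any


@dataclass
class Behavior:
    """Hypothetical stand-in for one BML-like behavior."""
    id: str
    modality: str          # "speech", "gesture", "head", ...
    form: str              # what to do in that modality
    start_after: str = ""  # id of the behavior this one is timed against
    offset: float = 0.0    # seconds relative to that behavior's start


def plan_intent(situation: str) -> Intent:
    """Stage (1): decide what to communicate."""
    if situation == "user_arrives":
        return Intent(function="greet", text="Hello there", emphasis="Hello")
    return Intent(function="inform", text="I am listening")


def plan_behavior(intent: Intent) -> List[Behavior]:
    """Stage (2): choose multimodal behaviors that carry out the intent."""
    behaviors = [Behavior(id="s1", modality="speech", form=intent.text)]
    if intent.function == "greet":
        behaviors.append(Behavior("g1", "gesture", "wave", start_after="s1"))
    if intent.emphasis:
        behaviors.append(
            Behavior("h1", "head", "nod", start_after="s1", offset=0.2)
        )
    return behaviors


def realize(behaviors: List[Behavior]) -> List[Tuple[float, str, str]]:
    """Stage (3): resolve relative timing into an absolute schedule."""
    start_times: Dict[str, float] = {}
    schedule = []
    for b in behaviors:
        # Behaviors with no (or an unknown) reference start at time 0.0.
        base = start_times.get(b.start_after, 0.0)
        start_times[b.id] = base + b.offset
        schedule.append((start_times[b.id], b.modality, b.form))
    return sorted(schedule)


if __name__ == "__main__":
    for start, modality, form in realize(plan_behavior(plan_intent("user_arrives"))):
        print(f"{start:.1f}s  {modality}: {form}")
```

In this sketch the behavior planner never sees the situation and the realizer never sees the intent, so each stage could be replaced independently as long as the intermediate structures stay the same. That is the kind of modularity the two interface languages are meant to support.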


The 4th BML Workshop
Amsterdam, The Netherlands, September 13, 2009

The 2nd FML Workshop
Budapest, Hungary, May 12, 2009

The 3rd BML Workshop, hosted by MITRE Corporation
Boston, USA, June 2-3, 2008

The 1st FML Workshop (at AAMAS 2008)
Estoril, Portugal, May 13, 2008

The 2nd BML Workshop
Paris, France, June 7-8, 2007

HUMAINE WP10 Joint Workshop on Representations for Multimodal Behavior
Vienna, Austria, November 7-8, 2006

Representations for Multimodal Generation Workshop
Reykjavík University, Reykjavík, Iceland, April 23-25, 2005

Embodied conversational agents - let’s specify and evaluate them!
Bologna, Italy, July 16, 2002
