PUBLICATIONS: http://www.mindmakers.org/dmsf/humanobs | MORE PUBLICATIONS: http://alumni.media.mit.edu/~kris/select_publ.html | VIDEOS: HUMANOBS Videos
SUMMARY. Humanoid Agents that Learn Socio-Communicative Skills By Observation (HUMANOBS) was a project that aimed to build an A.I. system that could learn complex skills by observation. The 3.5-year project, started in 2008, was funded by Europe’s FP7 with participation from 6 universities and companies in Europe. Completed in 2012, the project was conceived and led by Principal Investigators Dr. Kristinn R. Thórisson and Eric Nivel at Reykjavik University’s AI laboratory CADIA. The project resulted in the blueprint and implementation for a recursively self-improving system, Auto-catalytic Endogenous Reflective Architecture (AERA), that can bootstrap learning of complex spatio-temporal skills from tiny seed knowledge. The system was used to build an agent called S1 that demonstrated this by learning how to conduct real-time TV-style multimodal interviews by observing people doing it. After only 20 hours of watching, it could perform complex interviews about recycling materials without making any mistakes in grammar, sentence structure, question-answer pairs, turn-taking, or time limits – for which no a-priori knowledge had been provided – in either role of interviewer or interviewee.
[Adapted from Autonomous Software that Learns Social Interaction, Projects Magazine, 23:58-61, June 2013.]
The robotic agent, built by an international team led by researchers at Reykjavik University in Iceland, has pushed the boundaries of artificial intelligence by automatically learning socio-communicative skills. It’s a big step towards the ultimate goal of creating intelligence that is both self-sufficient and adaptable in a wide variety of environments.
From sophisticated software that helps us cope with information overload on the World Wide Web, to digital cameras that detect faces, to self-driving cars, artificial intelligence has already had a noticeable impact on the world we live in. Scientists and engineers continue to build increasingly complex system architectures, but an intelligent system with sophisticated levels of autonomy remains the Holy Grail.
Imagine an intelligent agent that could operate with a level of self-sufficiency that allowed it to adapt to a wide variety of environments without knowing in advance what might come its way; imagine an intelligent agent sophisticated enough to understand and imitate the complexities of human social interaction. This may sound like a notion plucked from a sci-fi movie, but in fact it is much closer to reality than we may think.
The system the HUMANOBS team have created is a comprehensive cognitive architecture, named AERA (Auto-catalytic Endogenous Reflective Architecture), that achieves higher levels of autonomy than prior architectures. One way to measure this is the ratio between the amount a system learns and the amount of a-priori knowledge its human designers must provide. The HUMANOBS team decided to put their system to the test using the scenario of a TV interview: “We needed a task that would be viewed as relatively complicated – impossible for current AI and complex for even a human to learn,” explains Dr. Thórisson. Socio-communicative skills were chosen as a way to evaluate the power of the system: a task complex enough to convince anyone that the new AI embodied some new and powerful principles worth taking note of. “We didn’t want our AI to require extensive hand-coding like the expert systems of the past, but instead to be able to acquire the data for its programming on its own, and then to manage its own growth through self-programming.” To demonstrate its abilities, the system observes two humans engaged in a mock-up TV interview about waste recycling. The interviewer asks the expert interviewee to talk about the objects on the table in front of them.
“After about two or three minutes of watching the humans do this,” says Dr. Thórisson, “the system starts to understand what’s going on, how to structure and conduct such an interview, and has generalized some of the main principles of human communication to a point that it can be asked to take over the interview, either in the role of the interviewer or interviewee, and continue to interact with the other person.”
As giving computers vision through cameras is an unsolved problem, human movement and speech are tracked with special high-accuracy sensors and microphones, which capture human interaction in realtime for replication in a virtual world. Two avatars represent the interacting humans, each one seeing the other’s avatar on their screen, not unlike a video conference call if you were to see the other person as a three-dimensional graphical avatar that precisely mimics everything the other does. “This setup allows us to represent the interaction as a stream of accurate digital data, which the AI observes in realtime, as the interview unfolds,” explains Kristinn. “The virtual scene captures natural human behavior in all important spatio-temporal details, down to arm, head, and body movements, including hand gestures.”
[Image: S1 agent interviewing a human]
ACHIEVING DOMAIN INDEPENDENCE
Dr. Thórisson says it is clear that the system is able to learn “many nuances” of human social behavior “such as the co-ordination of gaze and head movement and speech in the service of conducting an interview”. However, the system is not specifically designed to learn socio-communicative skills – it is general enough to learn any skill of similar complexity. While capturing the behavioral signals of human communication was certainly one of the challenges of the project, an even greater challenge was to build the system so that the learning mechanisms would not be limited to a particular domain but rather work for any task in any scenario.
Achieving domain independence required new levels of system integration and synthesis of ideas. Although the foundation of the project is software engineering and computer science, the accepted methodological approaches these fields offer have significant shortcomings and were not sufficient to enable the project to realize its goals. To push further, a wide array of unorthodox ideas from various sources, including cybernetics, mathematics, and non-axiomatic logic, was pulled into the mix. The result is an unusual system architecture that breaks with prior software architecture traditions in many ways. “We have come out with a very un-modular system,” explains Dr. Thórisson. “It is fairly difficult to explain because in doing so one must build on concepts that are not part of any scientific field’s vernacular except perhaps biology. In some ways our system is like a car engine. If you take anything away the rest will not work. While you could say a car engine is modular because you can take pieces out and look at them, well, so can you with our system. But as far as the operation is concerned, getting from A to B, that requires everything under the hood as well as the axle and the wheels. But because the system must autonomously acquire vast amounts of knowledge – in a way, program itself – an analogy to an automobile engine falls really far off the mark. Coming up with a unified solution has been the greatest challenge,” he continues, “but a necessary part of the solution, as anything else would introduce debilitating slowdown in performance, architectural complexity, or both.”
At the heart of this success is AERA’s ability to develop its own level of understanding, through both accumulated experience and internally simulated predictions of how the world works.
“Our system can accept goals from its designers and then, depending on how much extra information these designers give, it will come up with sufficient understanding of the phenomenon in question to actually meet those goals in a complex scenario. This is a key feature of AERA and it’s a very practical one because now you can automate tasks that couldn’t be automated before. You essentially give it a small example description of a problem domain and the goals that you want it to achieve in it, and hit ‘run’.”
Dr. Thórisson and his collaborators are surprised by the level of domain independence they have achieved with the AERA system. “Find anything in the real world that is as complex as a human-to-human interview,” says Kristinn, “such as ploughing fields or picking lemons – actions that are fairly complex and operate under real-time constraints in the real world – and our system can be applied to that.”
AN INTELLIGENT FUTURE
Despite all of its challenges, the success of the project has been, according to team members, beyond everyone’s expectations. “The ambitious goals originally set for the project have all been met. We are achieving all our hopes and dreams for this project,” says Dr. Thórisson. “The system that we have come up with has already demonstrated the ability to do a small-scale version of a real human-human interview, a task that no prior system could even hope to achieve, as existing methodological and theoretical assumptions simply don’t allow it. We have very high hopes for scaling this up to very complex human interaction and, more importantly, other complex tasks in a vast number of other domains.” The potential range of applications for the system developed by the HUMANOBS project is extensive, not least because of the system’s ability to deal with distractions and unexpected situations. The system could be applied to a myriad of complex situations such as underwater exploration or manufacturing.
With the HUMANOBS team riding high on their success with AERA, Dr. Thórisson says they are keen to build on this to keep their lead. “We are currently looking at the next steps – we would like to see the technology developed further within academia, as this is a platform that can shed light on a number of complex cognitive processes such as learning, attention, autonomy, and intelligence itself. We would also like to see its application in various domains, sooner rather than later, realizing the next generation of automation systems, and possibly bringing us one step closer to a science fiction future of very capable robots able to help with complex real-world tasks – in medicine, in manufacturing, in disaster relief – in all sorts of important situations.”
INTERVIEW WITH ERIC NIVEL
Q: How is the underlying technology for the AERA system different from prior work in this field?
Eric Nivel: Our methodology is based on a new constructivist A.I. approach, defined by Dr. Thórisson in his keynote paper from 2009 on the subject. In sharp contrast to most other approaches, instead of ignoring constraints of time and knowledge, or treating them as secondary concerns, we make these of central importance. Our working definition of intelligence was proposed by Pei Wang, and states that intelligence is “to adapt with insufficient knowledge and limited resources”. We aim at building intelligent machines (a) that operate in open-ended environments and in real-time, (b) whose cognitive architecture is completely domain-independent, and (c) that exhibit self-programming abilities. AERA-based systems learn by observing intentional agents in their environment, and develop experience-based semantics automatically. The only form of knowledge given to them by their programmers – their innate knowledge and drives, so to speak – is provided in the form of a tiny amount of bootstrap code. This idea, of course, is not new. What is new is the integration of such an approach in a coherent and unified reflective real-time architecture that can be implemented and run.
Q: What has been the main challenge in implementing your ideas for the AERA system?
EN: The main challenge was to design the system following two opposite directions simultaneously: top-down, from the specification of the desired cognitive functions to the implementation, and bottom-up, to drive a synergetic implementation towards meeting a set of requirements that we believe all systems must ultimately meet to be capable of general intelligence. Present mainstream software engineering methodologies don’t support our architectural approach. Our approach relies on vast amounts of parallel fine-grained processes that must be coordinated precisely and efficiently: this is very different from the classic way of engineering large systems, and it raised issues for which we could find no support in existing design/coding toolboxes or methods. Add on top of this that our systems operate with temporal precision around a few milliseconds and you get an idea of the difficulty of this engineering challenge – and what it may be like to debug a large-scale system like this. In the process we had to solve several issues related to these facts, but many are only partially solved. Since existing logics are either axiomatic, ignore realtime altogether, or both, we invented new principles for non-axiomatic realtime reasoning.
Q: The acronym, AERA, stands for the key underlying concepts in your work – what are these concepts and how do they define your approach?
EN: AERA stands for Auto-catalytic Endogenous Reflective Architecture. We have adopted a stringent definition of autonomy: an autonomous system is one that is operationally and semantically closed. In our context, auto-catalysis refers directly to the operational closure – the ability for a system to expand and modify its own internal agency by means of the operation of said agency (its own architectural structure). Reflectivity refers to the semantic closure, i.e. the ability for a system to control the (re)organisation of its agency. An AERA-based system is also endogenous in the sense that it (a) is self-maintained and (b) originates from itself through interaction with its environment – the processes implementing the two aforementioned closures are internal and cannot be modified by anyone else but the system itself.
Q: For the project you designed a completely new programming language, Replicode. Can you tell us a bit about what makes it special?
EN: Replicode has been designed to address our specific requirements, in the main: (a) ease the programming and control of large populations of parallel fine-grained data-driven processes, (b) treat what the system itself does as first-class knowledge – to allow the system to reason about its own operation, (c) implement the non-axiomatic logic mentioned earlier, along with the necessary reasoning mechanisms, (d) represent knowledge as executable shared models – which means essentially that things in the world, which the system can think about, are defined by what can be done with them or about them, what other agents do with them, or what the system can predict about them, (e) drive knowledge formation by goals and predictions, (f) allow the formation of dynamic model hierarchies – dynamic because models come and go as they are learned, defeated, or put out of context; hierarchies because predictions flow up the system’s abstraction hierarchy while goals flow down, (g) simulate the outcome of different possible courses of action and commit to the plausibly most appropriate one, (h) allow “throttling-down” some cognitive tasks when resources become scarce to focus on other tasks considered by the system as being more critical at any moment in time, (i) operate in soft real-time, and (j) allow the distribution of the computation load over a cluster of computers.
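To make requirements (d) and (g) more concrete, here is a minimal Python sketch of the general idea: knowledge held as executable predictive models, candidate courses of action simulated with those models, and a commitment to the most plausible one. This is not Replicode syntax and not AERA's actual algorithm. The `Model` class, the `confidence` field, the scoring rule, and the toy interview actions `ask` and `wait` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]


@dataclass
class Model:
    """Toy 'executable model' (hypothetical): predicts the effect of one action."""
    action: str
    predict: Callable[[State], State]
    confidence: float  # assumed reliability estimate in [0, 1]


def simulate(plan: Sequence[str], models: Dict[str, Model],
             start: State) -> Tuple[State, float]:
    """Roll a plan forward using predictions only; nothing is executed for real."""
    state, confidence = dict(start), 1.0
    for action in plan:
        model = models[action]
        state = model.predict(state)
        confidence *= model.confidence  # uncertainty compounds along the plan
    return state, confidence


def choose_plan(plans: List[List[str]], models: Dict[str, Model],
                start: State, goal: State) -> List[str]:
    """Commit to the plan whose simulated outcome best matches the goal."""
    def score(plan: Sequence[str]) -> float:
        outcome, confidence = simulate(plan, models, start)
        error = sum(abs(outcome.get(k, 0.0) - v) for k, v in goal.items())
        return confidence / (1.0 + error)  # illustrative scoring rule only
    return max(plans, key=score)


# Hypothetical interview-like toy domain: asking yields an answer but costs time.
models = {
    "ask": Model("ask", lambda s: {**s, "answers": s["answers"] + 1,
                                   "time": s["time"] + 10}, 0.9),
    "wait": Model("wait", lambda s: {**s, "time": s["time"] + 5}, 0.99),
}
start = {"answers": 0.0, "time": 0.0}
goal = {"answers": 2.0}
plans = [["wait", "wait"], ["ask", "wait"], ["ask", "ask"]]

best = choose_plan(plans, models, start, goal)
print(best)  # ['ask', 'ask']
```

In this toy setup the plan that asks twice wins, because it is the only one whose predicted outcome reaches the goal, even though its compounded confidence is the lowest of the three. In a system like the one described, the models themselves would be learned and revised rather than written by hand as they are here.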
Reykjavik University’s A.I. lab, Iceland - Principal Investigators