EMBODIED AGENTS IN AUGMENTED & VIRTUAL REALITIES
Course E6998-004, Dept. of Computer Science, Columbia University, Fall 2002
Prof. Kris Thórisson, Ph.D. |
22 |
Prosody |
|
The “form” of the speech - not what, but how it’s said |
||
Prosody is a continuous acoustic signal characterized by pitch, volume, and timbre |
||
Goal: Identify pitch accents, pauses, “rhythm” |
||
Types of information recognizable from intonation only:
|
||
Problem: Speakers differ in baseline pitch and in pitch range |
||
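One common way to reduce the effect of speaker differences is to express each pitch value relative to the speaker's own baseline on a logarithmic (semitone) scale. The sketch below illustrates that idea only; it is not the method used in the course system. The function names, the use of the median as the baseline, and the sample pitch tracks are all assumptions made for illustration.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz):
    """Convert a frequency to semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def normalize_contour(f0_track):
    """Express a pitch track relative to the speaker's own median pitch.

    f0_track: list of F0 values in Hz; None marks unvoiced frames.
    Returns a list of the same length, in semitones (None stays None).
    """
    voiced = [f for f in f0_track if f is not None]
    if not voiced:
        return [None] * len(f0_track)
    ref = median(voiced)  # assumed baseline: the speaker's median F0
    return [None if f is None else hz_to_semitones(f, ref) for f in f0_track]


# Hypothetical data: the same contour shape spoken one octave apart.
low_voice = [100.0, 110.0, None, 150.0, 100.0]
high_voice = [200.0, 220.0, None, 300.0, 200.0]

low_norm = normalize_contour(low_voice)
high_norm = normalize_contour(high_voice)
```

After normalization the two tracks are numerically the same contour, so a pitch-accent detector could apply a single threshold to both speakers.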
23 |
Prosody & Intonation |
|
Examples of segmentation
|
||
||
Example of intonation for the utterance “Take me to Jupiter”, plotted on a logarithmic frequency scale.
|
||
|
|
||
Output of real-time analysis of the utterance “What planet is that?”
|
||
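Segmenting speech at pauses can be sketched with a simple energy threshold over fixed-size frames. This is a minimal illustration, not the analysis that produced the plots on this slide. The frame length, the threshold, the minimum pause duration, and the sample energy values are all assumed.

```python
def find_pauses(energies, frame_ms=10, threshold=0.1, min_pause_ms=200):
    """Return (start_ms, end_ms) spans where energy stays below threshold.

    energies: per-frame energy values (arbitrary units, assumed normalized).
    Silent runs shorter than min_pause_ms are ignored.
    """
    pauses = []
    start = None  # index of the first frame of the current silent run
    for i, energy in enumerate(energies):
        if energy < threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_ms, i * frame_ms))
            start = None
    # Close a silent run that lasts until the end of the signal.
    if start is not None and (len(energies) - start) * frame_ms >= min_pause_ms:
        pauses.append((start * frame_ms, len(energies) * frame_ms))
    return pauses


# Hypothetical signal: speech, a 250 ms pause, speech, a 50 ms dip, speech.
energies = [0.5] * 10 + [0.01] * 25 + [0.6] * 5 + [0.02] * 5 + [0.7] * 5
pauses = find_pauses(energies)
```

Only the 250 ms gap is reported; the 50 ms dip falls below the minimum duration. A real-time version would emit the pause as soon as the silent run crosses the minimum duration rather than waiting for it to end.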
24 |
Unimodal Perceptors |
|
25 |
Unimodal Perceptors |
|
Object code example (1)
|
||
26 |
Unimodal Perceptors |
|
Object code example (2)
|
||
27 |
Multimodal Perceptors |
|
28 |
Multimodal Perceptors |
|
Module example (1)
|
||
29 |
Multimodal Perceptors |
|
Module example (2)
|
||
30 |
Blackboards |
|
Blackboards simplify design |
||
||
Blackboards solve:
|
||
31 |
Blackboard Example |
|
Example of blackboard data stream with timestamps |
||
(TAKING-TURN T 8804072)
(SPEAKING T 8804071)
(TURNED-TO-ME T 8804070)
(FACING-ME T 8804069)
(FACING-DOMAIN NIL 8804069)
(TURNED-TO-ME NIL 8803965)
(FACING-ME NIL 8803939)
(FACING-DOMAIN T 8803886)
(COMPLETE-PRAGM NIL 8803862)
(COMPLETE-SYNT T 8803804)
(COMPLETE-GRAM T 8803804)
(COMPLETE-PRAGM T 8803803)
(R-DEICTIC-MORPH NIL 8803719)
(FACING-DOMAIN NIL 8803717)
(TAKING-TURN NIL 8803695)
(R-DEICTIC-MORPH T 8803694)
(FACING-ME T 8803692)
(WANTING-TURN NIL 8803664)
(GESTURING NIL 8803663)
(HAND-IN-GEST-SPACE NIL 8803663)
(COMPLETE-SYNT NIL 8803662)
(COMPLETE-GRAM NIL 8803662)
(SPEAKING NIL 8803661)
(RHAND-IN-GEST-SPACE NIL 8803660)
(WANTING-TURN T 8803632)
(GESTURING T 8803631)
(HAND-IN-GEST-SPACE T 8803631)
(LOOKING-AT-HANDS NIL 8803630)
(TAKING-TURN T 8803630) |
||
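A stream like this can be read as a log of state changes, which makes it possible to reconstruct what the system believed at any moment. The following is a minimal sketch, assuming each entry is a `(NAME T|NIL TIMESTAMP)` triple as in the example; the function names are hypothetical, and the unit of the timestamps is not stated in the source. `SAMPLE` holds the first eight entries of the stream above.

```python
import re

# First eight entries of the example stream (newest first).
SAMPLE = (
    "(TAKING-TURN T 8804072) (SPEAKING T 8804071) (TURNED-TO-ME T 8804070) "
    "(FACING-ME T 8804069) (FACING-DOMAIN NIL 8804069) (TURNED-TO-ME NIL 8803965) "
    "(FACING-ME NIL 8803939) (FACING-DOMAIN T 8803886)"
)

ENTRY = re.compile(r"\(([A-Z-]+) (T|NIL) (\d+)\)")


def parse_stream(text):
    """Parse entries into (name, value, timestamp) tuples, T -> True, NIL -> False."""
    return [(name, value == "T", int(ts)) for name, value, ts in ENTRY.findall(text)]


def state_at(entries, time):
    """Return the latest known value of every key at or before `time`."""
    state = {}
    for name, value, ts in sorted(entries, key=lambda e: e[2]):
        if ts <= time:
            state[name] = value  # later entries overwrite earlier ones
    return state


entries = parse_stream(SAMPLE)
before = state_at(entries, 8804000)
after = state_at(entries, 8804072)
```

Within this sample, `before` shows the user facing the domain but not the agent, while `after` shows the user turned to the agent, speaking, and taking the turn.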
32 |
Perception Modules + BBs |
|
Blackboards and Perception Modules enable us to... |
||
|
||
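The interplay between a blackboard and perception modules can be sketched as a publish/subscribe loop: unimodal perceptors post boolean states, and a multimodal perceptor reacts to those postings and posts its own conclusion. This is an illustrative sketch, not the course's implementation. The `Blackboard` class, the `ADDRESSING-ME` state, and its combination rule are hypothetical; `SPEAKING` and `FACING-ME` are names taken from the example stream.

```python
class Blackboard:
    """Shared store: modules post state changes and are notified of others'."""

    def __init__(self):
        self.log = []          # (name, value, timestamp), oldest first
        self.state = {}        # latest value per name
        self.subscribers = []  # callbacks invoked on every recorded change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def post(self, name, value, timestamp):
        if self.state.get(name) == value:
            return  # record changes only, as in the example stream
        self.state[name] = value
        self.log.append((name, value, timestamp))
        for callback in list(self.subscribers):
            callback(self, name, timestamp)


def addressing_me(bb, changed, timestamp):
    """Hypothetical multimodal perceptor combining speech and gaze states."""
    if changed not in ("SPEAKING", "FACING-ME"):
        return  # ignore postings this perceptor does not depend on
    value = bool(bb.state.get("SPEAKING")) and bool(bb.state.get("FACING-ME"))
    bb.post("ADDRESSING-ME", value, timestamp)


bb = Blackboard()
bb.subscribe(addressing_me)

# Stand-ins for unimodal perceptors posting their results.
bb.post("FACING-ME", True, 100)
bb.post("SPEAKING", True, 120)
bb.post("SPEAKING", False, 400)
```

The multimodal perceptor never talks to the unimodal ones directly; it depends only on names on the blackboard, so modules can be added or replaced independently.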
33 |
Broad-Stroke Hypothesis of Real-Time Perception |
|
Collect ‘evidence’ from raw & processed data
Broad-stroke ≠ top-down! We can use both bottom-up AND top-down evidence to find broad strokes |
||
Example |
||
Two people talking, Alan and Beth |
||
4: Based on Beth’s expression, Alan is persuaded to pause his speech (t-minus-350 ms) |
||
Use top-down hypotheses to help differentiate between alternative interpretations, based on broad-stroke information, using...
|
||
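One way to picture top-down hypotheses helping to choose between interpretations is to weight bottom-up evidence scores by a top-down expectation and renormalize. The sketch below assumes that framing; it is not the scheme described in the lecture. The function name, the interpretation labels, and all numbers are hypothetical.

```python
def rank_interpretations(bottom_up, top_down, default_weight=1.0):
    """Rank interpretations by bottom-up score times top-down weight.

    bottom_up: {interpretation: evidence score from perception}
    top_down:  {interpretation: weight from the current hypothesis}
    Returns [(interpretation, normalized score)], best first.
    """
    scores = {
        name: score * top_down.get(name, default_weight)
        for name, score in bottom_up.items()
    }
    total = sum(scores.values())
    if total == 0:
        return []  # no interpretation has any support
    ranked = ((name, score / total) for name, score in scores.items())
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical case: the speaker pauses, and perception alone cannot tell
# whether the turn is being given or the speaker is merely hesitating.
bottom_up = {"giving-turn": 0.5, "hesitating": 0.5}
# Hypothetical top-down expectation, e.g. the utterance looked complete.
top_down = {"giving-turn": 0.8, "hesitating": 0.2}

ranking = rank_interpretations(bottom_up, top_down)
```

Here the bottom-up evidence is a tie, and the top-down weight alone breaks it in favor of `giving-turn`.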
© 2002 K.R. Thórisson