We present synthesized videos that illustrate the effectiveness of different
visual synthesis techniques. The example animations highlight the following
aspects of our technique:
1. Quality of animation for the original language
2. Quality of animation for translingual visual synthesis
3. Quality of animation for facial expression synthesis
Head Movement Compensation
We show two videos. The first video uses viseme images to generate the animation
for a given audio utterance in English. The alignments are obtained
from the utterance through an English speech recognition system.
Video with head movement
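As a rough illustration of how time alignments can drive a viseme animation, the sketch below picks one viseme image per video frame from a list of aligned phonemes. The phoneme labels, the viseme names, the 25 fps frame rate, and the function name are all assumptions made for this example; they are not taken from the system described here.

```python
from typing import Dict, List, Tuple

# Hypothetical phoneme-to-viseme table (illustrative only). Bilabial sounds
# such as p, b, and m share a closed-lips mouth shape.
PHONEME_TO_VISEME: Dict[str, str] = {
    "sil": "rest",
    "p": "closed_lips",
    "b": "closed_lips",
    "m": "closed_lips",
    "aa": "open_mouth",
}

def frames_from_alignment(
    alignment: List[Tuple[str, float, float]],
    fps: int = 25,
    default: str = "rest",
) -> List[str]:
    """Return one viseme name per video frame.

    `alignment` is a time-ordered list of (phoneme, start_sec, end_sec)
    tuples, as a speech recognizer in alignment mode might produce.
    """
    if not alignment:
        return []
    n_frames = round(alignment[-1][2] * fps)
    frames: List[str] = []
    idx = 0
    for i in range(n_frames):
        t = (i + 0.5) / fps  # time at the centre of frame i
        # Advance to the phoneme segment that contains time t.
        while idx < len(alignment) - 1 and t >= alignment[idx][2]:
            idx += 1
        frames.append(PHONEME_TO_VISEME.get(alignment[idx][0], default))
    return frames
```

Each returned name would then be replaced by the corresponding viseme image when the frames are written out.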
In order to remove the jerks seen in the above video, we normalize the
original viseme images before synthesizing. These images were obtained by
asking the subject to speak the sentence "the sharp quick brown
fox jumped over the lazy dog". The images so obtained may not be aligned with
one another, which is why the above video has disturbing and unintended head motion.
We normalize the viseme images to remove the head motion in the original viseme
images. These images were used to synthesize the video from the same utterance.
The result of using this technique is seen in the smooth video below.
Head movement compensated
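The text does not say how the viseme images are normalized. One simple possibility, assumed here purely for illustration, is to estimate the translation of each image relative to a reference image by phase correlation and then shift it back. The function names are hypothetical, and the sketch handles only whole-pixel translation of grayscale images, not rotation or scaling.

```python
import numpy as np

def estimate_shift(reference: np.ndarray, image: np.ndarray) -> tuple:
    """Estimate the integer (dy, dx) shift that maps `reference` onto `image`.

    Uses phase correlation: the normalized cross-power spectrum of the two
    images has an inverse FFT that peaks at the displacement.
    """
    cross = np.fft.fft2(image) * np.conj(np.fft.fft2(reference))
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

def normalize_to_reference(reference: np.ndarray, images: list) -> list:
    """Shift every image so that it lines up with `reference`.

    np.roll wraps pixels around the border; a real pipeline would crop or
    pad instead (assumption: the displacement is small relative to the image).
    """
    aligned = []
    for img in images:
        dy, dx = estimate_shift(reference, img)
        aligned.append(np.roll(img, (-dy, -dx), axis=(0, 1)))
    return aligned
```

Choosing one viseme image (for example the rest position) as the reference and aligning all others to it would remove frame-to-frame jumps caused by translation of the head.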
Neutral Expression Animations
Here we show a couple of animations where the visemes were captured from
a television video. The expression in the original images was NEUTRAL.
Visemes with the same expression were used to synthesize the videos.
1. Bill Clinton Animation using images obtained from TV
2. Nitendra Rajput using images obtained from TV
Expression Synthesized Animation
To illustrate that we can synthesize animations in expressions that are
different from the expression of the original visemes, we show the following
two videos. The original visemes were in NEUTRAL expression and the video
below has been generated from these visemes.
Neutral
Next we show the result of animation from visemes that have been synthesized
for a SMILE expression. The visemes for SMILE expression have been synthesized
by using the visemes of NEUTRAL expression and more than one viseme in the
SMILE expression.
Smile
To further highlight our ability to synthesize animation in expressions other
than that of the original viseme set, we present a video that combines NEUTRAL
and SMILE expressions in a single video. Half of the video uses NEUTRAL visemes
and the other half uses SMILE visemes for generating the animation.
Neutral+Smiling
These three videos clearly illustrate the effectiveness of synthesizing different
expressions in the animation through facial expression synthesis. The videos
that use synthesized visemes do suffer in quality, as is expected.
Translingual Animation
Each of the animations shown above requires a speech alignment system in the
language of the utterance (which was English for all videos above). We provide
a mechanism of Translingual Mapping through which new language
utterances can be aligned using a base language speech alignment
system. This technique allows us to animate faces in any new language. We
provide three such examples. While the first two animations are driven by
Hindi, the third one is in the Telugu language.
1. Animation in Hindi: using television images of Lord Ram from the TV series "Ramayan"
2. Another Hindi example
3. Animation in Telugu
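The Translingual Mapping idea can be sketched as a lookup that rewrites the phonemes of the new language into the closest phonemes of the base language, so that the base language alignment system can process the utterance. The table entries, phoneme labels, and function name below are illustrative assumptions, not the mapping actually used for these animations.

```python
from typing import Dict, List

# Hypothetical mapping from new-language phoneme labels to base-language
# (English) labels. Entries are illustrative: aspirated and retroflex stops
# are replaced by a nearby plain stop.
NEW_TO_BASE: Dict[str, str] = {
    "kh": "k",
    "gh": "g",
    "bh": "b",
    "dh": "d",
    "T": "t",
    "D": "d",
}

def map_to_base(phonemes: List[str], table: Dict[str, str]) -> List[str]:
    """Rewrite a phoneme sequence into base-language phonemes.

    Phonemes without a table entry are assumed to exist in the base
    language already and are passed through unchanged.
    """
    return [table.get(p, p) for p in phonemes]
```

The mapped sequence would be given to the base language alignment system together with the audio; because the mapping is one phoneme to one phoneme, the resulting time boundaries carry over to the original phonemes.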
System Evaluation: Animated vs Natural Words
To measure the quality of the synthesized videos, we conducted user experiments
on animations without audio. Subjects were asked to choose the
word from a list by lip reading the video. Some of the videos for monosyllabic English
words are shown below.
1. "Are": Animated, Natural
2. "Chat": Animated, Natural
3. "May": Animated, Natural
The results of the user evaluations showed that most users were able to
lip read the Animated and Original videos.
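A forced-choice lip reading test of this kind can be scored as the fraction of correct word choices per video type. The sketch below shows one way to compute that score; the record layout and the function name are assumptions made for illustration, and no actual experimental data is included.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def lip_reading_accuracy(
    responses: Iterable[Tuple[str, str, str]]
) -> Dict[str, float]:
    """Return the fraction of correct answers per video type.

    Each response is a (video_type, shown_word, chosen_word) tuple, where
    video_type is a label such as "animated" or "natural".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for video_type, shown_word, chosen_word in responses:
        total[video_type] += 1
        if shown_word.lower() == chosen_word.lower():
            correct[video_type] += 1
    return {vt: correct[vt] / total[vt] for vt in total}
```

Comparing the score for animated videos with the score for natural videos of the same words indicates how much lip reading information the animation preserves.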
System Evaluation: Hindi Words
A similar user evaluation study was performed to measure the quality of
Translingual animation. The following are sample animations for two Hindi
words.