TY - GEN
T1 - Effective lip localization and tracking for achieving multimodal speech recognition
AU - Ooi, Wei Chuan
AU - Jeon, Changwon
AU - Kim, Kihyeon
AU - Ko, Hanseok
AU - Han, David K.
PY - 2009
N2 - Effective fusion of acoustic and visual modalities in speech recognition has been an important issue in human-computer interfaces, warranting further improvements in intelligibility and robustness. Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. In this paper, we present a new hybrid approach to improve lip localization and tracking, aimed at improving speech recognition in noisy environments. This hybrid approach begins with a new color space transformation for enhancing lip segmentation. In the color space transformation, a PCA method is employed to derive a new one-dimensional color space which maximizes discrimination between lip and non-lip colors. Intensity information is also incorporated in the process to improve contrast of upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motions under various measurement conditions.
UR - http://www.scopus.com/inward/record.url?scp=78651558287&partnerID=8YFLogxK
DO - 10.1007/978-3-540-89859-7_3
M3 - Conference contribution
AN - SCOPUS:78651558287
SN - 9783540898580
T3 - Lecture Notes in Electrical Engineering
SP - 33
EP - 43
BT - Multisensor Fusion and Integration for Intelligent Systems
T2 - 7th IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, IEEE MFI 2008
Y2 - 20 August 2008 through 22 August 2008
ER -