EARSHOT: A minimal network model of human speech recognition that operates on real speech

James S. Magnuson, Heejo You, Jay Rueckl, Paul Allopenna, Monica Li, Sahil Luthra, Rachael Steiner, Hosung Nam, Monty Escabi, Kevin Brown, Rachel Theodore, Nicholas Monto

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), we experience phonetic constancy and typically perceive what a speaker intends. Models of human speech recognition have sidestepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, automatic speech recognition powered by deep learning networks has made robust, real-world speech recognition possible. However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support human speech recognition. We developed a simple network that borrows one element from automatic speech recognition (long short-term memory nodes, which provide dynamic memory for short and long spans). This allows the network to learn to map real speech from multiple talkers to semantic targets with high accuracy. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of human speech recognition.
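As a rough illustration of the kind of architecture the abstract describes (a recurrent layer of long short-term memory nodes that reads speech frame by frame and drives an output layer of semantic targets), the following is a minimal sketch only. It is not the authors' implementation: the class name `TinyLSTM`, the layer sizes, the random untrained weights, the omission of bias terms, and the use of random numbers in place of real spectral frames are all assumptions made for illustration, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a real model would learn these from speech.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyLSTM:
    """Hypothetical one-layer LSTM with a sigmoid readout (biases omitted)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate, applied to [input frame ; previous hidden state].
        self.gates = {g: make_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifoc"}
        self.readout = make_matrix(n_out, n_hidden, rng)
        self.n_hidden = n_hidden

    def step(self, frame, h, c):
        z = frame + h  # list concatenation of input frame and previous hidden state
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.gates["c"], z)]  # candidate cell state
        # The cell state carries memory across frames (short and long spans).
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def forward(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for frame in frames:
            h, c = self.step(frame, h, c)
            # Readout at every time step: one activation per semantic target unit.
            outputs.append([sigmoid(a) for a in matvec(self.readout, h)])
        return outputs


# Stand-in input: 5 frames of 8 "spectral" channels (random, not real speech).
rng = random.Random(1)
frames = [[rng.random() for _ in range(8)] for _ in range(5)]
model = TinyLSTM(n_in=8, n_hidden=4, n_out=6)
outputs = model.forward(frames)
print(len(outputs), len(outputs[0]))  # prints: 5 6
```

In this sketch the hidden state `h` after each frame plays the role of the internal representation that the abstract compares with responses in superior temporal gyrus; the output vector is what a training procedure would push toward a word's semantic target.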

Original language: English
Title of host publication: Proceedings of the 41st Annual Meeting of the Cognitive Science Society
Subtitle of host publication: Creativity + Cognition + Computation, CogSci 2019
Publisher: The Cognitive Science Society
Pages: 2248-2253
Number of pages: 6
ISBN (Electronic): 0991196775, 9780991196777
Publication status: Published - 2019
Event: 41st Annual Meeting of the Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019 - Montreal, Canada
Duration: 2019 Jul 24 → 2019 Jul 27

Publication series

Name: Proceedings of the 41st Annual Meeting of the Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019

Conference

Conference: 41st Annual Meeting of the Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019
Country/Territory: Canada
City: Montreal
Period: 19/7/24 → 19/7/27

Bibliographical note

Publisher Copyright:
© Cognitive Science Society: Creativity + Cognition + Computation, CogSci 2019. All rights reserved.

Keywords

  • computational models
  • deep learning
  • neural networks
  • spoken word recognition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Cognitive Neuroscience
