Machine Hearing: Audio Analysis by Emulation of Human Hearing
While many approaches to audio analysis are based on elegant mathematical models, an approach based on emulation of human hearing is becoming a strong challenger. The difference is subtle: it involves extending mathematically convenient signal-processing concepts such as linear systems, transforms, and second-order statistics to include the messier nonlinear, adaptive, and evolved aspects of hearing. Essentially, the goal is to form representations that do a good job of capturing what a signal “sounds like”, so that we can make systems that react accordingly. Some of our recent experimental systems, such as sound retrieval from text queries, melody matching, and music recommendation, employ a four-layer machine-hearing architecture that attempts to simplify and systematize some of the methods used to emulate hearing. The peripheral level uses nonlinear filter cascades to model wave propagation in the nonlinear cochlea. The second level computes one or more types of auditory image as an abstraction of what goes on in the auditory brainstem; these images project to cortical sheets much as visual images do. The third level is where application-dependent features are extracted from the auditory images, abstractly modeling what likely happens in auditory cortex. Finally, and most abstractly, any appropriate machine-learning system is used to address the needs of an application, with brain-motivated neural networks being a prototypical example. Each layer involves different disciplines and can leverage the experience of different fields, including hearing science, signal processing, machine vision, and machine learning.
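To make the four-layer structure concrete, here is a toy end-to-end sketch in pure Python. It is an illustration under stated assumptions, not the authors' implementation. All function and class names are hypothetical. The peripheral layer is a *linear* cascade of two-pole resonators, not the nonlinear, adaptive cascade described above. The auditory image is swapped for a correlogram (per-channel autocorrelation), a simpler relative of the stabilized auditory image. The features are hand-picked log energies and pooled correlations. A nearest-centroid classifier stands in for "any appropriate machine-learning system".

```python
import math


def filter_cascade(signal, sample_rate, pole_freqs, damping=0.2):
    """Layer 1 (hypothetical, linear stand-in): cascade of two-pole lowpass
    resonators. Each stage filters the previous stage's output and is tapped
    as one channel; decreasing pole frequencies loosely mimic the cochlea's
    base-to-apex map. The real systems use nonlinear, adaptive stages."""
    channels, x = [], list(signal)
    for f in pole_freqs:
        theta = 2 * math.pi * f / sample_rate   # pole angle, radians/sample
        r = math.exp(-damping * theta)          # pole radius < 1: stable stage
        a1, a2 = 2 * r * math.cos(theta), -r * r
        g = 1 - a1 - a2                         # normalizes DC gain to 1
        y, y1, y2 = [], 0.0, 0.0
        for v in x:
            out = g * v + a1 * y1 + a2 * y2
            y2, y1 = y1, out
            y.append(out)
        channels.append(y)
        x = y                                   # feed this stage into the next
    return channels


def auditory_image(channels, max_lag):
    """Layer 2 (swapped-in technique): a correlogram, i.e. the autocorrelation
    of each half-wave-rectified channel over lags 0..max_lag-1."""
    image = []
    for ch in channels:
        rect = [max(0.0, v) for v in ch]        # crude hair-cell rectification
        n = len(rect)
        image.append([
            sum(rect[t] * rect[t - lag] for t in range(lag, n)) / (n - lag)
            for lag in range(max_lag)
        ])
    return image


def features(image, bin_size):
    """Layer 3 (hypothetical): per-channel log energy (lag 0) plus the
    autocorrelation normalized by lag 0 and averaged over bins of lags."""
    vec = []
    for row in image:
        energy = row[0] + 1e-12                 # avoid log(0) on silent channels
        vec.append(math.log(energy))
        for start in range(0, len(row), bin_size):
            chunk = row[start:start + bin_size]
            vec.append(sum(chunk) / (len(chunk) * energy))
    return vec


class NearestCentroid:
    """Layer 4 (stand-in for any machine-learning system)."""

    def fit(self, vectors, labels):
        grouped = {}
        for v, lab in zip(vectors, labels):
            grouped.setdefault(lab, []).append(v)
        # Average each feature dimension within each class.
        self.centroids = {
            lab: [sum(col) / len(vs) for col in zip(*vs)]
            for lab, vs in grouped.items()
        }
        return self

    def predict(self, v):
        # Pick the class whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: math.dist(v, self.centroids[lab]))


SR = 8000                                       # assumed sample rate, Hz
POLES = [3000, 2000, 1300, 850, 550, 350, 230, 150]  # assumed pole frequencies, Hz


def tone(freq, n=800):
    """A pure test tone of n samples."""
    return [math.sin(2 * math.pi * freq * t / SR) for t in range(n)]


def describe(signal):
    """Run one signal through layers 1 to 3."""
    image = auditory_image(filter_cascade(signal, SR, POLES), max_lag=48)
    return features(image, bin_size=8)


if __name__ == "__main__":
    model = NearestCentroid().fit([describe(tone(200)), describe(tone(1500))],
                                  ["low", "high"])
    print(model.predict(describe(tone(220))), model.predict(describe(tone(1400))))
```

Keeping each layer as a separate function mirrors the layered architecture: a nonlinear cochlear model, a different auditory image, or a neural network could replace one layer without changing the interfaces of the others.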