Using the Universal Category System and Deep Learning to Automate Audio Categorization and Embedded Metadata
During the St. Lawrence Summer Fellowship, Cooper A. tackled a persistent problem for sound editors and designers: large audio libraries with no standardized organization. The project curated a dataset of more than 77,000 audio files, totaling 151 hours, drawn from the work of several sound editors, including his father.

Using machine learning frameworks such as TensorFlow and PyTorch, the project trained deep learning models, including audio spectrogram transformers, to classify sound effects. A data processing pipeline was designed to handle the large volume of audio efficiently, streamlining categorization and making it practical to work through backlogged libraries. The goal was an AI engine that automatically assigns each audio file a category from the Universal Category System (UCS) v8.2.

Despite challenges with device storage and hardware limitations, the project produced a large, well-organized audio dataset and a working, efficient classification model, a practical contribution to AI-assisted audio processing.
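As a rough illustration of the classification approach, the sketch below shows how an audio spectrogram transformer might be set up in PyTorch: waveforms are converted to log-mel spectrograms, each time frame is projected into a token, and a transformer encoder with a classification token predicts a UCS category. The architecture, hyperparameters, and category count here are illustrative assumptions, not the fellowship's actual model.

```python
# Minimal sketch of a spectrogram-transformer classifier, assuming PyTorch
# and torchaudio; all names and values below are illustrative.
import torch
import torch.nn as nn
import torchaudio

NUM_UCS_CATEGORIES = 82  # hypothetical count of UCS categories


class SpectrogramTransformer(nn.Module):
    def __init__(self, n_mels=128, d_model=256, n_heads=4, n_layers=4,
                 num_classes=NUM_UCS_CATEGORIES):
        super().__init__()
        # Convert raw waveforms to log-mel spectrograms (16 kHz assumed).
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=1024, hop_length=160, n_mels=n_mels)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Each spectrogram time frame becomes one transformer token.
        # (Positional embeddings are omitted here for brevity.)
        self.proj = nn.Linear(n_mels, d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, waveform):                    # (batch, samples)
        spec = self.to_db(self.melspec(waveform))   # (batch, n_mels, frames)
        tokens = self.proj(spec.transpose(1, 2))    # (batch, frames, d_model)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(x[:, 0])                   # logits over UCS categories


# Example: classify a batch of two one-second clips of random audio.
model = SpectrogramTransformer()
logits = model(torch.randn(2, 16000))
print(logits.argmax(dim=1))  # predicted UCS category indices
```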
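The "embedded metadata" half of the project implies writing the predicted category back into each file. A hypothetical helper is sketched below; it follows the published UCS filename convention (CatID_FXName_CreatorID_SourceID) and, assuming the mutagen library, stores the category in an ID3 comment frame inside the WAV. The summary does not describe the project's actual tagging tooling, so both the helper and its use of mutagen are assumptions.

```python
# Hypothetical helper for applying a predicted UCS category to a file.
from pathlib import Path

from mutagen.id3 import COMM
from mutagen.wave import WAVE


def apply_ucs_category(path: str, cat_id: str, fx_name: str,
                       creator_id: str, source_id: str) -> Path:
    src = Path(path)
    # Rename to the UCS filename convention: CatID_FXName_CreatorID_SourceID.ext
    dst = src.rename(src.with_name(
        f"{cat_id}_{fx_name}_{creator_id}_{source_id}{src.suffix}"))
    # Embed the category as an ID3 comment chunk inside the WAV file.
    audio = WAVE(dst)
    if audio.tags is None:
        audio.add_tags()
    audio.tags.add(COMM(encoding=3, lang="eng", desc="UCS CatID", text=cat_id))
    audio.save()
    return dst


# Example: tag a recording classified as a forest ambience.
# apply_ucs_category("rec001.wav", "AMBForst", "BirdsDawn", "CA", "Lib01")
```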