GPU-Accelerated Sound Classification With Kinetica and TensorFlow

Nick Alonso
4 min read · Apr 21, 2020


Deploying a scalable sound classifier with Docker and Kubernetes

An end-to-end illustration of a sound classification deployment using the UrbanSound8K dataset

Introduction

Audio processing is an emerging field in various verticals. Speech-to-text solutions, automated predictive maintenance, and voice identification are just a few applications prevalent in the logistics, utilities, automotive, telecommunications, and cyber-security industries. In this example, I am going to train and deploy a fairly simple neural network capable of classifying 10 different sounds from .wav files.

The Data

In this example I use the very popular UrbanSound8K dataset: https://urbansounddataset.weebly.com/urbansound8k.html

This training set is made up of 10 different labels: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gunshot, jackhammer, siren, and street music. The goal is to develop a neural network that can differentiate between these sounds using the 8,732 audio files available for training. A few different things need to happen before we can start training a neural network. First, audio data needs to be extracted from each file. Each file can be broken down by sampling it at a fixed rate called the sampling rate (traditionally 44.1kHz, or 44,100 times per second). Each sample represents the amplitude of the wave at the associated interval, and its bit depth determines how precise the sample will be (traditionally 16-bit, or a range of 65,536 amplitude values). Below is an example of a class capable of extracting this information from a directory of audio files.
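A minimal sketch of such a class, assuming Librosa is used for loading (the class and method names here are illustrative, not from the original post; note that Librosa returns samples as floats regardless of the file's bit depth):

```python
import os
import librosa
import pandas as pd

class AudioExtractor:
    """Walks a directory of .wav files and extracts basic audio data."""

    def __init__(self, sample_rate=44100):
        self.sample_rate = sample_rate

    def extract_file(self, path):
        # librosa returns the waveform as a float array plus the sampling rate
        samples, sr = librosa.load(path, sr=self.sample_rate)
        return {"file_name": os.path.basename(path),
                "sample_rate": sr,
                "num_samples": len(samples),
                "duration": librosa.get_duration(y=samples, sr=sr)}

    def extract_directory(self, directory):
        rows = [self.extract_file(os.path.join(directory, f))
                for f in os.listdir(directory) if f.endswith(".wav")]
        return pd.DataFrame(rows)
```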

After this information is extracted we can store it in a Pandas dataframe, write the dataframe to .csv, and drag and drop the .csv into Kinetica to be stored as a table.
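For example, using the sketch above (file and table names are placeholders):

```python
extractor = AudioExtractor()
df = extractor.extract_directory("UrbanSound8K/audio/fold1")
df.to_csv("urbansound_audio_data.csv", index=False)
# The .csv can then be dragged into Kinetica's UI to create a table.
```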

Feature Extraction and Model Development

Before I can begin training my neural network I need to extract the necessary features from each audio file along with its associated label. This is actually very similar to classifying images, in the sense that we first create a visual representation of each audio sample and then vectorize that representation for input to the neural network. First I create the visual representation of the sample using the Mel-Frequency Cepstral Coefficients technique, or MFCC. This technique generates a representation of each sample with a quasi-logarithmically spaced frequency scale — similar to a spectrogram, just with more detail. To generate this representation we use the Librosa library.
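A sketch of the per-file feature extraction with Librosa (40 coefficients is a common choice, and averaging over time frames is one simple way to get a fixed-length vector per clip):

```python
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=40):
    # Load the clip as mono; kaiser_fast keeps resampling quick
    samples, sr = librosa.load(path, res_type="kaiser_fast")
    # Compute the MFCC matrix: n_mfcc coefficients x time frames
    mfccs = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    # Average across time frames to get a fixed-length feature vector
    return np.mean(mfccs.T, axis=0)
```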

Once these features have been extracted I can store them in a dataframe, dump to disk with pickle, and write to .csv for storage in Kinetica.
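For example (file names are illustrative, and feature_rows is assumed to be a list of (mfcc_vector, class_label) pairs built by calling extract_mfcc() on each file):

```python
import pickle
import pandas as pd

features_df = pd.DataFrame(feature_rows, columns=["features", "class_label"])

with open("urbansound_features.pkl", "wb") as f:
    pickle.dump(features_df, f)          # binary copy for reloading later

features_df.to_csv("urbansound_features.csv", index=False)  # copy for Kinetica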

At this point we can begin training. This is where we can really leverage Kinetica as a GPU-accelerated training environment. Using libraries like TensorFlow-GPU and TensorBoard we can quickly iterate through training cycles and tweak the number of layers, the optimizer, the learning rate, the batch size, the number of epochs, dropout, L1/L2 regularization, and so on. Links to a full training example are provided at the end of this article.
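A minimal sketch of this kind of model and training loop, assuming the 40-element MFCC vectors above, the 10 UrbanSound8K classes, and pre-split x_train/y_train and x_val/y_val arrays (layer sizes and hyperparameters are illustrative, not the tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(40,)),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # one output per sound class
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# TensorBoard callback lets us compare runs as we tweak hyperparameters
tensorboard = tf.keras.callbacks.TensorBoard(log_dir="./logs")

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=32, epochs=100,
          callbacks=[tensorboard])

model.save("sound_classifier.h5")
```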

Scalable Deployment Pipeline

Now that the model is trained and saved, I can operationalize it by publishing it to Docker and deploying it on Kubernetes in on-demand, batch, or continuous mode with Kinetica. To achieve this we first need to build a container that can interface with Kinetica's Active Analytics Workbench. The container includes the last saved checkpoint, which we will load to classify new audio files, the libraries used to develop the model (TensorFlow-GPU, Keras, Pandas, etc.), and the Kinetica blackbox SDK: https://github.com/kineticadb/container-kml-blackbox-sdk. Once the pre-trained model has been published to Docker, we can deploy it via the API or through the Active Analytics Workbench user interface.

An end-to-end, full-scale deployment, whether an on-demand spot check, batch inference, or a streaming workload, can be accomplished in Kinetica by combining several out-of-the-box tools. For example, we can ingest and store new .wav files in the Kinetica File System (KiFS); this mount point is treated like any other table in Kinetica. We can then set up a UDF to extract both the audio data and the feature data from these files and store that information in a Kinetica table. Finally, when we deploy the container we built, we can either pass these tables as a batch, or point the model at them, in which case it acts as a table monitor and makes classifications as new records arrive.
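As a rough sketch of what the containerized inference code might look like: the blackbox SDK wraps a Python function that receives a record and returns the model's outputs (see the SDK repo linked below for the exact contract; the function signature and column names here are assumptions for illustration):

```python
import numpy as np
from tensorflow.keras.models import load_model

# Assumed contract: the blackbox function receives one record as a dict of
# input columns and returns a dict of output columns. Check the
# container-kml-blackbox-sdk repo for the exact signature it expects.
MODEL = load_model("sound_classifier.h5")
CLASSES = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
           "drilling", "engine_idling", "gun_shot", "jackhammer",
           "siren", "street_music"]

def classify_sound(in_map):
    # Feature vector assumed to arrive as a comma-separated string column
    features = np.array([float(v) for v in in_map["mfcc_features"].split(",")])
    probabilities = MODEL.predict(features.reshape(1, -1))[0]
    return {"predicted_class": CLASSES[int(np.argmax(probabilities))],
            "confidence": float(np.max(probabilities))}
```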

Useful Links

https://www.kinetica.com/docs/tools/kifs.html — Kinetica File System

https://github.com/kineticadb/container-kml-blackbox-sdk — SDK

https://urbansounddataset.weebly.com/urbansound8k.html — Data

https://github.com/mikesmales/Udacity-ML-Capstone/tree/master/Notebooks — Full walkthrough


Written by Nick Alonso

Senior Solutions Engineer — Kinetica
