Create a basic app for audio classification

TensorFlow is a multipurpose machine learning framework. It can be used for training huge models across clusters in the cloud, or running models locally on an embedded system like your phone.

This codelab uses TensorFlow Lite to run an audio classification model on an Android device.

What you'll learn

  • How to find a pre-trained machine learning model ready to be used.
  • How to do audio classification on audio captured in real time.
  • How to use the TensorFlow Lite Support Library to preprocess model input and postprocess model output.
  • How to use the Audio Task Library to do all audio related work.

What you'll build

A simple audio recognizer app that runs a TensorFlow Lite audio recognition model to identify sounds captured by the microphone in real time.

What you'll need

  • A recent version of Android Studio (v4.1.2+)
  • A physical Android device running API level 23 (Android 6.0) or later
  • The sample code
  • Basic knowledge of Android development in Kotlin

Download the code

Click the following link to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (odml-pathways) with all of the resources you will need. For this codelab, you will only need the sources in the audio_classification/codelab1/android subdirectory.

Note: If you prefer, you can clone the repository instead:

git clone https://github.com/googlecodelabs/odml-pathways.git

The audio_classification/codelab1/android subdirectory of the repository contains two directories:

  • starter: Starting code that you build upon for this codelab.
  • final: Completed code for the finished sample app.

Import the starter app

Let's start by importing the starter app into Android Studio.

  1. Open Android Studio and select Import Project (Gradle, Eclipse ADT, etc.)
  2. Open the starter folder (audio_classification/codelab1/android/starter) from the source code you downloaded earlier.

To be sure that all dependencies are available to your app, sync your project with the Gradle files once the import has finished.

  1. Select Sync Project with Gradle Files from the Android Studio toolbar.

Run the starter app

Now that you have imported the project into Android Studio, you're ready to run the app for the first time.

Connect your Android device via USB to your computer and click Run in the Android Studio toolbar.

To do audio classification, you're going to need a model. Start with a pre-trained model so you don't have to train one yourself.

To find pre-trained models, you will use TensorFlow Hub (www.tfhub.dev).

Models are categorized by problem domain. The one you need right now is in the Audio problem domain.

For your app, you will do event classification with the YAMNet model.

YAMNet is an audio event classifier that takes an audio waveform as input and makes independent predictions for each of 521 audio events.
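"Independent predictions" means each event gets its own score between 0 and 1, and the scores are not forced to add up to 1, so several events can be reported at the same time. The following is a minimal sketch of that idea, assuming each score comes from a per-class sigmoid. The function names, labels, and raw values are hypothetical, and this is not YAMNet's actual code.

```kotlin
import kotlin.math.exp

// Logistic (sigmoid) function: maps any raw value to a score in (0, 1).
fun sigmoid(x: Double): Double = 1.0 / (1.0 + exp(-x))

// Scores every class on its own, so the results need not sum to 1.
// The labels and raw values passed in are made up for illustration.
fun independentScores(rawOutputs: Map<String, Double>): Map<String, Double> =
    rawOutputs.mapValues { (_, raw) -> sigmoid(raw) }
```

For example, calling independentScores(mapOf("Speech" to 2.0, "Clapping" to 1.0, "Dog" to -3.0)) gives both Speech and Clapping a score above 0.5. This is why the app later filters by a threshold instead of picking a single winner.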

The model yamnet/classification is already converted to TensorFlow Lite and includes specific metadata that lets the TFLite Task Library for Audio make the model easier to use on mobile devices.

Select the TFLite tab (yamnet/classification/tflite) and click Download. You can also see the model's metadata at the bottom of the page.

This model file (lite-model_yamnet_classification_tflite_1.tflite) will be used in the next step.

The first step is to move the downloaded model from the previous step to the assets folder in your app.

In Android Studio, in the project explorer, right-click the assets folder.

You'll see a popup with a list of options. One of these will be to open the folder in your file system. On a Mac this will be Reveal in Finder, on Windows it will be Open in Explorer, and on Ubuntu it will be Show in Files. Find the appropriate one for your operating system and select it.

Then copy the downloaded model into it.

Once you've done this, go back to Android Studio, and you should see your file within the assets folder.

Now you'll follow the TODOs to enable audio classification with the model you added to the project in the previous step.

To make the TODOs easy to find, go to View > Tool Windows > TODO in the Android Studio menu. This opens a window with the list, and you can click an item to go straight to the code.

You will find the first task in the module-level build.gradle file.

TODO 1 is to add the Android dependencies:

implementation 'org.tensorflow:tensorflow-lite-task-audio:0.2.0'

All the remaining code changes are in MainActivity.

TODO 2.1 creates the variable holding the name of the model file that is loaded in the next steps.

var modelPath = "lite-model_yamnet_classification_tflite_1.tflite"

TODO 2.2 defines the minimum threshold for accepting a prediction from the model. This variable will be used later.

var probabilityThreshold: Float = 0.3f

TODO 2.3 is where you'll load the model from the assets folder. The AudioClassifier class defined in the Audio Task Library loads the model and gives you the methods you need to run inference and to create an audio recorder.

val classifier = AudioClassifier.createFromFile(this, modelPath)

The Audio Task Library has helper methods that create an audio recorder with the configuration your model expects (e.g., sample rate, bit rate, number of channels). With these, you don't need to look the values up by hand or build the configuration objects yourself.

TODO 3.1: Create the tensor variable that will store the recording for inference and build the format specification for the recorder.

val tensor = classifier.createInputTensorAudio()
val format = classifier.requiredTensorAudioFormat

TODO 3.2: Show the audio recorder specs that were defined by the model's metadata in the previous step.

// Reuses the format obtained in TODO 3.1.
val recorderSpecs = "Number Of Channels: ${format.channels}\n" +
       "Sample Rate: ${format.sampleRate}"
recorderSpecsTextView.text = recorderSpecs

TODO 3.3: Create the audio recorder and start recording.

val record = classifier.createAudioRecord()
record.startRecording()

As of now, your app is listening on your phone's microphone, but it's still not doing any inference. You'll address this in the next step.

In this step you'll add the inference code to your app and show the results on the screen. The code already has a timer thread that runs every half a second, and that's where the inference will be run.

The scheduleAtFixedRate method takes two parameters: the delay before the first execution and the period between successive executions, both in milliseconds. In the code below, the task first runs after 1 millisecond and then repeats every 500 milliseconds.

Timer().scheduleAtFixedRate(1, 500) {
...
}
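If you want to try the delay and period parameters outside of Android, the following sketch runs on a plain JVM. It uses the same kotlin.concurrent scheduleAtFixedRate extension as above. The runPeriodically helper and its parameters are hypothetical and exist only for this illustration.

```kotlin
import java.util.Timer
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import kotlin.concurrent.scheduleAtFixedRate

// Runs `action` `times` times: waits `delayMs` before the first run and
// `periodMs` between runs, then cancels the timer.
// Returns true if all runs finished before the timeout.
fun runPeriodically(times: Int, delayMs: Long, periodMs: Long, action: (Int) -> Unit): Boolean {
    val done = CountDownLatch(times)
    val timer = Timer()
    var ticks = 0 // only touched on the timer's own thread
    timer.scheduleAtFixedRate(delayMs, periodMs) {
        if (ticks < times) {
            action(ticks)
            ticks++
            done.countDown()
        }
    }
    // Wait for the expected number of runs, with one second of slack.
    val finished = done.await(delayMs + periodMs * times + 1_000, TimeUnit.MILLISECONDS)
    timer.cancel() // stop the background timer thread
    return finished
}
```

Calling runPeriodically(3, 1, 500) { println("tick $it") } would print three ticks roughly half a second apart. The app itself never cancels its timer in this codelab, but cancelling it is what lets a standalone program exit.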

TODO 4.1: Add the code to use the model. First load the recording into an audio tensor and then pass it to the classifier:

tensor.load(record)
val output = classifier.classify(tensor)
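Each call to tensor.load(record) brings the newest samples from the recorder into a fixed-size buffer, so the classifier always sees the most recent stretch of audio. The sketch below illustrates that ring-buffer idea in plain Kotlin. It is a simplified illustration rather than the library's implementation, and the SampleRingBuffer class and its sizes are hypothetical.

```kotlin
// Fixed-capacity buffer that keeps only the most recent samples.
// Hypothetical class for illustration; not part of the Task Library.
class SampleRingBuffer(private val capacity: Int) {
    init { require(capacity > 0) { "capacity must be positive" } }

    private val data = FloatArray(capacity)
    private var next = 0 // index where the next sample is written
    private var size = 0 // number of valid samples stored so far

    // Appends new samples, overwriting the oldest ones once the buffer is full.
    fun load(samples: FloatArray) {
        for (s in samples) {
            data[next] = s
            next = (next + 1) % capacity
            if (size < capacity) size++
        }
    }

    // Returns the stored samples ordered from oldest to newest.
    fun snapshot(): FloatArray {
        val start = if (size < capacity) 0 else next
        return FloatArray(size) { i -> data[(start + i) % capacity] }
    }
}
```

With a buffer like this, running inference every 500 milliseconds on a window longer than 500 milliseconds means consecutive inferences share part of their audio, which helps catch sounds that straddle two timer ticks.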

TODO 4.2: To get better inference results, filter out any classification that has a very low probability. Here you'll use the variable created in a previous step (probabilityThreshold):

val filteredModelOutput = output[0].categories.filter {
   it.score > probabilityThreshold
}

TODO 4.3: To make reading the result easier, let's create a String with the filtered results:

val outputStr = filteredModelOutput.map { "${it.label} -> ${it.score} " }
   .joinToString(separator = "\n")
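To experiment with the filtering and formatting steps without a device, here is a minimal standalone sketch. ScoredLabel is a hypothetical stand-in for the library's category type, and formatResults is a made-up helper. Sorting by score is an extra step that the codelab's code does not do.

```kotlin
// Stand-in for the library's category type: a label plus a score.
// Hypothetical type so that the sketch runs without Android.
data class ScoredLabel(val label: String, val score: Float)

// Keeps the results above the threshold, puts the highest score first,
// and renders one "label -> score" entry per line.
fun formatResults(results: List<ScoredLabel>, threshold: Float): String =
    results
        .filter { it.score > threshold }
        .sortedByDescending { it.score }
        .joinToString(separator = "\n") { "${it.label} -> ${it.score}" }
```

If no result passes the threshold, the function returns an empty string, which in the app would simply clear the TextView.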

TODO 4.4: Update the UI. In this very simple app, the result is just shown in a TextView. Because the classification doesn't run on the main thread, the update has to be posted to the main thread, which runOnUiThread does here.

runOnUiThread {
   textView.text = outputStr
}

You've added all the code necessary to:

  • Load the model from the assets folder
  • Create an audio recorder with the correct configuration
  • Run inference
  • Show the best results on the screen

All that's needed now is testing the app.

You have integrated the audio classification model into the app, so let's test it.

Connect your Android device, and click Run in the Android Studio toolbar.

On the first execution, you will need to grant the app audio recording permissions.

After you grant the permission, the app starts using the phone's microphone. To test it, speak near the phone, since speech is one of the classes that YAMNet detects. Finger snapping and clapping are also easy classes to test.

You can also try to detect a dog barking or any of the other events the model recognizes (521 in total). For the full list, you can check the model's source code or read the labels file in the model's metadata directly.

In this codelab, you learned how to find a pre-trained model for audio classification and deploy it to your mobile app using TensorFlow Lite. To learn more about TFLite, take a look at other TFLite samples.

What we've covered

  • How to deploy a TensorFlow Lite model in an Android app.
  • How to find and use models from TensorFlow Hub.

Next steps

  • Customize the model with your own data.
