RenderScript is a parallel computing framework widely used in image-processing Android applications. Meanwhile, Deep Neural Network (DNN) based image filters, which traditionally run on desktops or servers, are gaining more and more attention. With the CPU and GPU acceleration RenderScript provides, these compute-intensive applications are now feasible on mobile devices.

What you will build

Original Picture



Starry Night

What you will learn

What you will need

Download the Code

Click the following button to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (io-17-codelab), which contains two subfolders for this codelab: rs_healingbrush and rs_neuralnet.

Check out from GitHub

Check the code out from GitHub:

git clone

This will create a directory containing everything you need.

Import the project

Start Android Studio and choose File -> New -> Import Project. Import rs_healingbrush. After the build has finished you'll see the app module, which contains everything you need for the codelab.

Deploy and run the app to verify that your setup is working correctly.

Remove defects

Circle anything you would like to remove, and click "Heal".

You can see that the bee is removed from the picture.

The healing brush works very well for removing small defects. You can also import a new image into the app by opening the image first, tapping "Share", and then selecting the app you just deployed. Try it out yourself!

Use RenderScript support library

So far so good, right? However, the app as-is only supports Android 7.0 and above.

We can make it cover more devices (virtually all Android devices) by using the RenderScript support library. And it is simple!

  1. Replace all "import android.renderscript.*" with "import android.support.v8.renderscript.*" in the Java source files.
  2. Edit build.gradle like the following:
defaultConfig {
   renderscriptTargetApi 24
   renderscriptSupportModeEnabled true
}

A Convolutional Neural Network (CNN), by definition, is a type of neural network that mainly consists of convolution layers. CNNs are widely used in areas like image recognition and image processing, and the most common type of convolution layer is the 2D convolution.

Understanding Convolutions

Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel. For example, if we have two three-by-three matrices, one a kernel and the other an image piece, convolution is the process of multiplying locationally similar entries and summing. With no padding, the resulting image (a single pixel) would be a weighted combination of all the entries of the image matrix, with weights given by the kernel.

The other entries would be similarly weighted, where we position the center of the kernel on each of the boundary points of the image, and compute a weighted sum.

The value of a given pixel in the output image is calculated by multiplying each kernel value by the corresponding input image pixel value and summing the results.

These beautiful animations explain what 2D convolution looks like with and without padding:

This can be described algorithmically with the following pseudo-code:
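The original codelab shows the pseudo-code as a figure; as a stand-in, here is a minimal Java sketch of the same loop nest for a single-channel float image with "valid" (no) padding. The class and method names are illustrative, not from the codelab's source.

```java
// Minimal sketch of direct 2D convolution ("valid" padding) on a
// single-channel float image. Names are illustrative only.
public class NaiveConvolution {
    // Slides the kernel over the image and accumulates the weighted
    // sum of each kernel-sized patch into one output pixel.
    static float[][] convolve2D(float[][] image, float[][] kernel) {
        int kh = kernel.length, kw = kernel[0].length;
        int oh = image.length - kh + 1, ow = image[0].length - kw + 1;
        float[][] out = new float[oh][ow];
        for (int y = 0; y < oh; y++) {
            for (int x = 0; x < ow; x++) {
                float sum = 0f;
                for (int ky = 0; ky < kh; ky++) {
                    for (int kx = 0; kx < kw; kx++) {
                        sum += image[y + ky][x + kx] * kernel[ky][kx];
                    }
                }
                out[y][x] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] image = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9}
        };
        // Identity kernel: the single output pixel is the image center.
        float[][] kernel = {
            {0, 0, 0},
            {0, 1, 0},
            {0, 0, 0}
        };
        System.out.println(convolve2D(image, kernel)[0][0]); // 5.0
    }
}
```

Note how a 3x3 image convolved with a 3x3 kernel collapses to a single pixel when no padding is applied, exactly as described above.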

Trivial Implementation

Based on the pseudo-code, we have our first implementation in RenderScript.

The trivial implementation in RenderScript is already much faster than single threaded C++ / Java code. But in the next steps, we will make convolution even faster (~ 4X or more, actually!).

Import the project

Start Android Studio and choose File -> New -> Import Project. Import rs_neuralnet. After the build has finished you'll see the app module, which contains everything you need for the codelab.

Deploy and run the app to verify that your setup is working correctly.

Try it out

Let's play with the application first.

  1. Choose an image / photo.
  2. Select your favorite style.
  3. Select the desired resolution.
  4. Click "PROCESS"

The result is reasonably good. However, it seems a bit too slow, especially when using higher resolution!

Let's figure out how to make it faster.

Make it faster

The implementation still uses the straightforward convolution approach. It is easy to understand, but the performance is rather slow. Fortunately, there is a much faster way to implement 2D convolution: by turning the 2D convolution into a matrix-matrix multiplication.

In general, the implementation contains two steps:

  1. Convert the input image to a column-matrix, by duplicating the data based on the shape of the convolution kernel. This operation is known as im2col.
  2. Multiply the column-matrix with the kernel matrix.
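The two steps above can be sketched in Java for a single-channel toy case. This is a hedged illustration of the im2col-plus-multiply idea, not the codelab's actual GEMM code path (which uses ScriptIntrinsicBLAS); all names here are made up for the example.

```java
// Illustrative im2col + matrix multiply for single-channel 2D
// convolution. Not the codelab's actual implementation.
public class Im2colConvolution {
    // Step 1: unfold each kernel-sized patch of the image into one row
    // of the column-matrix (rows = output pixels, cols = kernel taps).
    static float[][] im2col(float[][] image, int kh, int kw) {
        int oh = image.length - kh + 1, ow = image[0].length - kw + 1;
        float[][] cols = new float[oh * ow][kh * kw];
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++)
                for (int ky = 0; ky < kh; ky++)
                    for (int kx = 0; kx < kw; kx++)
                        cols[y * ow + x][ky * kw + kx] = image[y + ky][x + kx];
        return cols;
    }

    // Step 2: a single matrix-vector product computes every output
    // pixel at once (with multiple kernels this becomes a GEMM).
    static float[] matmul(float[][] cols, float[] kernelFlat) {
        float[] out = new float[cols.length];
        for (int r = 0; r < cols.length; r++)
            for (int c = 0; c < kernelFlat.length; c++)
                out[r] += cols[r][c] * kernelFlat[c];
        return out;
    }

    public static void main(String[] args) {
        float[][] image = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Flattened 2x2 box (averaging) filter.
        float[] boxKernel = {0.25f, 0.25f, 0.25f, 0.25f};
        float[] out = matmul(im2col(image, 2, 2), boxKernel);
        // 2x2 output in row-major order: [3.0, 4.0, 6.0, 7.0]
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

The payoff is that step 2 is a dense matrix multiplication, which highly tuned BLAS routines can execute far faster than the nested loops of the direct method; the cost is the duplicated data in the column-matrix.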

Now let's make the change:

Open, and look for

// TODO Step2: Use convolve2DGEMM instead.
Allocation out_alloc = convolve2D(img_padded, img_h, img_w);

Replace the convolve2D call with:

Allocation out_alloc = convolve2DGEMM(img_padded, img_h, img_w);

Build and deploy the application again; you should now see a healthy 4X or more performance gain!

So far, the application should work very well on high-end devices. However, due to the memory expansion from im2col and the huge amount of computation needed, things could still go wrong on low-end devices.

As a bonus, let's figure out how to make it play nice with low- to mid-range devices.

Support devices with Android 5.1 or lower

ScriptIntrinsicBLAS was introduced in Android 6.0 (Marshmallow), so how can we make the fast 2D convolution work on older devices? We can do something similar to what we did in the HealingBrush application.

ScriptIntrinsicBLAS is already covered by RenderScript Support Library. You will find the following section in build.gradle of the Artistic Style Transfer app:

defaultConfig {
   renderscriptTargetApi 21
   renderscriptSupportModeEnabled true
   renderscriptSupportModeBlasEnabled true
}

When "renderscriptSupportModeBlasEnabled" is specified in build.gradle, ScriptIntrinsicBLAS will be enabled automatically.

Tile the convolution

In Step 2, we improved the performance a lot by turning convolution into matrix multiplication. However, the im2col operation takes a lot of memory. It is not uncommon to see a single float matrix taking more than 100 MB.

The model runs well on devices with 2 GB of memory or more, while on low-end devices it may trigger the low memory killer, which is undesirable. To make the application run on most devices, we have to reduce the memory footprint.

Fortunately, we don't need to generate the entire column-matrix for the matrix multiplication. We can do that one chunk after another, which is known as "tiling":

Tiling the convolution can greatly reduce the memory footprint for CNNs, by minimizing the size of intermediate column-matrix.
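As a hedged sketch of the idea, the toy im2col example can be tiled so that only a small slice of the column-matrix exists at any moment: build a chunk of rows, multiply it, discard it, and move on. Names and the single-channel setup are illustrative, not the codelab's FastStyleModelTiled code.

```java
// Illustrative tiled im2col convolution: only `tileRows` rows of the
// column-matrix are materialized at once, shrinking peak memory by
// roughly (totalOutputPixels / tileRows) versus the full matrix.
public class TiledConvolution {
    static float[] convolveTiled(float[][] image, float[] kernelFlat,
                                 int kh, int kw, int tileRows) {
        int oh = image.length - kh + 1, ow = image[0].length - kw + 1;
        int total = oh * ow;
        float[] out = new float[total];
        for (int start = 0; start < total; start += tileRows) {
            int end = Math.min(start + tileRows, total);
            // Build just this tile of the column-matrix...
            float[][] tile = new float[end - start][kh * kw];
            for (int r = start; r < end; r++) {
                int y = r / ow, x = r % ow;
                for (int ky = 0; ky < kh; ky++)
                    for (int kx = 0; kx < kw; kx++)
                        tile[r - start][ky * kw + kx] = image[y + ky][x + kx];
            }
            // ...multiply it immediately, then let it be reclaimed.
            for (int r = start; r < end; r++)
                for (int c = 0; c < kh * kw; c++)
                    out[r] += tile[r - start][c] * kernelFlat[c];
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] image = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Flattened 3x3 identity kernel.
        float[] identity = {0, 0, 0, 0, 1, 0, 0, 0, 0};
        // tileRows = 1: each output pixel's patch is built and consumed alone.
        System.out.println(convolveTiled(image, identity, 3, 3, 1)[0]); // 5.0
    }
}
```

The result is identical to the untiled version for any tile size; only the peak size of the intermediate buffer changes, which is the trade-off tiling exploits.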

The logic for tiled convolution is already implemented; now it's time to try it out:

Open, and look for

// TODO Bonus: Replace FastStyleModel with FastStyleModelTiled and see the perf diff.

Replace all six occurrences of FastStyleModel with FastStyleModelTiled, then build the application and run it on your device.

Other suggestions

So far the application should run well on most devices. However, on devices with slower CPUs, the time to process a single image can be unbearably long, especially for high-resolution output. The same can happen with other compute-intensive image processing tasks.

We could do the following:

As you have already seen, RenderScript is capable of many things, from image processing to deep learning.

As a recap, RenderScript has several strong points:

More details can be found on:

There are lots of cool code samples that currently exist, and more will come. Stay tuned!

Join the party

Our partners have already taken this technology into their applications, to name a few: Camera360, PicMix, and VideoEditor.