Migrate from App Engine ndb to Cloud NDB (Module 2)

The Google App Engine (GAE) migration modules teach GAE (Standard) developers how to modernize their apps by moving away from the original runtime's bundled services, which aren't supported by the next-generation runtimes.

This tutorial teaches you how to migrate from App Engine's built-in ndb (Next Database) client library to the Cloud NDB client library.

You'll learn how to

  • Use the App Engine ndb library (if you're unfamiliar with it)
  • Migrate from ndb to Cloud NDB
  • Further migrate your app to Python 3

What you'll need


In Module 1, we migrated web frameworks from App Engine's built-in webapp2 to Flask. In this codelab, continuing to move away from App Engine's built-in services, we'll switch from App Engine's ndb library to Google Cloud NDB.

Once you complete this migration, you can then:

  1. Migrate to Python 3 & the next-gen App Engine runtime
  2. Migrate to Cloud Datastore (client library for non-App Engine apps)
  3. Containerize your Python 2 (or 3) app and migrate to Cloud Run
  4. Add use of App Engine (push) task queues then migrate to Cloud Tasks

But we aren't there yet. Finish this codelab before considering those next steps. This tutorial's migration features these primary steps:

  1. Setup/Prework
  2. Add Cloud NDB library
  3. Update application files

Before we get going with the main part of the tutorial, let's set up our project, get the code, and then deploy the baseline app so we know we're starting with working code.

1. Setup project

If you completed the Module 1 codelab, we recommend reusing that same project (and code). Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and an enabled App Engine app.

2. "Get" baseline sample app

One of the prerequisites is to have a working Module 1 sample app. Here are some options on where to get it from:

  1. The working sample you created after completing the Module 1 codelab
  2. Doing the Module 1 codelab now before starting this one
  3. Copy the Module 1 repo (link below)

Whether you use yours or ours, the Module 1 code is where we'll START. This Module 2 codelab walks you through each step, and when you're done, your code should resemble the code at the FINISH point (including an optional "bonus" port from Python 2 to 3):

Your STARTing Module 1 code folder should have the following contents:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

If you completed the Module 1 tutorial, you'll also have a lib folder with Flask and its dependencies. If you don't have a lib folder, create it with the pip install -t lib -r requirements.txt command so that we can deploy this baseline app in the next step. If you have both Python 2 and 3 installed, we recommend using pip2 instead of pip to avoid confusion with Python 3.

3. (Re)Deploy Module 1 app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 1 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

Many original App Engine built-in services have blossomed into their own products, and Datastore is one of them. Today non-App Engine apps can use Cloud Datastore. For long-time ndb users, the Google Cloud team has created the Cloud NDB client library to talk to Cloud Datastore. It is available for both Python 2 and 3.

Let's update the configuration files to replace App Engine ndb with Cloud NDB, then modify our application.

1. Update requirements.txt

In Module 1, the only external dependency for our app was Flask. Now we'll add Cloud NDB. Here is what your requirements.txt file looked like at the end of Module 1:

  • BEFORE:
Flask==1.1.2

Migrating away from App Engine ndb requires the Cloud NDB library (google-cloud-ndb), so add its package to requirements.txt.

  • AFTER:
Flask==1.1.2
google-cloud-ndb==1.7.1

When this codelab was written, the latest recommended version was 1.7.1, but requirements.txt in the repo may have a newer version. We recommend the latest version of each library, but if one doesn't work, you can roll back to an older release.
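If you'd like to sanity-check the updated file before reinstalling, the hypothetical stdlib-only helper below (not part of the codelab repo) confirms that each required package has an exact `==` pin. It's a minimal sketch that assumes simple `name==version` lines; it skips comments and other specifiers and doesn't understand extras or environment markers, so pip remains the real authority on requirements syntax.

```python
import re

# Matches simple "package==version" pins at the start of a line.
PIN_RE = re.compile(r'^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s#;]+)')


def normalize_name(name):
    """Lowercase and collapse runs of '-', '_' and '.' (PEP 503-style)."""
    return re.sub(r'[-_.]+', '-', name).lower()


def parse_pins(requirements_text):
    """Return {normalized_name: version} for every exact '==' pin."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[normalize_name(match.group(1))] = match.group(2)
    return pins


def missing_packages(requirements_text, required):
    """Return the names in `required` that have no exact pin."""
    pins = parse_pins(requirements_text)
    return [name for name in required if normalize_name(name) not in pins]


if __name__ == '__main__':
    module2_requirements = 'Flask==1.1.2\ngoogle-cloud-ndb==1.7.1\n'
    print(parse_pins(module2_requirements))
    print(missing_packages(module2_requirements, ['Flask', 'google-cloud-ndb']))
```

Run against the Module 2 file above, it prints both pins and an empty list of missing packages.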

Delete your lib folder if you have one and didn't just create it above. Now (re)install the updated libraries with the pip install -t lib -r requirements.txt command, using pip2 instead of pip as necessary.

2. Update app.yaml

Adding Google Cloud client libraries like google-cloud-ndb has a few requirements, all revolving around the inclusion of "built-in" libraries, that is, 3rd-party packages already available on Google servers. You don't list them in requirements.txt, nor do you bundle/vendor them with pip install. The only requirements are:

  1. Specify built-in libraries in app.yaml
  2. Point them to bundled/vendored third-party libraries they may work with (in lib)

Here is the STARTing app.yaml from Module 1:

  • BEFORE:
runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

Now add the following lines to app.yaml to reference a pair of built-in 3rd-party packages, grpcio and setuptools, in a new libraries section:

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

Why these built-in libraries? gRPC is an open RPC framework used by many Google Cloud client libraries, including google-cloud-ndb. The grpcio library is the Python gRPC adapter and is therefore required. The reason for including setuptools is explained in the next step.

  • AFTER:

With the changes above, your updated app.yaml should now look like this:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

3. Update appengine_config.py

The pkg_resources tool, part of the setuptools library, is used to let built-in 3rd-party libraries access the bundled ones. Update appengine_config.py to use pkg_resources to point them to the bundled libraries in lib. When you've completed this change, the entire file should look like this:

import pkg_resources
from google.appengine.ext import vendor

# Set PATH to your libraries folder.
PATH = 'lib'
# Add libraries installed in the PATH folder.
vendor.add(PATH)
# Add libraries to pkg_resources working set to find the distribution.
pkg_resources.working_set.add_entry(PATH)
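Conceptually, vendor.add() makes the packages you installed into lib importable by your app. The sketch below is a rough, stdlib-only analog of that idea, not App Engine's actual implementation: the hypothetical add_vendor_folder() simply prepends a folder to sys.path. It doesn't model the pkg_resources.working_set.add_entry() step, and it targets Python 3 so you can run it locally.

```python
import importlib
import os
import sys
import tempfile


def add_vendor_folder(path):
    """Prepend a folder of copied ("vendored") packages to sys.path.

    A simplified analog only: on the Python 2 runtime, App Engine's
    vendor.add() is the real tool, and this sketch skips pkg_resources.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isdir(abs_path):
        raise ValueError('vendor folder not found: %s' % abs_path)
    if abs_path not in sys.path:
        sys.path.insert(0, abs_path)
    return abs_path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as lib_dir:
        # Pretend "pip install -t lib" copied a module named vendored_demo.
        with open(os.path.join(lib_dir, 'vendored_demo.py'), 'w') as f:
            f.write("GREETING = 'imported from lib'\n")
        add_vendor_folder(lib_dir)
        importlib.invalidate_caches()  # pick up the newly written file
        import vendored_demo
        print(vendored_demo.GREETING)
```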

With the configuration file formalities out of the way, you can now migrate from ndb to Cloud NDB. To complete the migration, update imported libraries and add use of context management in main.py.

1. Imports

Make the following import swap in main.py:

  • BEFORE:
from google.appengine.ext import ndb
  • AFTER:
from google.cloud import ndb

The change from an App Engine library to a Google Cloud library is sometimes as subtle as this instance. For built-in services that have become full Google Cloud products, you'll change your imports from google.appengine to google.cloud.

2. Datastore access

To be able to use the Cloud NDB library, your app must use Python context managers. Their purpose is to "gate" access to resources such that they must be acquired before they can be used. Context managers are based on the computer science control technique known as Resource Acquisition Is Initialization (or RAII). Context managers are used with Python files (which must be opened before they can be accessed) and with concurrency "spin locks" (which must be acquired before code in a "critical section" can be executed).
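To make the RAII idea concrete, here's a small, stdlib-only illustration that is unrelated to Cloud NDB itself: a hypothetical acquired() context manager, built with contextlib.contextmanager, records when a resource is acquired and released, and the release still runs when the block raises. The second part uses threading.Lock, which is a blocking lock rather than a true spin lock but supports the with statement in the same way.

```python
import threading
from contextlib import contextmanager

events = []  # records the order of acquire/work/release steps


@contextmanager
def acquired(resource_name):
    """Acquire a (pretend) resource on entry; always release it on exit."""
    events.append('acquire ' + resource_name)
    try:
        yield resource_name
    finally:
        # The finally clause runs even if the with block raises,
        # so the resource is never left held.
        events.append('release ' + resource_name)


lock = threading.Lock()


def increment(counter):
    """Update shared state inside a critical section guarded by lock."""
    with lock:  # acquired on entry, released on exit (even on return)
        counter['value'] += 1
        return counter['value']


if __name__ == '__main__':
    try:
        with acquired('resource'):
            events.append('work')
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass
    print(events)  # ['acquire resource', 'work', 'release resource']
    print(increment({'value': 0}))  # 1
```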

Similarly, Cloud NDB requires that you acquire the context of a client to communicate with Datastore before any Datastore commands can execute. First, create a client by adding ds_client = ndb.Client() in main.py right after Flask initialization:

app = Flask(__name__)
ds_client = ndb.Client()

The Python with statement is used to obtain an object's context. Wrap any code blocks accessing Datastore with with statements. Below are the same functions from Module 1 for writing a new Entity to Datastore and reading the ten most recently added Entities:

  • BEFORE:

Here's the original code without context management:

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    'get most recent visits'
    return (v.to_dict() for v in Visit.query().order(
            -Visit.timestamp).fetch(limit))
  • AFTER:

Now add with ds_client.context(): and move your Datastore access code into the with block with these changes:

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    'get most recent visits'
    with ds_client.context():
        return (v.to_dict() for v in Visit.query().order(
                -Visit.timestamp).fetch(limit))
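One subtlety in fetch_visits(): it returns a generator expression from inside the with block. In Python, a generator expression's outermost iterable is evaluated immediately, so the query's fetch(limit) runs while the context is still open, while each per-item expression (to_dict() here) runs later, after the context has exited. The sketch below demonstrates this with a hypothetical fake_context() and fake_fetch() standing in for Cloud NDB. If your per-item work itself needs Datastore access (for example, fetching a referenced entity), building a list inside the block, such as return [v.to_dict() for v in ...], is the safer choice.

```python
from contextlib import contextmanager

state = {'open': False}


@contextmanager
def fake_context():
    """Hypothetical stand-in for ds_client.context()."""
    state['open'] = True
    try:
        yield
    finally:
        state['open'] = False


def fake_fetch(limit):
    """Hypothetical query that must run while the context is open."""
    if not state['open']:
        raise RuntimeError('fetch() called outside the context')
    return [{'visitor': 'visitor %d' % i} for i in range(limit)]


def fetch_visits(limit):
    with fake_context():
        # fake_fetch(limit) is the outermost iterable, so it is evaluated
        # right now, inside the with block; the dict(...) per-item
        # expression runs lazily, when the caller iterates.
        return (dict(v, context_open=state['open'])
                for v in fake_fetch(limit))


if __name__ == '__main__':
    for visit in fetch_visits(2):
        print(visit)  # context_open is False: items are built after exit
```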

Deploy application

Re-deploy your app with gcloud app deploy, and confirm the app works. Your code should now match what's in the Module 2 repo.

Congrats on completing this Module 2 codelab. You've just crossed the finish line: as far as Datastore goes, this is the last of the strongly recommended migrations in this series.

Optional: Clean up

What about cleaning up to avoid being billed until you're ready to move on to the next migration codelab? As existing developers, you're likely already up to speed on App Engine's pricing information.

Optional: Disable app

If you're not ready to go on to the next tutorial yet, disable your app to avoid incurring charges. When you're ready to move on to the next codelab, you can re-enable it. While your app is disabled, it won't get any traffic to incur charges. However, you can still be billed for Datastore usage if it exceeds the free quota, so delete enough data to fall under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shut down your project.

Next steps

From here, there's flexibility as to your next move. Choose any of these options:

  • Module 2 Bonus: Continue below to the bonus part of this tutorial to explore porting to Python 3 and the next generation App Engine runtime.
  • Module 7: App Engine Push Task Queues (required if you use [push] Task Queues)
    • Adds App Engine taskqueue push tasks to Module 1 app
    • Prepares users for migrating to Cloud Tasks in Module 8
  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • Allows you to stay on Python 2
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • No need to know anything about Docker, containers, or Dockerfiles
    • Requires you to have already migrated your app to Python 3
  • Module 3:
    • Modernize Datastore access from Cloud NDB to Cloud Datastore
    • This is the library used for Python 3 App Engine apps and non-App Engine apps

To access the latest App Engine runtime and features, we recommend that you migrate to Python 3. In our sample app, Datastore was the only built-in service we used, and since we've migrated from ndb to Cloud NDB, we can now port to App Engine's Python 3 runtime.

Overview

While porting to Python 3 is not within the scope of a Google Cloud tutorial, this part of the codelab gives developers an idea of how the Python 3 App Engine runtime differs. One outstanding feature of the next-gen runtime is simplified access to third-party packages: there's no need to specify built-in packages in app.yaml, nor any requirement to copy or upload non-built-in libraries; they're installed automatically because they're listed in requirements.txt.

Because our sample is so basic and Cloud NDB is Python 2-3 compatible, no application code needs to be explicitly ported to 3.x: The app runs on 2.x & 3.x unmodified, meaning the only required changes are in configuration in this case:

  1. Simplify app.yaml to reference Python 3 and remove reference to bundled 3rd-party libraries.
  2. Delete appengine_config.py and the lib folder as they're no longer necessary.

In addition to main.py, the requirements.txt and templates/index.html files remain unchanged.

Simplify app.yaml

BEFORE:

The only real change for this sample app is to significantly shorten app.yaml. As a reminder, here's what we had in app.yaml at the conclusion of Module 2:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

AFTER:

In Python 3, the threadsafe, api_version, and libraries directives are all deprecated: all apps are presumed threadsafe, api_version isn't used, and because there are no longer built-in third-party packages preinstalled on App Engine servers, libraries has no purpose either. Check the documentation on changes to app.yaml for more information. As a result, you should delete all three from app.yaml and update to a supported Python 3 version (see below).
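If you're migrating several services, a quick check can flag the directives to delete. The hypothetical leftover_py2_keys() helper below is a minimal sketch that only scans unindented key: lines, which is enough for flat files like the Module 2 app.yaml; for anything more complex, a real YAML parser is the better tool.

```python
DEPRECATED_PY3_KEYS = ('threadsafe', 'api_version', 'libraries')


def top_level_keys(app_yaml_text):
    """Return unindented 'key:' names from a flat app.yaml."""
    keys = []
    for line in app_yaml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith('#'):
            continue
        # Skip nested lines and list items such as "- name: grpcio".
        if line[0] in ' \t-':
            continue
        if ':' in line:
            keys.append(line.split(':', 1)[0].strip())
    return keys


def leftover_py2_keys(app_yaml_text):
    """Return the directives that should be removed for Python 3."""
    return [key for key in top_level_keys(app_yaml_text)
            if key in DEPRECATED_PY3_KEYS]


if __name__ == '__main__':
    module2_app_yaml = '''runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0
'''
    print(leftover_py2_keys(module2_app_yaml))  # ['threadsafe', 'api_version', 'libraries']
    print(leftover_py2_keys('runtime: python38\n'))  # []
```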

Use of handlers directive

In addition, the handlers directive, which directs traffic at App Engine applications, is no longer needed for routing dynamic requests. Since the next-gen runtime expects web frameworks to manage app routing, all "handler scripts" must be changed to "auto". Combining the changes from above, you arrive at this app.yaml:

runtime: python38

handlers:
- url: /.*
  script: auto

Learn more about "script: auto" from its documentation page.

Removing handlers directive

Because Flask handles all routing and this app serves no static files, you can remove the entire handlers section too, leaving a single-line app.yaml:

runtime: python38

By default, this launches the Gunicorn WSGI web server, which is available for all applications. If you're familiar with gunicorn, this is the command executed by default with the barebones app.yaml:

gunicorn main:app --workers 2 -c /config/gunicorn.py
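In gunicorn's main:app argument, main is the module (main.py) and app is the WSGI callable inside it, which in this codelab is the Flask object created by app = Flask(__name__). To show the interface gunicorn expects without needing Flask, the sketch below uses a tiny hand-written WSGI function as a stand-in and calls it directly with an environ prepared by the standard library's wsgiref.util.setup_testing_defaults().

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI callable; in main.py, the Flask object plays this role."""
    body = ('Hello from %s' % environ.get('PATH_INFO', '/')).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]


if __name__ == '__main__':
    environ = {'PATH_INFO': '/'}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    statuses = []

    def start_response(status, headers):
        statuses.append(status)  # a real server would send this to the client

    body = b''.join(app(environ, start_response))
    print(statuses[0], body)
```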

Use of entrypoint directive

If, however, your application requires a specific start-up command, that can be specified with an entrypoint directive:

runtime: python38
entrypoint: python main.py

This example specifically requests that the Flask development server be used instead of gunicorn. You must also add code that starts the development server on the 0.0.0.0 interface on port 8080 by adding this small section to the bottom of main.py:

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)

Learn more about entrypoint from its documentation page.

Delete appengine_config.py and lib

Delete the appengine_config.py file and the lib folder. In migrating to Python 3, App Engine acquires and installs packages listed in requirements.txt.

The appengine_config.py file is used to recognize third-party libraries/packages, whether you've copied them yourself or use ones already available on App Engine servers (built-in). When moving to Python 3, the big changes are:

  1. No bundling of copied third-party libraries (listed in requirements.txt)
  2. No pip install into a lib folder, meaning no lib folder period
  3. No listing built-in third-party libraries in app.yaml
  4. No need to point your app to third-party libraries, so no appengine_config.py file

Listing all required third-party libraries in requirements.txt is all that's needed.

Deploy application

Re-deploy your app to ensure that it works. You can also confirm how close your solution is to the Module 2 sample Python 3 code. To visualize the differences with Python 2, compare the code with its Python 2 version.

Congrats on finishing the bonus step in Module 2! Visit the documentation on preparing configuration files for the Python 3 runtime. Finally, review the summary above for next steps and cleanup.

Preparing your application

When it's time to migrate your application, you'll have to port main.py and your other application files to 3.x, so a best practice is to make your 2.x application as "forward-compatible" as possible.

There are plenty of online resources to help you accomplish that, but some of the key tips:

  1. Ensure all application dependencies are fully 3.x-compatible
  2. Ensure your application runs on at least 2.6 (preferably 2.7)
  3. Ensure application passes entire test suite (and minimum 80% coverage)
  4. Use compatibility libraries such as six, Future, and/or Modernize
  5. Educate yourself on key backwards-incompatible 2.x vs. 3.x differences
  6. Any I/O will likely lead to Unicode vs. byte string incompatibilities
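As a small example of tips 4 and 6, here's a sketch of I/O helpers that behave the same on 2.7 and 3.x using only the standard library (no six or future package required): io.open() returns text on both versions, and the hypothetical to_text() decodes byte strings at the boundary so the rest of your code works with text only.

```python
from __future__ import print_function, unicode_literals

import io


def to_text(value, encoding='utf-8'):
    """Return text: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def read_text(path, encoding='utf-8'):
    """Read a whole file as text on both Python 2.7 and 3.x.

    io.open() decodes to text on both versions, unlike Python 2's
    built-in open(), which returns byte strings.
    """
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding='utf-8'):
    """Write text to a file, encoding it explicitly."""
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(to_text(text, encoding))


if __name__ == '__main__':
    print(to_text(b'caf\xc3\xa9') == 'caf\u00e9')  # True on 2.7 and 3.x
```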

The sample app was designed with all this in mind, which is why it runs on 2.x and 3.x right out of the box, letting us focus on what needs to change in order to use the next-gen platform.

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 1 (START) and Module 2 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations.

Codelab     Python 2    Python 3
Module 1    repo        (n/a)
Module 2    repo        repo

App Engine resources

Below are additional resources regarding this specific migration: